# Base

## Module

class sonnet.Module(name: Optional[str] = None)[source]

Base class for Sonnet modules.

A Sonnet module is a lightweight container for variables and other modules. Modules typically define one or more “forward” methods (e.g. __call__) which apply operations combining user input and module parameters. For example:

>>> class MultiplyModule(snt.Module):
...   def __call__(self, x):
...     if not hasattr(self, 'w'):
...       self.w = tf.Variable(2., name='w')
...     return x * self.w

>>> mod = MultiplyModule()
>>> mod(1.)
<tf.Tensor: ... numpy=2.0>


Sonnet modules are a layer on top of tf.Module, implementing automatic name scoping as described in the original RFC [1].

__init__(name: Optional[str] = None)[source]

Initializes the current module with the given name.

Subclasses should call this constructor before creating other modules or variables such that those modules are named correctly.

Parameters

name – An optional string name for the class. Must be a valid Python identifier. If name is not provided then the class name for the current instance is converted to lower_snake_case and used instead.
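As a rough illustration of the default naming rule, the sketch below converts a class name to lower_snake_case with a regular expression. The helper `to_lower_snake_case` is hypothetical and is not Sonnet's own conversion routine; its results may differ from Sonnet's for class names that contain digits or runs of capitals.

```python
import re


def to_lower_snake_case(class_name: str) -> str:
    """Hypothetical helper: insert "_" before each interior capital, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


print(to_lower_snake_case("MultiplyModule"))  # multiply_module
print(to_lower_snake_case("Linear"))          # linear
```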

property variables

Sequence of tf.Variables owned by this module and its submodules.

See tf.Module.variables for implementation details.

NOTE: Most Sonnet modules create variables lazily (e.g. the first time they are called). As such just after construction there are typically no variables. To mitigate a common error (calling .variables or .trainable_variables before any variables are created) these properties will raise an exception if their result is empty. See allow_empty_variables() if you want to suppress this error.

Returns

A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).
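The ordering described above can be sketched with plain Python objects. This is a toy model under stated assumptions, not the tf.Module implementation: `ToyVariable`, `ToyModule` and `collect_variables` are hypothetical names introduced only for illustration.

```python
from collections import deque


class ToyVariable:
    """Hypothetical stand-in for tf.Variable."""

    def __init__(self, name):
        self.name = name


class ToyModule:
    """Hypothetical stand-in for a module; attributes are set freely."""


def collect_variables(root):
    """Return variables of `root` first, then those of submodules, breadth first."""
    found = []
    queue = deque([root])
    while queue:
        module = queue.popleft()
        for attr in sorted(vars(module)):  # attribute-name order
            value = getattr(module, attr)
            if isinstance(value, ToyVariable):
                found.append(value)
            elif isinstance(value, ToyModule):
                queue.append(value)  # visited after the current level
    return found


root = ToyModule()
root.w = ToyVariable("root/w")
root.b = ToyVariable("root/b")
root.child = ToyModule()
root.child.w = ToyVariable("child/w")
root.child.inner = ToyModule()
root.child.inner.z = ToyVariable("inner/z")
root.another = ToyModule()
root.another.v = ToyVariable("another/v")

names = [v.name for v in collect_variables(root)]
print(names)
# ['root/b', 'root/w', 'another/v', 'child/w', 'inner/z']
```

The root's own variables come first in attribute-name order, then each direct submodule's variables, and only then those of nested submodules.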

property trainable_variables

Sequence of trainable tf.Variables owned by this module and its submodules.

See tf.Module.trainable_variables for implementation details.

NOTE: Most Sonnet modules create variables lazily (e.g. the first time they are called). As such just after construction there are typically no variables. To mitigate a common error (calling .variables or .trainable_variables before any variables are created) these properties will raise an exception if their result is empty. See allow_empty_variables() if you want to suppress this error.

Returns

A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).

## once

sonnet.once(f)[source]

Decorator which ensures a wrapped method is only ever run once.

>>> @snt.once
... def f():
...   print('Hello, world!')
>>> f()
Hello, world!
>>> f()
>>> f()


If f is a method then it will be evaluated once per instance:

>>> class MyObject(object):
...   @snt.once
...   def f(self):
...     print('Hello, world!')

>>> o = MyObject()
>>> o.f()
Hello, world!
>>> o.f()

>>> o2 = MyObject()
>>> o2.f()
Hello, world!
>>> o.f()
>>> o2.f()


If an error is raised during execution of f it will be raised to the user. Next time the method is run, it will be treated as not having run before.

Parameters

f – A function to wrap which should only be called once.

Returns

Wrapped version of f which will only evaluate f the first time it is called.
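The error-handling behaviour described above (a failed call does not count as a run) can be sketched in plain Python. The decorator `once_sketch` below is a hypothetical, simplified analogue for plain functions only; it does not track runs per instance as described for methods, and discarding the return value of `f` is an assumption of this sketch.

```python
import functools


def once_sketch(f):
    """Hypothetical run-once wrapper for plain functions (not methods)."""
    has_run = False

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        nonlocal has_run
        if has_run:
            return None  # later calls are no-ops
        f(*args, **kwargs)  # if this raises, has_run stays False
        has_run = True
        return None

    return wrapper


attempts = []


@once_sketch
def setup():
    attempts.append("attempt")
    if len(attempts) == 1:
        raise ValueError("first attempt fails")


try:
    setup()  # raises, so it does not count as having run
except ValueError:
    pass
setup()  # runs again and succeeds
setup()  # no-op
print(len(attempts))  # 2
```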

## no_name_scope

sonnet.no_name_scope(method: T) → T[source]

Decorator to wrap a method, preventing automatic name scope wrapping.

By default, any method on a module is treated as a forward function, so any variables or modules created by the method are scoped as belonging to the module. In some cases this is undesirable, for example when implementing .clone() / .transpose(), where the new module should take the scope of wherever the .transpose() call is made. To allow this, decorate such methods with no_name_scope.

Parameters

method – the method to wrap.

Returns

The method, with a flag indicating no name scope wrapping should occur.
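The "flag on the method" idea can be sketched without TensorFlow. Everything below is hypothetical: `no_name_scope_sketch`, `wrap_with_scope` and the attribute name `_no_scope_sketch` are illustrative choices, and appending to a list stands in for entering a name scope.

```python
import functools


def no_name_scope_sketch(method):
    """Hypothetical marker decorator: tag the method and return it unchanged."""
    method._no_scope_sketch = True  # attribute name is an assumption
    return method


def wrap_with_scope(method, scope_name, entered):
    """Wrap `method` so each call records `scope_name`, unless it is tagged."""
    if getattr(method, "_no_scope_sketch", False):
        return method

    @functools.wraps(method)
    def wrapped(*args, **kwargs):
        entered.append(scope_name)  # stands in for entering a name scope
        return method(*args, **kwargs)

    return wrapped


def forward(x):
    return x + 1


@no_name_scope_sketch
def clone():
    return "new module"


entered = []
scoped_forward = wrap_with_scope(forward, "my_module", entered)
unscoped_clone = wrap_with_scope(clone, "my_module", entered)

scoped_forward(1)
unscoped_clone()
print(entered)  # ['my_module']
```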

## Deferred

class sonnet.Deferred(constructor, call_methods=('__call__', ), name=None)[source]

Defers the construction of another module until the first call.

Deferred can be used to declare modules that depend on computed properties of other modules before those modules are defined. This allows users to separate the declaration and use of modules. For example at the start of your program you can declare two modules which are coupled:

>>> encoder = snt.Linear(64)
>>> decoder = snt.Deferred(lambda: snt.Linear(encoder.input_size))


Later you can use these naturally (note that using decoder first would cause an error, since encoder.input_size is only defined after encoder has been called):

>>> x = tf.ones([8, 32])
>>> y = encoder(x)
>>> z = decoder(y)  # Constructs the decoder's Linear by calling the lambda.


The result will satisfy the following conditions:

>>> assert x.shape == z.shape
>>> assert y.shape == [8, 64]
>>> assert decoder.input_size == encoder.output_size
>>> assert decoder.output_size == encoder.input_size

__init__(constructor, call_methods=('__call__', ), name=None)[source]

Initializes the Deferred module.

Parameters
• constructor – A no-argument callable which constructs the module to defer to. The first time one of the call_methods is called, the constructor will be run and the constructed module will then be called with the same method and arguments as the deferred module.

• call_methods – Methods which should trigger construction of the target module. The default value configures this module to construct the first time __call__ is run. To trigger construction from other methods as well, pass them explicitly, for example call_methods=("__call__", "encode", "decode").

• name – Name for the deferred module.

property target

Returns the target module.

If the constructor has not already run this will trigger construction. Subsequent calls to target will return the same instance.

Returns

A Module instance as created by self.constructor().
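The construct-on-first-use behaviour of target can be sketched in plain Python. `DeferredSketch` is a hypothetical minimal analogue that only forwards `__call__`; it ignores call_methods, naming and variable tracking.

```python
class DeferredSketch:
    """Hypothetical minimal analogue of a deferred module."""

    def __init__(self, constructor):
        self._constructor = constructor
        self._target = None
        self._constructed = False

    @property
    def target(self):
        # Run the constructor at most once and cache its result.
        if not self._constructed:
            self._target = self._constructor()
            self._constructed = True
        return self._target

    def __call__(self, *args, **kwargs):
        return self.target(*args, **kwargs)


build_count = []


def make_doubler():
    build_count.append(1)
    return lambda x: 2 * x


deferred = DeferredSketch(make_doubler)
print(len(build_count))  # 0
print(deferred(3))       # 6
print(deferred(4))       # 8
print(len(build_count))  # 1
```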

__call__(*args, **kwargs)[source]

Call self as a function.

__setattr__(name, value)[source]

Support self.foo = trackable syntax.

__delattr__(name)[source]

Implement delattr(self, name).

# Linear modules

## Linear

class sonnet.Linear(output_size: int, with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, name: Optional[str] = None)[source]

Linear module, optionally including bias.

__init__(output_size: int, with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, name: Optional[str] = None)[source]

Constructs a Linear module.

Parameters
• output_size – Output dimensionality.

• with_bias – Whether to include bias parameters. Default True.

• w_init – Optional initializer for the weights. By default the weights are initialized to truncated random normal values with a standard deviation of 1 / sqrt(input_feature_size), which is commonly used when the inputs are zero centered (see https://arxiv.org/abs/1502.03167v3).

• b_init – Optional initializer for the bias. By default the bias is initialized to zero.

• name – Name of the module.
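As a worked example of the default weight scale, the sketch below computes 1 / sqrt(input_feature_size) and draws a few truncated normal samples by rejection. The helper names are hypothetical, and the cutoff at two standard deviations is an assumption of this sketch rather than something stated above.

```python
import math
import random


def default_w_stddev(input_feature_size: int) -> float:
    """Standard deviation 1 / sqrt(input_feature_size)."""
    return 1.0 / math.sqrt(input_feature_size)


def truncated_normal_sample(stddev: float, rng: random.Random) -> float:
    """Draw from N(0, stddev^2), redrawing values beyond two standard deviations.

    The two-standard-deviation cutoff is an assumption of this sketch.
    """
    while True:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= 2.0 * stddev:
            return value


rng = random.Random(0)
stddev = default_w_stddev(64)
print(stddev)  # 0.125
weights = [truncated_normal_sample(stddev, rng) for _ in range(5)]
print(all(abs(w) <= 2.0 * stddev for w in weights))  # True
```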

__call__(inputs: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor[source]

Call self as a function.

## Bias

class sonnet.Bias(output_size: Optional[int] = None, bias_dims: Optional[Sequence[int]] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, name: Optional[str] = None)[source]

Bias module.

Example Usage:

>>> N, H, W, C = 1, 2, 3, 4
>>> x = tf.random.normal([N, H, W, C])

Create a scalar bias:

>>> scalar_bias = snt.Bias(bias_dims=[])
>>> scalar_bias_output = scalar_bias(x)
>>> assert scalar_bias.b.shape == []


Create a bias over all non-minibatch dimensions:

>>> all_bias = snt.Bias()
>>> all_bias_output = all_bias(x)
>>> assert all_bias.b.shape == [H, W, C]


Create a bias over the last non-minibatch dimension:

>>> last_bias = snt.Bias(bias_dims=[-1])
>>> last_bias_output = last_bias(x)
>>> assert last_bias.b.shape == [C]


Create a bias over the first non-minibatch dimension:

>>> first_bias = snt.Bias(bias_dims=[1])
>>> first_bias_output = first_bias(x)
>>> assert first_bias.b.shape == [H, 1, 1]


Subtract and later add the same learned bias:

>>> bias = snt.Bias()
>>> h1 = bias(x, multiplier=-1)
>>> h2 = bias(x)
>>> h3 = bias(x, multiplier=-1)
>>> reconstructed_x = bias(h3)
>>> assert tf.reduce_all(tf.equal(x, reconstructed_x))

__init__(output_size: Optional[int] = None, bias_dims: Optional[Sequence[int]] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, name: Optional[str] = None)[source]

Constructs a Bias module that supports broadcasting.

Parameters
• output_size – Output size (output shape without batch dimension). If output_size is left as None, the size will be inferred directly from the input.

• bias_dims – Sequence of which dimensions to retain from the input shape when constructing the bias. The remaining dimensions will be broadcast over (given size of 1), and leading dimensions will be removed completely. See class doc for examples.

• b_init – Optional initializer for the bias. Defaults to zeros.

• name – Name of the module.
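The bias_dims rule can be checked against the class examples with a small shape calculation. `bias_shape_sketch` is a hypothetical helper written to reproduce those examples; it is not Sonnet's implementation and does not model output_size.

```python
def bias_shape_sketch(input_shape, bias_dims=None):
    """Hypothetical helper mirroring the examples in the class documentation."""
    rank = len(input_shape)
    if bias_dims is None:
        return list(input_shape[1:])  # all non-minibatch dimensions
    retained = {d % rank for d in bias_dims}  # normalise negative indices
    if not retained:
        return []  # scalar bias
    first = min(retained)
    # Dimensions before `first` are dropped; unretained ones after it become 1.
    return [input_shape[i] if i in retained else 1 for i in range(first, rank)]


N, H, W, C = 1, 2, 3, 4
shape = [N, H, W, C]
print(bias_shape_sketch(shape, []))    # []
print(bias_shape_sketch(shape))        # [2, 3, 4]
print(bias_shape_sketch(shape, [-1]))  # [4]
print(bias_shape_sketch(shape, [1]))   # [2, 1, 1]
```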

__call__(inputs: tensorflow.python.framework.ops.Tensor, multiplier: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = None)[source]

Adds bias to inputs and optionally multiplies by multiplier.

Parameters
• inputs – A Tensor of size [batch_size, input_size1, …].

• multiplier – A scalar or Tensor which the bias term is multiplied by before adding it to inputs. Anything which works in the expression bias * multiplier is acceptable here. This may be useful if you want to add a bias in one place and subtract the same bias in another place via multiplier=-1.

Returns

A Tensor of size [batch_size, input_size1, …].

# Convolutional modules

## Conv1D

class sonnet.Conv1D(output_channels: int, kernel_shape: Union[int, Sequence[int]], stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: Union[str, Callable[[int], Sequence[int]], Sequence[Callable[[int], Sequence[int]]]] = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NWC', name: Optional[str] = None)[source]

Conv1D module.

__init__(output_channels: int, kernel_shape: Union[int, Sequence[int]], stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: Union[str, Callable[[int], Sequence[int]], Sequence[Callable[[int], Sequence[int]]]] = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NWC', name: Optional[str] = None)[source]

Constructs a Conv1D module.

Parameters
• output_channels – The number of output channels.

• kernel_shape – Sequence of length 1, or an integer. kernel_shape will be expanded to define a kernel size in all dimensions.

• stride – Sequence of strides of length 1, or an integer. stride will be expanded to define stride in all dimensions.

• rate – Sequence of dilation rates of length 1, or integer that is used to define dilation rate in all dimensions. 1 corresponds to standard convolution, rate > 1 corresponds to dilated convolution.

• padding – Padding to apply to the input. This can be either SAME, VALID or a callable or sequence of callables of size 1. Any callables must take a single integer argument equal to the effective kernel size and return a list of two integers representing the padding before and after. See snt.pad.* for more details and example functions.

• with_bias – Whether to include bias parameters. Default True.

• w_init – Optional initializer for the weights. By default the weights are initialized to truncated random normal values with a standard deviation of 1 / sqrt(input_feature_size), which is commonly used when the inputs are zero centered (see https://arxiv.org/abs/1502.03167v3).

• b_init – Optional initializer for the bias. By default the bias is initialized to zero.

• data_format – The data format of the input.

• name – Name of the module.
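To make the interaction of kernel_shape, stride, rate and padding concrete, the sketch below computes the effective kernel size and the resulting output length along one spatial dimension, using the usual SAME/VALID conventions. The function names are hypothetical, and `causal_padding_sketch` is only an illustrative padding callable of the shape described above (one integer in, two integers out); it is not taken from snt.pad.

```python
import math


def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Span covered by a kernel whose taps are `rate` positions apart."""
    return (kernel_size - 1) * rate + 1


def output_length(input_length, kernel_size, stride=1, rate=1, padding="SAME"):
    """Output length along one spatial dimension for SAME or VALID padding."""
    if padding == "SAME":
        return math.ceil(input_length / stride)
    if padding == "VALID":
        span = effective_kernel_size(kernel_size, rate)
        return math.ceil((input_length - span + 1) / stride)
    raise ValueError(f"unsupported padding: {padding!r}")


def causal_padding_sketch(effective_size: int):
    """Hypothetical padding callable: pad only before the input."""
    return [effective_size - 1, 0]


print(effective_kernel_size(3, rate=2))                         # 5
print(output_length(10, 3, stride=2, padding="SAME"))           # 5
print(output_length(10, 3, stride=2, padding="VALID"))          # 4
print(causal_padding_sketch(effective_kernel_size(3, rate=2)))  # [4, 0]
```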

## Conv2D

class sonnet.Conv2D(output_channels: int, kernel_shape: Union[int, Sequence[int]], stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: Union[str, Callable[[int], Sequence[int]], Sequence[Callable[[int], Sequence[int]]]] = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NHWC', name: Optional[str] = None)[source]

Conv2D module.

__init__(output_channels: int, kernel_shape: Union[int, Sequence[int]], stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: Union[str, Callable[[int], Sequence[int]], Sequence[Callable[[int], Sequence[int]]]] = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NHWC', name: Optional[str] = None)[source]

Constructs a Conv2D module.

Parameters
• output_channels – The number of output channels.

• kernel_shape – Sequence of kernel sizes (of length 2), or an integer. kernel_shape will be expanded to define a kernel size in all dimensions.

• stride – Sequence of strides (of length 2), or an integer. stride will be expanded to define stride in all dimensions.

• rate – Sequence of dilation rates (of length 2), or integer that is used to define dilation rate in all dimensions. 1 corresponds to standard convolution, rate > 1 corresponds to dilated convolution.

• padding – Padding to apply to the input. This can be either SAME, VALID or a callable or sequence of callables of size 2. Any callables must take a single integer argument equal to the effective kernel size and return a list of two integers representing the padding before and after. See snt.pad.* for more details and example functions.

• with_bias – Whether to include bias parameters. Default True.

• w_init – Optional initializer for the weights. By default the weights are initialized to truncated random normal values with a standard deviation of 1 / sqrt(input_feature_size), which is commonly used when the inputs are zero centered (see https://arxiv.org/abs/1502.03167v3).

• b_init – Optional initializer for the bias. By default the bias is initialized to zero.

• data_format – The data format of the input.

• name – Name of the module.

## Conv3D

class sonnet.Conv3D(output_channels: int, kernel_shape: Union[int, Sequence[int]], stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: Union[str, Callable[[int], Sequence[int]], Sequence[Callable[[int], Sequence[int]]]] = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NDHWC', name: Optional[str] = None)[source]

Conv3D module.

__init__(output_channels: int, kernel_shape: Union[int, Sequence[int]], stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: Union[str, Callable[[int], Sequence[int]], Sequence[Callable[[int], Sequence[int]]]] = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NDHWC', name: Optional[str] = None)[source]

Constructs a Conv3D module.

Parameters
• output_channels – The number of output channels.

• kernel_shape – Sequence of kernel sizes (of length 3), or an integer. kernel_shape will be expanded to define a kernel size in all dimensions.

• stride – Sequence of strides (of length 3), or an integer. stride will be expanded to define stride in all dimensions.

• rate – Sequence of dilation rates (of length 3), or integer that is used to define dilation rate in all dimensions. 1 corresponds to standard convolution, rate > 1 corresponds to dilated convolution.

• padding – Padding to apply to the input. This can be either SAME, VALID or a callable or sequence of callables of size 3. Any callables must take a single integer argument equal to the effective kernel size and return a list of two integers representing the padding before and after. See snt.pad.* for more details and example functions.

• with_bias – Whether to include bias parameters. Default True.

• w_init – Optional initializer for the weights. By default the weights are initialized to truncated random normal values with a standard deviation of 1 / sqrt(input_feature_size), which is commonly used when the inputs are zero centered (see https://arxiv.org/abs/1502.03167v3).

• b_init – Optional initializer for the bias. By default the bias is initialized to zero.

• data_format – The data format of the input.

• name – Name of the module.

## Conv1DTranspose

class sonnet.Conv1DTranspose(output_channels: int, kernel_shape: Union[int, Sequence[int]], output_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape, None] = None, stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: str = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NWC', name: Optional[str] = None)[source]

A 1D transpose convolutional module.

__init__(output_channels: int, kernel_shape: Union[int, Sequence[int]], output_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape, None] = None, stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: str = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NWC', name: Optional[str] = None)[source]

Constructs a Conv1DTranspose module.

Parameters
• output_channels – Number of output channels.

• kernel_shape – Sequence of integers (of length 1), or an integer representing kernel shape. kernel_shape will be expanded to define a kernel size in all dimensions.

• output_shape – Output shape of the spatial dimensions of a transpose convolution. Can be either an integer or an iterable of integers or Dimensions, or a TensorShape (of length 1). If a None value is given, a default shape is automatically calculated.

• stride – Sequence of integers (of length 1), or an integer. stride will be expanded to define stride in all dimensions.

• rate – Sequence of integers (of length 1), or integer that is used to define dilation rate in all dimensions. 1 corresponds to standard 1D convolution, rate > 1 corresponds to dilated convolution.

• with_bias – Boolean, whether to include bias parameters. Default True.

• w_init – Optional initializer for the weights. By default the weights are initialized to truncated random normal values with a standard deviation of 1 / sqrt(input_feature_size), which is commonly used when the inputs are zero centered (see https://arxiv.org/abs/1502.03167v3).

• b_init – Optional initializer for the bias. By default the bias is initialized to zero.

• data_format – The data format of the input.

• name – Name of the module.
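The parameter list above does not say how the default output_shape is calculated. As an illustration only, the sketch below uses one widely used convention for transpose convolutions, chosen so that a forward convolution with the same kernel, stride and padding maps the result back to the input length. The function name and the rule itself are assumptions, not a statement of what this module does.

```python
def default_transpose_length(input_length, kernel_size, stride, padding="SAME"):
    """Assumed default output length along one spatial dimension."""
    if padding == "SAME":
        return input_length * stride
    if padding == "VALID":
        return input_length * stride + max(kernel_size - stride, 0)
    raise ValueError(f"unsupported padding: {padding!r}")


print(default_transpose_length(5, 3, stride=2, padding="SAME"))   # 10
print(default_transpose_length(5, 3, stride=2, padding="VALID"))  # 11
```

For example, a VALID forward convolution over 11 positions with kernel size 3 and stride 2 produces 5 outputs, which matches the input length used here.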

## Conv2DTranspose

class sonnet.Conv2DTranspose(output_channels: int, kernel_shape: Union[int, Sequence[int]], output_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape, None] = None, stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: str = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NHWC', name: Optional[str] = None)[source]

A 2D transpose convolutional module.

__init__(output_channels: int, kernel_shape: Union[int, Sequence[int]], output_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape, None] = None, stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: str = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NHWC', name: Optional[str] = None)[source]

Constructs a Conv2DTranspose module.

Parameters
• output_channels – An integer, the number of output channels.

• kernel_shape – Sequence of integers (of length 2), or an integer representing kernel shape. kernel_shape will be expanded to define a kernel size in all dimensions.

• output_shape – Output shape of the spatial dimensions of a transpose convolution. Can be either an integer or an iterable of integers or Dimensions, or a TensorShape (of length 2). If a None value is given, a default shape is automatically calculated.

• stride – Sequence of integers (of length 2), or an integer. stride will be expanded to define stride in all dimensions.

• rate – Sequence of integers (of length 2), or integer that is used to define dilation rate in all dimensions. 1 corresponds to standard 2D convolution, rate > 1 corresponds to dilated convolution.

• with_bias – Boolean, whether to include bias parameters. Default True.

• w_init – Optional initializer for the weights. By default the weights are initialized to truncated random normal values with a standard deviation of 1 / sqrt(input_feature_size), which is commonly used when the inputs are zero centered (see https://arxiv.org/abs/1502.03167v3).

• b_init – Optional initializer for the bias. By default the bias is initialized to zero.

• data_format – The data format of the input.

• name – Name of the module.

## Conv3DTranspose

class sonnet.Conv3DTranspose(output_channels: int, kernel_shape: Union[int, Sequence[int]], output_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape, None] = None, stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: str = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NDHWC', name: Optional[str] = None)[source]

A 3D transpose convolutional module.

__init__(output_channels: int, kernel_shape: Union[int, Sequence[int]], output_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape, None] = None, stride: Union[int, Sequence[int]] = 1, rate: Union[int, Sequence[int]] = 1, padding: str = 'SAME', with_bias: bool = True, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NDHWC', name: Optional[str] = None)[source]

Constructs a Conv3DTranspose module.

Parameters
• output_channels – An integer, the number of output channels.

• kernel_shape – Sequence of integers (of length 3), or an integer representing kernel shape. kernel_shape will be expanded to define a kernel size in all dimensions.

• output_shape – Output shape of the spatial dimensions of a transpose convolution. Can be either an integer or an iterable of integers or Dimensions, or a TensorShape (of length 3). If a None value is given, a default shape is automatically calculated.

• stride – Sequence of integers (of length 3), or an integer. stride will be expanded to define stride in all dimensions.

• rate – Sequence of integers (of length 3), or integer that is used to define dilation rate in all dimensions. 1 corresponds to standard 3D convolution, rate > 1 corresponds to dilated convolution.

• with_bias – Boolean, whether to include bias parameters. Default True.

• w_init – Optional initializer for the weights. By default the weights are initialized to truncated random normal values with a standard deviation of 1 / sqrt(input_feature_size), which is commonly used when the inputs are zero centered (see https://arxiv.org/abs/1502.03167v3).

• b_init – Optional initializer for the bias. By default the bias is initialized to zero.

• data_format – The data format of the input.

• name – Name of the module.

# Normalization modules

## LayerNorm

class sonnet.LayerNorm(axis: Union[int, slice, Sequence[int]], create_scale: bool, create_offset: bool, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Normalizes inputs along the given axes.

This is a generic implementation of normalization along specific axes of the input. InstanceNorm is a subclass of this module that normalizes over the spatial dimensions.

It transforms the input x into:

$$\mathrm{outputs} = \mathrm{scale} \, \dfrac{x - \mu}{\sigma + \epsilon} + \mathrm{offset}$$

Where $\mu$ and $\sigma$ are respectively the mean and standard deviation of x.
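The formula above can be evaluated by hand on a small input. The sketch below applies it to a flat list of floats in plain Python; `normalize` is a hypothetical helper that follows the formula exactly as written, with epsilon added to the standard deviation, and is not the module's implementation.

```python
import math


def normalize(values, scale=1.0, offset=0.0, eps=1e-5):
    """Apply scale * (x - mu) / (sigma + eps) + offset to a flat list."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [scale * (v - mu) / (sigma + eps) + offset for v in values]


outputs = normalize([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in outputs])
# [-1.342, -0.447, 0.447, 1.342]
```

Here the mean is 2.5 and the standard deviation is sqrt(1.25), so the normalized values are symmetric around zero.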

There are many different variations for how users want to manage scale and offset if they require them at all. These are:

• No scale/offset in which case create_* should be set to False and scale/offset aren’t passed when the module is called.

• Trainable scale/offset in which case create_* should be set to True and again scale/offset aren’t passed when the module is called. In this case this module creates and owns the scale/offset variables.

• Externally generated scale/offset, such as for conditional normalization, in which case create_* should be set to False and then the values fed in at call time.

scale

If create_scale=True, a trainable tf.Variable holding the current scale.

offset

If create_offset=True, a trainable tf.Variable holding the current offset.

__init__(axis: Union[int, slice, Sequence[int]], create_scale: bool, create_offset: bool, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Constructs a LayerNorm module.

Parameters
• axis – An int, slice or sequence of ints representing the axes which should be normalized across. Typical usages are: 1 or -1 for normalization over just the channels and slice(1, None), slice(2, None) for normalization over the spatial and channel dimensions whilst avoiding the batch and/or time dimensions.

• create_scale – bool representing whether to create a trainable scale per channel applied after the normalization.

• create_offset – bool representing whether to create a trainable offset per channel applied after normalization and scaling.

• eps – Small epsilon to avoid division by zero variance. Defaults to 1e-5.

• scale_init – Optional initializer for the scale variable. Can only be set if create_scale=True. By default scale is initialized to 1.

• offset_init – Optional initializer for the offset variable. Can only be set if create_offset=True. By default offset is initialized to 0.

• data_format – The data format of the input. Can be either channels_first, channels_last, N...C or NC.... By default it is channels_last.

• name – Name of the module.
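To see which axes the typical axis values select, the sketch below expands an int, slice or sequence of ints into explicit axis indices for an input of a given rank. `resolve_axes` is a hypothetical helper for illustration and is not part of the module's API.

```python
def resolve_axes(axis, rank):
    """Hypothetical helper: expand an int, slice or sequence into axis indices."""
    if isinstance(axis, slice):
        return list(range(rank)[axis])
    if isinstance(axis, int):
        axis = [axis]
    return sorted(a % rank for a in axis)  # normalise negative indices


print(resolve_axes(-1, rank=4))              # [3]
print(resolve_axes(slice(1, None), rank=4))  # [1, 2, 3]
print(resolve_axes(slice(2, None), rank=5))  # [2, 3, 4]
print(resolve_axes([1, 2], rank=4))          # [1, 2]
```

With a rank-4 input, slice(1, None) skips only the batch dimension, while with a rank-5 input slice(2, None) skips both the batch and time dimensions.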

__call__(inputs: tensorflow.python.framework.ops.Tensor, scale: Optional[tensorflow.python.framework.ops.Tensor] = None, offset: Optional[tensorflow.python.framework.ops.Tensor] = None) → tensorflow.python.framework.ops.Tensor[source]

Returns normalized inputs.

Parameters
• inputs – An n-D tensor of the data_format specified in the constructor on which the transformation is performed.

• scale – A tensor up to n-D. The shape of this tensor must be broadcastable to the shape of inputs. This is the scale applied to the normalized inputs. This cannot be passed in if the module was constructed with create_scale=True.

• offset – A tensor up to n-D. The shape of this tensor must be broadcastable to the shape of inputs. This is the offset applied to the normalized inputs. This cannot be passed in if the module was constructed with create_offset=True.

Returns

An n-D tensor of the same shape as inputs that has been normalized.

## InstanceNorm

class sonnet.InstanceNorm(create_scale: bool, create_offset: bool, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Normalizes inputs along the spatial dimensions.

See LayerNorm for more details.

scale

If create_scale=True, a trainable tf.Variable holding the current scale.

offset

If create_offset=True, a trainable tf.Variable holding the current offset.

__init__(create_scale: bool, create_offset: bool, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Constructs an InstanceNorm module.

This method creates a module which normalizes over the spatial dimensions.

Parameters
• create_scale – bool representing whether to create a trainable scale per channel applied after the normalization.

• create_offset – bool representing whether to create a trainable offset per channel applied after normalization and scaling.

• eps – Small epsilon to avoid division by zero variance. Defaults to 1e-5.

• scale_init – Optional initializer for the scale variable. Can only be set if create_scale=True. By default scale is initialized to 1.

• offset_init – Optional initializer for the offset variable. Can only be set if create_offset=True. By default offset is initialized to 0.

• data_format – The data format of the input. Can be either channels_first, channels_last, N...C or NC.... By default it is channels_last.

• name – Name of the module.

## BaseBatchNorm

class sonnet.BaseBatchNorm(create_scale: bool, create_offset: bool, moving_mean: sonnet.src.metrics.Metric, moving_variance: sonnet.src.metrics.Metric, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Batch normalization module.

This implements normalization across the batch and spatial dimensions. It maintains moving averages of the mean and variance which can be used to normalize at test time. The constructor is generic and requires the user to pass in objects to compute these.

At training time we use the batch statistics for that batch and these are then used to update the moving averages.

At test time we can either use the moving averages of the batch statistics (test_local_stats=False) or we can use the local statistics (test_local_stats=True).
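The difference between training-time and test-time statistics can be sketched with a toy one-dimensional version in plain Python. `ToyBatchNorm` is hypothetical: it replaces the user-supplied metrics with a simple exponential moving average controlled by an assumed `decay` parameter, takes an assumed `is_training` flag, and omits scale and offset.

```python
import math


class ToyBatchNorm:
    """Hypothetical 1-D batch norm over a list of floats, without scale/offset."""

    def __init__(self, decay=0.9, eps=1e-5):
        self.decay = decay  # assumed exponential-moving-average decay
        self.eps = eps
        self.moving_mean = 0.0
        self.moving_variance = 1.0

    def __call__(self, batch, is_training, test_local_stats=False):
        if is_training or test_local_stats:
            # Use the statistics of the current batch.
            mean = sum(batch) / len(batch)
            variance = sum((v - mean) ** 2 for v in batch) / len(batch)
        else:
            # Use the moving averages accumulated during training.
            mean, variance = self.moving_mean, self.moving_variance
        if is_training:
            d = self.decay
            self.moving_mean = d * self.moving_mean + (1 - d) * mean
            self.moving_variance = d * self.moving_variance + (1 - d) * variance
        sigma = math.sqrt(variance)
        return [(v - mean) / (sigma + self.eps) for v in batch]


bn = ToyBatchNorm(decay=0.5)
bn([1.0, 3.0], is_training=True)             # batch mean 2.0, variance 1.0
print(bn.moving_mean, bn.moving_variance)    # 1.0 1.0
print(bn([1.0, 3.0], is_training=False)[0])  # 0.0
```

At test time with test_local_stats=False the first element is normalized against the moving mean of 1.0 rather than the batch mean of 2.0, which is why it comes out as zero.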

It transforms the input x into:

$$\mathrm{outputs} = \mathrm{scale} \, \dfrac{x - \mu}{\sigma + \epsilon} + \mathrm{offset}$$

Where $\mu$ and $\sigma$ are respectively the mean and standard deviation of x. Note that this module automatically uses the fused batch norm op if the data format is NHWC.

There are many different variations for how users want to manage scale and offset if they require them at all. These are:

• No scale/offset in which case create_* should be set to False and scale/offset aren’t passed when the module is called.

• Trainable scale/offset in which case create_* should be set to True and again scale/offset aren’t passed when the module is called. In this case this module creates and owns the scale/offset variables.

• Externally generated scale/offset, such as for conditional normalization, in which case create_* should be set to False and then the values fed in at call time.

scale

If create_scale=True, a trainable tf.Variable holding the current scale after the module is connected for the first time.

offset

If create_offset=True, a trainable tf.Variable holding the current offset after the module is connected for the first time.

__init__(create_scale: bool, create_offset: bool, moving_mean: sonnet.src.metrics.Metric, moving_variance: sonnet.src.metrics.Metric, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Constructs a BaseBatchNorm module.

Parameters
• create_scale – whether to create a trainable scale per channel applied after the normalization.

• create_offset – whether to create a trainable offset per channel applied after normalization and scaling.

• moving_mean – A metric which tracks the moving average of the mean which can be used to normalize at test time.

• moving_variance – A metric which tracks the moving average of the variance which can be used to normalize at test time.

• eps – Small epsilon to avoid division by zero variance. Defaults to 1e-5.

• scale_init – Optional initializer for the scale variable. Can only be set if create_scale=True. By default scale is initialized to 1.

• offset_init – Optional initializer for the offset variable. Can only be set if create_offset=True. By default offset is initialized to 0.

• data_format – The data format of the input. Can be either channels_first, channels_last, N...C or NC.... By default it is channels_last.

• name – Name of the module.

__call__(inputs: tensorflow.python.framework.ops.Tensor, is_training: Union[bool, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], test_local_stats: Union[bool, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = False, scale: Optional[tensorflow.python.framework.ops.Tensor] = None, offset: Optional[tensorflow.python.framework.ops.Tensor] = None)[source]

Returns normalized inputs.

Parameters
• inputs – An n-D tensor of the data_format specified above on which the transformation is performed.

• is_training – Whether the module should be connected in training mode, meaning the moving averages are updated.

• test_local_stats – Whether local batch statistics should be used when is_training=False. If not, moving averages are used. By default False.

• scale – A tensor up to n-D. The shape of this tensor must be broadcastable to the shape of inputs. This is the scale applied to the normalized inputs. This cannot be passed in if the module was constructed with create_scale=True.

• offset – A tensor up to n-D. The shape of this tensor must be broadcastable to the shape of inputs. This is the offset applied to the normalized inputs. This cannot be passed in if the module was constructed with create_offset=True.

Returns

An n-d tensor of the same shape as inputs that has been normalized.
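The interaction between is_training and test_local_stats described above can be summarized in a small sketch. The function name `select_statistics` and the tuple representation of the statistics are hypothetical; only the decision logic is taken from the parameter descriptions.

```python
def select_statistics(is_training, test_local_stats, batch_stats, moving_stats):
    """Return (stats_to_use, should_update_moving_averages)."""
    use_batch_stats = is_training or test_local_stats
    stats = batch_stats if use_batch_stats else moving_stats
    # Moving averages are only updated in training mode.
    return stats, is_training

batch_stats = (0.5, 2.0)    # (mean, variance) of the current batch
moving_stats = (0.1, 1.0)   # (mean, variance) tracked across training

train = select_statistics(True, False, batch_stats, moving_stats)
test_moving = select_statistics(False, False, batch_stats, moving_stats)
test_local = select_statistics(False, True, batch_stats, moving_stats)
```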

## BatchNorm¶

class sonnet.BatchNorm(create_scale: bool, create_offset: bool, decay_rate: float = 0.999, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Batch normalization with exponential moving average for test statistics.

See BaseBatchNorm for details.

scale

If create_scale=True, a trainable tf.Variable holding the current scale after the module is connected for the first time.

offset

If create_offset=True, a trainable tf.Variable holding the current offset after the module is connected for the first time.

__init__(create_scale: bool, create_offset: bool, decay_rate: float = 0.999, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Constructs a BatchNorm module.

Parameters
• create_scale – whether to create a trainable scale per channel applied after the normalization.

• create_offset – whether to create a trainable offset per channel applied after normalization and scaling.

• decay_rate – Decay rate of the exponential moving averages of the mean and variance.

• eps – Small epsilon to avoid division by zero variance. Defaults to 1e-5.

• scale_init – Optional initializer for the scale variable. Can only be set if create_scale=True. By default scale is initialized to 1.

• offset_init – Optional initializer for the offset variable. Can only be set if create_offset=True. By default offset is initialized to 0.

• data_format – The data format of the input. Can be either channels_first, channels_last, N...C or NC.... By default it is channels_last.

• name – Name of the module.
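To give a feel for decay_rate, here is a minimal sketch of a standard exponential moving average update. It is an assumption that the module's moving statistics follow this textbook form; any bias correction the library may apply for the zero initial value is not modelled, and `ema_update` is a hypothetical name.

```python
def ema_update(average, value, decay_rate=0.999):
    """One step of a standard exponential moving average."""
    return decay_rate * average + (1.0 - decay_rate) * value

average = 0.0
for batch_mean in [1.0, 1.0, 1.0]:
    average = ema_update(average, batch_mean, decay_rate=0.9)
```

A decay rate close to 1 makes the average move slowly, so the test-time statistics reflect many training batches rather than the most recent few.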

## CrossReplicaBatchNorm¶

class sonnet.distribute.CrossReplicaBatchNorm(create_scale: bool, create_offset: bool, moving_mean: sonnet.src.metrics.Metric, moving_variance: sonnet.src.metrics.Metric, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Cross-replica Batch Normalization.

At every step the full batch is used to calculate the batch statistics even within a distributed setting (note only with snt.(Tpu)Replicator).

See BaseBatchNorm for details.
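One way to picture full-batch statistics in a distributed setting is to combine per-replica moments. The sketch below is illustrative only: it assumes every replica holds the same number of examples, and the function names are hypothetical rather than part of Sonnet.

```python
def local_moments(xs):
    """Return (mean, mean of squares) for one replica's batch."""
    n = len(xs)
    return sum(xs) / n, sum(x * x for x in xs) / n

def cross_replica_stats(replica_batches):
    """Combine per-replica moments into full-batch mean and variance.

    Assumes equally sized replica batches, so a plain average of the
    per-replica moments equals the moments of the full batch.
    """
    moments = [local_moments(xs) for xs in replica_batches]
    mean = sum(m for m, _ in moments) / len(moments)
    mean_sq = sum(s for _, s in moments) / len(moments)
    return mean, mean_sq - mean ** 2

replicas = [[1.0, 2.0], [3.0, 4.0]]
mean, variance = cross_replica_stats(replicas)
```

The result matches the mean and variance of the concatenated batch [1, 2, 3, 4], which differ from the statistics of either replica alone.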

scale

If create_scale=True, a trainable tf.Variable holding the current scale after the module is connected for the first time.

offset

If create_offset=True, a trainable tf.Variable holding the current offset after the module is connected for the first time.

__init__(create_scale: bool, create_offset: bool, moving_mean: sonnet.src.metrics.Metric, moving_variance: sonnet.src.metrics.Metric, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Constructs a CrossReplicaBatchNorm module.

Parameters
• create_scale – whether to create a trainable scale per channel applied after the normalization.

• create_offset – whether to create a trainable offset per channel applied after normalization and scaling.

• moving_mean – An object which keeps track of the moving average of the mean which can be used to normalize at test time. This object must have an update method which takes a value and updates the internal state and a value property which returns the current mean.

• moving_variance – An object which keeps track of the moving average of the variance which can be used to normalize at test time. This object must have an update method which takes a value and updates the internal state and a value property which returns the current variance.

• eps – Small epsilon to avoid division by zero variance. Defaults to 1e-5.

• scale_init – Optional initializer for the scale variable. Can only be set if create_scale=True. By default scale is initialized to 1.

• offset_init – Optional initializer for the offset variable. Can only be set if create_offset=True. By default offset is initialized to 0.

• data_format – The data format of the input. Can be either channels_first, channels_last, N...C or NC.... By default it is channels_last.

• name – Name of the module.

## GroupNorm¶

class sonnet.GroupNorm(groups: int, axis: Union[int, slice, Sequence[int]] = slice(1, None, None), create_scale: bool = True, create_offset: bool = True, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Group normalization module.

This applies group normalization to the inputs. This involves splitting the channels into groups before calculating the mean and variance. The default behaviour is to compute the mean and variance over the spatial dimensions and the grouped channels. The mean and variance will never be computed over the created groups axis.

It transforms the input x into:

$\mathrm{outputs} = \mathrm{scale} \dfrac{x - \mu}{\sigma + \epsilon} + \mathrm{offset}$

Where $$\mu$$ and $$\sigma$$ are respectively the mean and standard deviation of x.
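The grouping step can be sketched in plain Python for a single example with no spatial dimensions. This is a simplified illustration, not Sonnet's implementation: `group_norm` is a hypothetical helper that omits scale and offset and divides by sigma + epsilon as in the formula above.

```python
import math

def group_norm(channels, groups, eps=1e-5):
    """Normalize a flat list of per-channel values group by group."""
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    size = len(channels) // groups
    out = []
    for g in range(groups):
        chunk = channels[g * size:(g + 1) * size]
        mean = sum(chunk) / size
        std = math.sqrt(sum((c - mean) ** 2 for c in chunk) / size)
        # Statistics are shared within a group, never across groups.
        out.extend((c - mean) / (std + eps) for c in chunk)
    return out

normalized = group_norm([1.0, 3.0, 10.0, 30.0], groups=2)
```

Although the two groups differ in magnitude by a factor of ten, each is normalized with its own statistics, so both end up at roughly -1 and 1.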

There are many different variations for how users want to manage scale and offset if they require them at all. These are:

• No scale/offset in which case create_* should be set to False and scale/offset aren’t passed when the module is called.

• Trainable scale/offset in which case create_* should be set to True and again scale/offset aren’t passed when the module is called. In this case this module creates and owns the scale/offset variables.

• Externally generated scale/offset, such as for conditional normalization, in which case create_* should be set to False and then the values fed in at call time.

scale

If create_scale=True, a trainable tf.Variable holding the current scale.

offset

If create_offset=True, a trainable tf.Variable holding the current offset.

__init__(groups: int, axis: Union[int, slice, Sequence[int]] = slice(1, None, None), create_scale: bool = True, create_offset: bool = True, eps: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-05, scale_init: Optional[sonnet.src.initializers.Initializer] = None, offset_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'channels_last', name: Optional[str] = None)[source]

Constructs a GroupNorm module.

Parameters
• groups – number of groups to divide the channels by. The number of channels must be divisible by this.

• axis – int, slice or sequence of ints representing the axes which should be normalized across. By default this is all but the first dimension. For time series data use slice(2, None) to average over the non-batch and non-time dimensions.

• create_scale – whether to create a trainable scale per channel applied after the normalization.

• create_offset – whether to create a trainable offset per channel applied after normalization and scaling.

• eps – Small epsilon to add to the variance to avoid division by zero. Defaults to 1e-5.

• scale_init – Optional initializer for the scale variable. Can only be set if create_scale=True. By default scale is initialized to 1.

• offset_init – Optional initializer for the offset variable. Can only be set if create_offset=True. By default offset is initialized to 0.

• data_format – The data format of the input. Can be either channels_first, channels_last, N...C or NC.... By default it is channels_last.

• name – Name of the module.

__call__(inputs: tensorflow.python.framework.ops.Tensor, scale: Optional[tensorflow.python.framework.ops.Tensor] = None, offset: Optional[tensorflow.python.framework.ops.Tensor] = None)[source]

Returns normalized inputs.

Parameters
• inputs – An n-D tensor of the data_format specified in the constructor on which the transformation is performed.

• scale – A tensor up to n-D. The shape of this tensor must be broadcastable to the shape of inputs. This is the scale applied to the normalized inputs. This cannot be passed in if the module was constructed with create_scale=True.

• offset – A tensor up to n-D. The shape of this tensor must be broadcastable to the shape of inputs. This is the offset applied to the normalized inputs. This cannot be passed in if the module was constructed with create_offset=True.

Returns

An n-d tensor of the same shape as inputs that has been normalized.

# Recurrent modules¶

## RNNCore¶

class sonnet.RNNCore(name: Optional[str] = None)[source]

Base class for Recurrent Neural Network cores.

This class defines the basic functionality that every core should implement: initial_state(), used to construct an example of the core state; and __call__() which applies the core parameterized by a previous state to an input.

Cores are typically used with dynamic_unroll() and static_unroll() to iteratively construct an output sequence from the given input sequence.

abstract __call__(inputs: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]], prev_state)[source]

Performs one step of an RNN.

Parameters
• inputs – An arbitrarily nested structure of shape [B, …] where B is the batch size.

• prev_state – Previous core state.

Returns

• outputs - An arbitrarily nested structure of shape [B, …]. Dimensions following the batch size could be different from that of inputs.

• next_state - Next core state, must be of the same shape as the previous one.

Return type

A tuple with two elements

abstract initial_state(batch_size: Union[int, numpy.integer, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], **kwargs)[source]

Constructs an initial state for this core.

Parameters
• batch_size – An int or an integral scalar tensor representing batch size.

• **kwargs – Optional keyword arguments.

Returns

Arbitrarily nested initial state for this core.
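The two-method interface can be illustrated with a toy core written in plain Python. `RunningSumCore` is hypothetical: it does not subclass snt.RNNCore and operates on lists of floats (one entry per batch element) instead of tensors.

```python
class RunningSumCore:
    """Toy core whose state is the running sum of its inputs."""

    def initial_state(self, batch_size):
        return [0.0] * batch_size

    def __call__(self, inputs, prev_state):
        next_state = [s + x for s, x in zip(prev_state, inputs)]
        # For this toy core the output equals the next state.
        return next_state, next_state

core = RunningSumCore()
state = core.initial_state(batch_size=2)
outputs, state = core([1.0, 2.0], state)
outputs, state = core([10.0, 20.0], state)
```

Note that the next state has the same structure as the previous one, as the interface requires.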

## UnrolledRNN¶

class sonnet.UnrolledRNN(name: Optional[str] = None)[source]

Base class for unrolled Recurrent Neural Networks.

This class is a generalization of RNNCore which operates on an input sequence as opposed to a single time step.

abstract __call__(input_sequence: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]], initial_state: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]])[source]

Apply this RNN to the input sequence.

Parameters
• input_sequence – An arbitrarily nested structure of shape [T, B, …] where T is the number of time steps and B is the batch size.

• initial_state – Initial RNN state.

Returns

• output_sequence - An arbitrarily nested structure of tensors of shape [T, B, …]. Dimensions following the batch size could be different from that of the input_sequence.

• final_state - Final RNN state, must be of the same shape as the initial one.

Return type

A tuple with two elements

abstract initial_state(batch_size: Union[int, numpy.integer, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], **kwargs)[source]

Construct an initial state for this RNN.

Parameters
• batch_size – An int or an integral scalar tensor representing batch size.

• **kwargs – Optional keyword arguments.

Returns

Arbitrarily nested initial state for this RNN.

## TrainableState¶

class sonnet.TrainableState(initial_values: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]], mask: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]] = None, name: Optional[str] = None)[source]

Trainable state for an RNNCore.

The state can be constructed manually from a nest of initial values:

>>> state = snt.TrainableState((tf.zeros([16]), tf.zeros([16])))


or automatically for a given RNNCore:

>>> core = snt.LSTM(hidden_size=16)
>>> state = snt.TrainableState.for_core(core)

classmethod for_core(core: sonnet.src.recurrent.RNNCore, mask: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest], None] = None, name: Optional[str] = None)[source]

Constructs a trainable state for a given RNNCore.

Parameters
• core – An RNNCore to construct the state for.

• mask – Optional boolean mask of the same structure as the initial state of core specifying which components should be trainable. If not given, the whole state is considered trainable.

• name – Name of the module.

Returns

A TrainableState.

__init__(initial_values: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]], mask: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]] = None, name: Optional[str] = None)[source]

Constructs a trainable state from initial values.

Parameters
• initial_values – Arbitrarily nested initial values for the state.

• mask – Optional boolean mask of the same structure as initial_values specifying which components should be trainable. If not given, the whole state is considered trainable.

• name – Name of the module.

__call__(batch_size: int) → Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]][source]

Returns a trainable state for the given batch size.

## dynamic_unroll¶

sonnet.dynamic_unroll(core, input_sequence, initial_state, sequence_length=None, parallel_iterations=1, swap_memory=False)[source]

Performs a dynamic unroll of an RNN.

>>> core = snt.LSTM(hidden_size=16)
>>> batch_size = 3
>>> input_sequence = tf.random.uniform([1, batch_size, 2])
>>> output_sequence, final_state = snt.dynamic_unroll(
...     core,
...     input_sequence,
...     core.initial_state(batch_size))


An unroll corresponds to calling the core on each element of the input sequence in a loop, carrying the state through:

state = initial_state
for t in range(len(input_sequence)):
    outputs, state = core(input_sequence[t], state)


A dynamic unroll preserves the loop structure when executed within tf.function. See static_unroll() for an unroll function which replaces a loop with its body repeated multiple times.
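For readers who want to run the loop above outside TensorFlow, here is a self-contained sketch using nested lists in place of tensors. `toy_core` and `unroll` are hypothetical names, and the sketch says nothing about how tf.function traces the loop.

```python
def toy_core(inputs, prev_state):
    """Hypothetical core: state accumulates inputs, output is the new state."""
    next_state = [s + x for s, x in zip(prev_state, inputs)]
    return next_state, next_state

def unroll(core, input_sequence, initial_state):
    """Call the core once per time step, carrying the state through."""
    if len(input_sequence) == 0:
        raise ValueError("input_sequence must not be empty")
    state = initial_state
    output_sequence = []
    for t in range(len(input_sequence)):
        outputs, state = core(input_sequence[t], state)
        output_sequence.append(outputs)
    return output_sequence, state

# Shape [T=3, B=2]: three time steps, batch of two.
input_sequence = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
output_sequence, final_state = unroll(toy_core, input_sequence, [0.0, 0.0])
```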

Parameters
• core – An RNNCore to unroll.

• input_sequence – An arbitrarily nested structure of tensors of shape [T, B, ...] where T is the number of time steps, and B is the batch size.

• initial_state – initial state of the given core.

• sequence_length – An optional tensor of shape [B] specifying the lengths of sequences within the (padded) batch.

• parallel_iterations – An optional int specifying the number of iterations to run in parallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer. Defaults to 1.

• swap_memory – Transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty. Defaults to False.

Returns

• output_sequence - An arbitrarily nested structure of tensors of shape [T, B, ...]. Dimensions following the batch size could be different from that of the input_sequence.

• final_state - Core state at time step T.

Return type

A tuple with two elements

Raises

ValueError – If input_sequence is empty.

## static_unroll¶

sonnet.static_unroll(core: sonnet.src.recurrent.RNNCore, input_sequence: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]], initial_state: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]], sequence_length: Union[int, numpy.integer, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, None] = None) → Tuple[Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]], Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]]][source]

Performs a static unroll of an RNN.

>>> core = snt.LSTM(hidden_size=16)
>>> batch_size = 3
>>> input_sequence = tf.random.uniform([1, batch_size, 2])
>>> output_sequence, final_state = snt.static_unroll(
...     core,
...     input_sequence,
...     core.initial_state(batch_size))


An unroll corresponds to calling the core on each element of the input sequence in a loop, carrying the state through:

state = initial_state
for t in range(len(input_sequence)):
    outputs, state = core(input_sequence[t], state)


A static unroll replaces a loop with its body repeated multiple times when executed inside tf.function:

state = initial_state
outputs0, state = core(input_sequence[0], state)
outputs1, state = core(input_sequence[1], state)
outputs2, state = core(input_sequence[2], state)
...


See dynamic_unroll() for a loop-preserving unroll function.

Parameters
• core – An RNNCore to unroll.

• input_sequence – An arbitrarily nested structure of tensors of shape [T, B, ...] where T is the number of time steps, and B is the batch size.

• initial_state – An initial state of the given core.

• sequence_length – An optional tensor of shape [B] specifying the lengths of sequences within the (padded) batch.

Returns

• output_sequence - An arbitrarily nested structure of tensors of shape [T, B, ...]. Dimensions following the batch size could be different from that of the input_sequence.

• final_state - Core state at time step T.

Return type

A tuple with two elements

Raises

ValueError – If input_sequence is empty or its leading dimension is not known statically.
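The effect of sequence_length on a padded batch can be sketched as follows. The text above does not spell out what happens past the end of a sequence; the convention assumed here (emit zeros and carry the last valid state forward) is a common one, not a statement about Sonnet's behaviour. All names are hypothetical.

```python
def unroll_with_lengths(core, input_sequence, initial_state, sequence_length):
    """Unroll a core over a padded [T, B] batch, respecting per-item lengths."""
    state = list(initial_state)
    output_sequence = []
    for t, inputs in enumerate(input_sequence):
        outputs, new_state = core(inputs, state)
        # Batch items whose sequence has ended emit zeros and keep their state.
        active = [t < length for length in sequence_length]
        output_sequence.append(
            [o if a else 0.0 for o, a in zip(outputs, active)])
        state = [n if a else s for n, s, a in zip(new_state, state, active)]
    return output_sequence, state

def running_sum(inputs, prev_state):
    """Toy core: state accumulates inputs, output is the new state."""
    next_state = [s + x for s, x in zip(prev_state, inputs)]
    return next_state, next_state

padded = [[1.0, 1.0], [2.0, 0.0], [3.0, 0.0]]   # second item has length 1
outputs, final_state = unroll_with_lengths(
    running_sum, padded, [0.0, 0.0], sequence_length=[3, 1])
```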

## VanillaRNN¶

class sonnet.VanillaRNN(hidden_size: int, activation: Callable[[Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable]], Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable]] = <function tanh>, w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Basic fully-connected RNN core.

Given $$x_t$$ and the previous hidden state $$h_{t-1}$$ the core computes

$h_t = w_i x_t + w_h h_{t-1} + b$

input_to_hidden

Input-to-hidden weights $$w_i$$, a tensor of shape [input_size, hidden_size].

hidden_to_hidden

Hidden-to-hidden weights $$w_h$$, a tensor of shape [hidden_size, hidden_size].

b

Bias, a tensor of shape [hidden_size].

__init__(hidden_size: int, activation: Callable[[Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable]], Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable]] = <function tanh>, w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Constructs a vanilla RNN core.

Parameters
• hidden_size – Hidden layer size.

• activation – Activation function to use. Defaults to tf.tanh.

• w_i_init – Optional initializer for the input-to-hidden weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(input_size).

• w_h_init – Optional initializer for the hidden-to-hidden weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(hidden_size).

• b_init – Optional initializer for the bias. Defaults to Zeros.

• dtype – Optional tf.DType of the core’s variables. Defaults to tf.float32.

• name – Name of the module.

__call__(inputs: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]], prev_state: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, Iterable[TensorNest], Mapping[str, TensorNest]]) → Tuple[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.ops.Tensor][source]

See base class.

initial_state(batch_size: int) → tensorflow.python.framework.ops.Tensor[source]

See base class.
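A single VanillaRNN step can be sketched for one example in plain Python. Two assumptions are made here and are not taken from the text: the configured activation is applied to the affine expression in the formula, and the weights are laid out as [input_size, hidden_size] and [hidden_size, hidden_size] so that the products are well defined. The helper names are hypothetical.

```python
import math

def vec_mat(x, w):
    """Compute x @ w for a vector x of length n and an [n, m] matrix w."""
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]

def vanilla_rnn_step(x, h_prev, w_i, w_h, b, activation=math.tanh):
    """One step: activation(x @ w_i + h_prev @ w_h + b)."""
    pre = [xi + hi + bi
           for xi, hi, bi in zip(vec_mat(x, w_i), vec_mat(h_prev, w_h), b)]
    return [activation(p) for p in pre]

# input_size = 2, hidden_size = 3
w_i = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]                       # [2, 3]
w_h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]      # [3, 3]
b = [0.0, 0.0, 0.0]
h = vanilla_rnn_step([1.0, 1.0], [0.0, 0.0, 0.0], w_i, w_h, b)
```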

## DeepRNN¶

class sonnet.DeepRNN(layers, name: Optional[str] = None)[source]

Linear chain of RNNCores or callables.

The core takes (input, prev_state) as input and passes the input through each internal module in the order they were presented, using elements from prev_state as necessary for internal RNN cores.

>>> deep_rnn = snt.DeepRNN([
...     snt.LSTM(hidden_size=16),
...     snt.LSTM(hidden_size=16),
... ])


Note that the state of a DeepRNN is always a tuple, which will contain the same number of elements as there are internal RNN cores. If no internal modules are RNN cores, the state of the DeepRNN as a whole is an empty tuple.

Wrapping non-recurrent modules into a DeepRNN can be useful to produce something API compatible with a “real” recurrent module, simplifying code that handles the cores.
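The state handling described above can be sketched without Sonnet. In this illustration a layer counts as a core if it has an initial_state method; that detection rule, and every name in the block, is an assumption made for the sketch.

```python
class Accumulator:
    """Toy recurrent layer: has initial_state, so it is treated as a core."""

    def initial_state(self, batch_size):
        return [0.0] * batch_size

    def __call__(self, inputs, prev_state):
        new = [s + x for s, x in zip(prev_state, inputs)]
        return new, new

def double(inputs):
    """Toy non-recurrent layer: a plain callable with no state."""
    return [2.0 * x for x in inputs]

def is_core(layer):
    return hasattr(layer, "initial_state")

def deep_initial_state(layers, batch_size):
    # One state entry per recurrent layer; callables contribute nothing.
    return tuple(layer.initial_state(batch_size)
                 for layer in layers if is_core(layer))

def deep_step(layers, inputs, prev_state):
    states = iter(prev_state)
    next_state = []
    for layer in layers:
        if is_core(layer):
            inputs, layer_state = layer(inputs, next(states))
            next_state.append(layer_state)
        else:
            inputs = layer(inputs)
    return inputs, tuple(next_state)

layers = [Accumulator(), double, Accumulator()]
state = deep_initial_state(layers, batch_size=1)
out, state = deep_step(layers, [1.0], state)
out, state = deep_step(layers, [1.0], state)
```

The state tuple has two entries because only two of the three layers are recurrent; a chain made solely of plain callables would have an empty tuple as its state.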

__init__(layers, name: Optional[str] = None)[source]

Constructs a DeepRNN.

Parameters
• layers – A list of RNNCores or callables.

• name – Name of the module.

sonnet.deep_rnn_with_skip_connections(layers: Sequence[sonnet.src.recurrent.RNNCore], concat_final_output: bool = True, name: str = 'deep_rnn_with_skip_connections') → sonnet.src.recurrent.RNNCore[source]

Constructs a DeepRNN with skip connections.

Skip connections alter the dependency structure within a DeepRNN. Specifically, input to the i-th layer (i > 0) is given by a concatenation of the core’s inputs and the outputs of the (i-1)-th layer.

outputs0, ... = layers[0](inputs, ...)
outputs1, ... = layers[1](tf.concat([inputs, outputs0], axis=1), ...)
outputs2, ... = layers[2](tf.concat([inputs, outputs1], axis=1), ...)
...


This allows the layers to learn decoupled features.

Parameters
• layers – A list of RNNCores.

• concat_final_output – If enabled (default), the output of the core is a concatenation of the outputs of all intermediate layers; otherwise, only the outputs of the final layer, i.e. that of layers[-1], are returned.

• name – Name of the module.

Returns

A DeepRNN with skip connections.

Raises

ValueError – If any of the layers is not an RNNCore.
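The data flow with skip connections, including the effect of concat_final_output, can be sketched for a single example. For brevity the toy layers are stateless functions on feature lists, so recurrent state is omitted; `skip_forward` and `total` are hypothetical names.

```python
def skip_forward(layers, inputs, concat_final_output=True):
    """Run stateless toy layers with skip connections from the core inputs."""
    all_outputs = []
    current = inputs
    for i, layer in enumerate(layers):
        # Layers after the first see the core inputs next to the previous outputs.
        layer_in = current if i == 0 else inputs + current
        current = layer(layer_in)
        all_outputs.append(current)
    if concat_final_output:
        return [v for outputs in all_outputs for v in outputs]
    return all_outputs[-1]

def total(xs):
    """Hypothetical layer producing a single output feature."""
    return [sum(xs)]

features = [1.0, 2.0]
concatenated = skip_forward([total, total, total], features)
final_only = skip_forward([total, total, total], features,
                          concat_final_output=False)
```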

sonnet.deep_rnn_with_residual_connections(layers: Sequence[sonnet.src.recurrent.RNNCore], name: str = 'deep_rnn_with_residual_connections') → sonnet.src.recurrent.RNNCore[source]

Constructs a DeepRNN with residual connections.

Residual connections alter the dependency structure in a DeepRNN. Specifically, the input to the i-th intermediate layer is a sum of the original core’s inputs and the outputs of all the preceding layers (<i).

outputs0, ... = layers[0](inputs, ...)
outputs0 += inputs
outputs1, ... = layers[1](outputs0, ...)
outputs1 += outputs0
outputs2, ... = layers[2](outputs1, ...)
outputs2 += outputs1
...


This allows the layers to learn specialized features that compose incrementally.

Parameters
• layers – A list of RNNCores.

• name – Name of the module.

Returns

A DeepRNN with residual connections.

Raises

ValueError – If any of the layers is not an RNNCore.
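The residual pattern shown in the pseudocode above can be run on plain lists. As before, the layers are stateless toy functions and recurrent state is left out; `residual_forward` and `halve` are hypothetical names.

```python
def residual_forward(layers, inputs):
    """Run stateless toy layers, adding each layer's input to its output."""
    current = inputs
    for layer in layers:
        outputs = layer(current)
        # The sum requires outputs and inputs to have the same size.
        current = [o + c for o, c in zip(outputs, current)]
    return current

def halve(xs):
    return [0.5 * x for x in xs]

result = residual_forward([halve, halve], [4.0, 8.0])
```

Because each layer's output is added to its input, every layer must preserve the feature size.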

## LSTM¶

class sonnet.LSTM(hidden_size: int, projection_size: Optional[int] = None, projection_init: Optional[sonnet.src.initializers.Initializer] = None, w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, forget_bias: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Long short-term memory (LSTM) RNN core.

The implementation is based on [2]. Given $$x_t$$ and the previous state $$(h_{t-1}, c_{t-1})$$ the core computes

$\begin{array}{ll} i_t = \sigma(W_{ii} x_t + W_{hi} h_{t-1} + b_i) \\ f_t = \sigma(W_{if} x_t + W_{hf} h_{t-1} + b_f) \\ g_t = \tanh(W_{ig} x_t + W_{hg} h_{t-1} + b_g) \\ o_t = \sigma(W_{io} x_t + W_{ho} h_{t-1} + b_o) \\ c_t = f_t c_{t-1} + i_t g_t \\ h_t = o_t \tanh(c_t) \end{array}$

Where $$i_t$$, $$f_t$$, $$o_t$$ are input, forget and output gate activations, and $$g_t$$ is a vector of cell updates.
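The equations above can be traced by hand with a scalar sketch (input size and hidden size both 1). This is a direct transcription of the formulas, not Sonnet's implementation; the dictionary-based weight layout and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w, b):
    """One LSTM step for scalar input and scalar hidden state.

    `w` maps a gate name to (input weight, hidden weight) and `b` maps a
    gate name to its bias. Gate names follow the equations: i, f, g, o.
    """
    def pre(gate):
        w_x, w_h = w[gate]
        return w_x * x + w_h * h_prev + b[gate]

    i = sigmoid(pre("i"))          # input gate
    f = sigmoid(pre("f"))          # forget gate
    g = math.tanh(pre("g"))        # candidate cell update
    o = sigmoid(pre("o"))          # output gate
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

zero_w = {gate: (0.0, 0.0) for gate in "ifgo"}
zero_b = {gate: 0.0 for gate in "ifgo"}
h, c = lstm_step(x=1.0, h_prev=0.5, c_prev=2.0, w=zero_w, b=zero_b)
```

With all weights and biases at zero, every sigmoid gate is 0.5 and the cell update is 0, so the cell simply halves.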

Notes

Forget gate initialization:

Following [3] we add a constant forget_bias (defaults to 1.0) to $$b_f$$ after initialization in order to reduce the scale of forgetting in the beginning of the training.
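A quick calculation shows why the added bias reduces forgetting early in training. The sketch assumes the forget gate's pre-activation is close to zero at initialization, which holds for zero biases and small weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# With a pre-activation near zero the forget gate starts at sigmoid(0);
# adding forget_bias shifts it toward 1, so more of c_{t-1} is kept.
without_bias = sigmoid(0.0)
with_bias = sigmoid(0.0 + 1.0)   # forget_bias = 1.0, the documented default
```

The gate moves from 0.5 to roughly 0.73, meaning about 73% of the previous cell state is retained per step instead of half.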

Recurrent projections:

The hidden state can be projected (via the projection_size parameter) to reduce the number of parameters and speed up computation. For more details see [4].
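The parameter saving can be estimated with simple arithmetic. The shapes used below are assumptions, not taken from this page: a projection matrix of shape [hidden_size, projection_size], and hidden-to-hidden weights that consume the projected state, i.e. [projection_size, 4 * hidden_size]. `recurrent_param_count` is a hypothetical helper.

```python
def recurrent_param_count(hidden_size, projection_size=None):
    """Count hidden-to-hidden (and projection) weights under the stated shapes."""
    if projection_size is None:
        return hidden_size * 4 * hidden_size
    hidden_to_hidden = projection_size * 4 * hidden_size
    projection = hidden_size * projection_size
    return hidden_to_hidden + projection

full = recurrent_param_count(512)
projected = recurrent_param_count(512, projection_size=128)
```

Under these assumptions, projecting a 512-unit state down to 128 cuts the recurrent weights to under a third of the original count.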

input_to_hidden

Input-to-hidden weights $$W_{ii}$$, $$W_{if}$$, $$W_{ig}$$ and $$W_{io}$$ concatenated into a tensor of shape [input_size, 4 * hidden_size].

hidden_to_hidden

Hidden-to-hidden weights $$W_{hi}$$, $$W_{hf}$$, $$W_{hg}$$ and $$W_{ho}$$ concatenated into a tensor of shape [hidden_size, 4 * hidden_size].

b

Biases $$b_i$$, $$b_f$$, $$b_g$$ and $$b_o$$ concatenated into a tensor of shape [4 * hidden_size].

__init__(hidden_size: int, projection_size: Optional[int] = None, projection_init: Optional[sonnet.src.initializers.Initializer] = None, w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, forget_bias: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Constructs an LSTM.

Parameters
• hidden_size – Hidden layer size.

• projection_size – Optional int; if set, then the hidden state is projected to this size via a trainable projection matrix.

• projection_init – Optional initializer for the projection matrix. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(hidden_size).

• w_i_init – Optional initializer for the input-to-hidden weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(input_size).

• w_h_init – Optional initializer for the hidden-to-hidden weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(hidden_size).

• b_init – Optional initializer for the biases. Defaults to Zeros.

• forget_bias – Optional float to add to the bias of the forget gate after initialization.

• dtype – Optional tf.DType of the core’s variables. Defaults to tf.float32.

• name – Name of the module.

__call__(inputs, prev_state)[source]

See base class.

initial_state(batch_size: int) → sonnet.src.recurrent.LSTMState[source]

See base class.

class sonnet.LSTMState(hidden, cell)

## lstm_with_recurrent_dropout¶

sonnet.lstm_with_recurrent_dropout(hidden_size, dropout=0.5, seed=None, **kwargs)[source]

Constructs an LSTM with recurrent dropout.

The implementation is based on [5]. Dropout is applied on the previous hidden state $$h_{t-1}$$ during the computation of gate activations:

$\begin{array}{ll} i_t = \sigma(W_{ii} x_t + W_{hi} d(h_{t-1}) + b_i) \\ f_t = \sigma(W_{if} x_t + W_{hf} d(h_{t-1}) + b_f) \\ g_t = \tanh(W_{ig} x_t + W_{hg} d(h_{t-1}) + b_g) \\ o_t = \sigma(W_{io} x_t + W_{ho} d(h_{t-1}) + b_o) \end{array}$
Parameters

Returns

A tuple of two elements: train_lstm, an LSTM with recurrent dropout enabled for training, and test_lstm, the same as train_lstm but without recurrent dropout.

Raises

ValueError – If dropout is not in [0, 1).
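
To make the role of $$d(h_{t-1})$$ concrete, here is a minimal pure-Python sketch of a single gate unit with a dropout mask applied to the previous hidden state. The function names, the use of Python's random module, and the rescaling of kept units by 1 / (1 - dropout) are assumptions made for illustration; they are not taken from the Sonnet implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dropout_mask(size, dropout, seed=None):
    """Samples a mask for d(.); kept units are rescaled (an assumption)."""
    if not 0.0 <= dropout < 1.0:
        raise ValueError("dropout must be in [0, 1), got {}".format(dropout))
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - dropout)
    return [keep_scale if rng.random() >= dropout else 0.0 for _ in range(size)]


def gate_unit(x_t, h_prev, mask, w_x, w_h, b, activation=sigmoid):
    """Computes activation(w_x . x_t + w_h . d(h_prev) + b) for one gate unit."""
    dropped_h = [m * h for m, h in zip(mask, h_prev)]  # d(h_{t-1})
    pre_activation = (
        sum(w * x for w, x in zip(w_x, x_t))
        + sum(w * h for w, h in zip(w_h, dropped_h))
        + b
    )
    return activation(pre_activation)


mask = dropout_mask(size=4, dropout=0.5, seed=0)
i_t = gate_unit([1.0, -1.0], [0.5] * 4, mask, [0.1, 0.2], [0.3] * 4, 0.0)
no_drop = dropout_mask(size=4, dropout=0.0)
```

With dropout=0.0 the mask is all ones, so the gate reduces to the plain LSTM gate.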

## UnrolledLSTM¶

class sonnet.UnrolledLSTM(hidden_size, w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, forget_bias: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Unrolled long short-term memory (LSTM).

The implementation uses efficient device-specialized ops, e.g. CuDNN-RNN on a CUDA-enabled GPU, and can be an order of magnitude faster than snt.*_unroll with an LSTM core.

__init__(hidden_size, w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, forget_bias: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Constructs an unrolled LSTM.

Parameters
• hidden_size – Hidden layer size.

• w_i_init – Optional initializer for the input-to-hidden weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(input_size).

• w_h_init – Optional initializer for the hidden-to-hidden weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(hidden_size).

• b_init – Optional initializer for the biases. Defaults to Zeros.

• forget_bias – Optional float to add to the bias of the forget gate after initialization.

• dtype – Optional tf.DType of the core’s variables. Defaults to tf.float32.

• name – Name of the module.

__call__(input_sequence, initial_state)[source]

See base class.

initial_state(batch_size)[source]

See base class.

## Conv1DLSTM¶

class sonnet.Conv1DLSTM(input_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], output_channels: int, kernel_shape: Union[int, Sequence[int]], data_format='NWC', w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, forget_bias: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

1-D convolutional LSTM.

The implementation is based on [6]. Given $$x_t$$ and the previous state $$(h_{t-1}, c_{t-1})$$ the core computes

$\begin{array}{ll} i_t = \sigma(W_{ii} * x_t + W_{hi} * h_{t-1} + b_i) \\ f_t = \sigma(W_{if} * x_t + W_{hf} * h_{t-1} + b_f) \\ g_t = \tanh(W_{ig} * x_t + W_{hg} * h_{t-1} + b_g) \\ o_t = \sigma(W_{io} * x_t + W_{ho} * h_{t-1} + b_o) \\ c_t = f_t c_{t-1} + i_t g_t \\ h_t = o_t \tanh(c_t) \end{array}$

where $$*$$ denotes the convolution operator; $$i_t$$, $$f_t$$, $$o_t$$ are input, forget and output gate activations, and $$g_t$$ is a vector of cell updates.

Notes

Forget gate initialization:

Following [3] we add a constant forget_bias (defaults to 1.0) to $$b_f$$ after initialization in order to reduce the scale of forgetting at the beginning of training.
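
The effect of forget_bias can be seen on a single forget gate unit. The sketch below assumes the weighted inputs contribute zero at initialization (a simplification) and uses a hypothetical helper name, initial_forget_gate.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def initial_forget_gate(b_f_init=0.0, forget_bias=1.0):
    """Forget gate activation when W_if x_t + W_hf h_{t-1} is zero."""
    return sigmoid(b_f_init + forget_bias)


without_bias = initial_forget_gate(forget_bias=0.0)
with_bias = initial_forget_gate(forget_bias=1.0)
```

A larger initial $$f_t$$ means that more of $$c_{t-1}$$ is carried over in $$c_t = f_t c_{t-1} + i_t g_t$$ early in training.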

input_to_hidden

Input-to-hidden convolution weights $$W_{ii}$$, $$W_{if}$$, $$W_{ig}$$ and $$W_{io}$$ concatenated into a single tensor of shape [kernel_shape*, input_channels, 4 * output_channels] where kernel_shape appears once.

hidden_to_hidden

Hidden-to-hidden convolution weights $$W_{hi}$$, $$W_{hf}$$, $$W_{hg}$$ and $$W_{ho}$$ concatenated into a single tensor of shape [kernel_shape*, input_channels, 4 * output_channels] where kernel_shape appears once.

b

Biases $$b_i$$, $$b_f$$, $$b_g$$ and $$b_o$$ concatenated into a tensor of shape [4 * output_channels].
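
The [kernel_shape*, input_channels, 4 * output_channels] notation can be spelled out as follows. This is a sketch for the input-to-hidden weights and biases only; conv_lstm_shapes is a hypothetical helper, not a Sonnet function.

```python
def conv_lstm_shapes(num_spatial_dims, kernel_shape, input_channels, output_channels):
    """Hypothetical helper: input-to-hidden and bias shapes for an N-D conv LSTM."""
    if isinstance(kernel_shape, int):
        # An int is expanded to a kernel size in every spatial dimension.
        kernel = [kernel_shape] * num_spatial_dims
    else:
        kernel = list(kernel_shape)
        if len(kernel) != num_spatial_dims:
            raise ValueError("kernel_shape must have length {}".format(num_spatial_dims))
    return {
        "input_to_hidden": kernel + [input_channels, 4 * output_channels],
        "b": [4 * output_channels],
    }


shapes_1d = conv_lstm_shapes(1, 3, input_channels=8, output_channels=16)
shapes_2d = conv_lstm_shapes(2, 3, input_channels=8, output_channels=16)
shapes_3d = conv_lstm_shapes(3, (1, 2, 3), input_channels=8, output_channels=16)
```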

__init__(input_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], output_channels: int, kernel_shape: Union[int, Sequence[int]], data_format='NWC', w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, forget_bias: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Constructs a 1-D convolutional LSTM.

Parameters
• input_shape – Shape of the inputs excluding batch size.

• output_channels – Number of output channels.

• kernel_shape – Sequence of kernel sizes (of length 1), or an int. kernel_shape will be expanded to define a kernel size in all dimensions.

• data_format – The data format of the input.

• w_i_init – Optional initializer for the input-to-hidden convolution weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(kernel_shape * input_channels).

• w_h_init – Optional initializer for the hidden-to-hidden convolution weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(kernel_shape * input_channels).

• b_init – Optional initializer for the biases. Defaults to Zeros.

• forget_bias – Optional float to add to the bias of the forget gate after initialization.

• dtype – Optional tf.DType of the core’s variables. Defaults to tf.float32.

• name – Name of the module.

## Conv2DLSTM¶

class sonnet.Conv2DLSTM(input_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], output_channels: int, kernel_shape: Union[int, Sequence[int]], data_format: str = 'NHWC', w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, forget_bias: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

2-D convolutional LSTM.

The implementation is based on [6]. Given $$x_t$$ and the previous state $$(h_{t-1}, c_{t-1})$$ the core computes

$\begin{array}{ll} i_t = \sigma(W_{ii} * x_t + W_{hi} * h_{t-1} + b_i) \\ f_t = \sigma(W_{if} * x_t + W_{hf} * h_{t-1} + b_f) \\ g_t = \tanh(W_{ig} * x_t + W_{hg} * h_{t-1} + b_g) \\ o_t = \sigma(W_{io} * x_t + W_{ho} * h_{t-1} + b_o) \\ c_t = f_t c_{t-1} + i_t g_t \\ h_t = o_t \tanh(c_t) \end{array}$

where $$*$$ denotes the convolution operator; $$i_t$$, $$f_t$$, $$o_t$$ are input, forget and output gate activations, and $$g_t$$ is a vector of cell updates.

Notes

Forget gate initialization:

Following [3] we add a constant forget_bias (defaults to 1.0) to $$b_f$$ after initialization in order to reduce the scale of forgetting at the beginning of training.

input_to_hidden

Input-to-hidden convolution weights $$W_{ii}$$, $$W_{if}$$, $$W_{ig}$$ and $$W_{io}$$ concatenated into a single tensor of shape [kernel_shape*, input_channels, 4 * output_channels] where kernel_shape is repeated 2 times.

hidden_to_hidden

Hidden-to-hidden convolution weights $$W_{hi}$$, $$W_{hf}$$, $$W_{hg}$$ and $$W_{ho}$$ concatenated into a single tensor of shape [kernel_shape*, input_channels, 4 * output_channels] where kernel_shape is repeated 2 times.

b

Biases $$b_i$$, $$b_f$$, $$b_g$$ and $$b_o$$ concatenated into a tensor of shape [4 * output_channels].

__init__(input_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], output_channels: int, kernel_shape: Union[int, Sequence[int]], data_format: str = 'NHWC', w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, forget_bias: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Constructs a 2-D convolutional LSTM.

Parameters
• input_shape – Shape of the inputs excluding batch size.

• output_channels – Number of output channels.

• kernel_shape – Sequence of kernel sizes (of length 2), or an int. kernel_shape will be expanded to define a kernel size in all dimensions.

• data_format – The data format of the input.

• w_i_init – Optional initializer for the input-to-hidden convolution weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(kernel_shape**2 * input_channels).

• w_h_init – Optional initializer for the hidden-to-hidden convolution weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(kernel_shape**2 * input_channels).

• b_init – Optional initializer for the biases. Defaults to Zeros.

• forget_bias – Optional float to add to the bias of the forget gate after initialization.

• dtype – Optional tf.DType of the core’s variables. Defaults to tf.float32.

• name – Name of the module.

## Conv3DLSTM¶

class sonnet.Conv3DLSTM(input_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], output_channels: int, kernel_shape: Union[int, Sequence[int]], data_format: str = 'NDHWC', w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, forget_bias: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

3-D convolutional LSTM.

The implementation is based on [6]. Given $$x_t$$ and the previous state $$(h_{t-1}, c_{t-1})$$ the core computes

$\begin{array}{ll} i_t = \sigma(W_{ii} * x_t + W_{hi} * h_{t-1} + b_i) \\ f_t = \sigma(W_{if} * x_t + W_{hf} * h_{t-1} + b_f) \\ g_t = \tanh(W_{ig} * x_t + W_{hg} * h_{t-1} + b_g) \\ o_t = \sigma(W_{io} * x_t + W_{ho} * h_{t-1} + b_o) \\ c_t = f_t c_{t-1} + i_t g_t \\ h_t = o_t \tanh(c_t) \end{array}$

where $$*$$ denotes the convolution operator; $$i_t$$, $$f_t$$, $$o_t$$ are input, forget and output gate activations, and $$g_t$$ is a vector of cell updates.

Notes

Forget gate initialization:

Following [3] we add a constant forget_bias (defaults to 1.0) to $$b_f$$ after initialization in order to reduce the scale of forgetting at the beginning of training.

input_to_hidden

Input-to-hidden convolution weights $$W_{ii}$$, $$W_{if}$$, $$W_{ig}$$ and $$W_{io}$$ concatenated into a single tensor of shape [kernel_shape*, input_channels, 4 * output_channels] where kernel_shape is repeated 3 times.

hidden_to_hidden

Hidden-to-hidden convolution weights $$W_{hi}$$, $$W_{hf}$$, $$W_{hg}$$ and $$W_{ho}$$ concatenated into a single tensor of shape [kernel_shape*, input_channels, 4 * output_channels] where kernel_shape is repeated 3 times.

b

Biases $$b_i$$, $$b_f$$, $$b_g$$ and $$b_o$$ concatenated into a tensor of shape [4 * output_channels].

__init__(input_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], output_channels: int, kernel_shape: Union[int, Sequence[int]], data_format: str = 'NDHWC', w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, forget_bias: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Constructs a 3-D convolutional LSTM.

Parameters
• input_shape – Shape of the inputs excluding batch size.

• output_channels – Number of output channels.

• kernel_shape – Sequence of kernel sizes (of length 3), or an int. kernel_shape will be expanded to define a kernel size in all dimensions.

• data_format – The data format of the input.

• w_i_init – Optional initializer for the input-to-hidden convolution weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(kernel_shape**3 * input_channels).

• w_h_init – Optional initializer for the hidden-to-hidden convolution weights. Defaults to TruncatedNormal with a standard deviation of 1 / sqrt(kernel_shape**3 * input_channels).

• b_init – Optional initializer for the biases. Defaults to Zeros.

• forget_bias – Optional float to add to the bias of the forget gate after initialization.

• dtype – Optional tf.DType of the core’s variables. Defaults to tf.float32.

• name – Name of the module.
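
The default standard deviations quoted for the 1-D, 2-D and 3-D convolutional LSTMs follow one pattern, 1 / sqrt(kernel_shape**N * input_channels). The sketch below evaluates it for a kernel with the same size in every dimension; default_conv_stddev is a hypothetical name used only for illustration.

```python
import math


def default_conv_stddev(kernel_size, num_spatial_dims, input_channels):
    """1 / sqrt(kernel_size**N * input_channels) for an N-D convolutional LSTM."""
    fan_in = kernel_size ** num_spatial_dims * input_channels
    return 1.0 / math.sqrt(fan_in)


stddev_1d = default_conv_stddev(kernel_size=2, num_spatial_dims=1, input_channels=8)
stddev_2d = default_conv_stddev(kernel_size=3, num_spatial_dims=2, input_channels=4)
stddev_3d = default_conv_stddev(kernel_size=2, num_spatial_dims=3, input_channels=2)
```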

## GRU¶

class sonnet.GRU(hidden_size, w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Gated recurrent unit (GRU) RNN core.

The implementation is based on [7]. Given $$x_t$$ and the previous state $$h_{t-1}$$ the core computes

$\begin{array}{ll} z_t &= \sigma(W_{iz} x_t + W_{hz} h_{t-1} + b_z) \\ r_t &= \sigma(W_{ir} x_t + W_{hr} h_{t-1} + b_r) \\ a_t &= \tanh(W_{ia} x_t + W_{ha} (r_t h_{t-1}) + b_a) \\ h_t &= (1 - z_t) h_{t-1} + z_t a_t \end{array}$

where $$z_t$$ and $$r_t$$ are the update and reset gates, respectively.
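
The equations above can be traced with a scalar sketch (input_size and hidden_size both equal to 1). The function name gru_step and the (z, r, a) ordering of the weight triples are illustrative assumptions; this is not the Sonnet implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x_t, h_prev, w_i, w_h, b):
    """One GRU step for a scalar input and a scalar hidden state."""
    w_iz, w_ir, w_ia = w_i
    w_hz, w_hr, w_ha = w_h
    b_z, b_r, b_a = b
    z_t = sigmoid(w_iz * x_t + w_hz * h_prev + b_z)
    r_t = sigmoid(w_ir * x_t + w_hr * h_prev + b_r)
    a_t = math.tanh(w_ia * x_t + w_ha * (r_t * h_prev) + b_a)
    # z_t interpolates between the previous state and the candidate a_t.
    return (1.0 - z_t) * h_prev + z_t * a_t


h_half = gru_step(1.0, 0.0, w_i=(0.0, 0.0, 1.0), w_h=(0.0, 0.0, 0.0), b=(0.0, 0.0, 0.0))
h_kept = gru_step(1.0, 0.3, w_i=(0.0, 0.0, 1.0), w_h=(0.0, 0.0, 0.0), b=(-50.0, 0.0, 0.0))
```

In the second call a strongly negative $$b_z$$ drives $$z_t$$ towards 0, so the previous state is carried over almost unchanged.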

input_to_hidden

Input-to-hidden weights $$W_{iz}$$, $$W_{ir}$$ and $$W_{ia}$$ concatenated into a tensor of shape [input_size, 3 * hidden_size].

hidden_to_hidden

Hidden-to-hidden weights $$W_{hz}$$, $$W_{hr}$$ and $$W_{ha}$$ concatenated into a tensor of shape [hidden_size, 3 * hidden_size].

b

Biases $$b_z$$, $$b_r$$ and $$b_a$$ concatenated into a tensor of shape [3 * hidden_size].

__init__(hidden_size, w_i_init: Optional[sonnet.src.initializers.Initializer] = None, w_h_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Constructs a GRU.

Parameters
• hidden_size – Hidden layer size.

• w_i_init – Optional initializer for the input-to-hidden weights. Defaults to Glorot uniform initializer.

• w_h_init – Optional initializer for the hidden-to-hidden weights. Defaults to Glorot uniform initializer.

• b_init – Optional initializer for the biases. Defaults to Zeros.

• dtype – Optional tf.DType of the core’s variables. Defaults to tf.float32.

• name – Name of the module.

__call__(inputs, prev_state)[source]

See base class.

initial_state(batch_size)[source]

See base class.

# Batch¶

## reshape¶

sonnet.reshape(inputs: tensorflow.python.framework.ops.Tensor, output_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], preserve_dims: int = 1, name: Optional[str] = None) → tensorflow.python.framework.ops.Tensor[source]

A shortcut for applying Reshape to the inputs.

## Reshape¶

class sonnet.Reshape(output_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], preserve_dims: int = 1, name: Optional[str] = None)[source]

Reshapes input Tensor, preserving the batch dimension.

For example, given an input tensor with shape [B, H, W, C, D]:

>>> B, H, W, C, D = range(1, 6)
>>> x = tf.ones([B, H, W, C, D])


The default behavior when output_shape is (-1, D) is to flatten all dimensions between B and D:

>>> mod = snt.Reshape(output_shape=(-1, D))
>>> assert mod(x).shape == [B, H*W*C, D]


You can change the number of preserved leading dimensions via preserve_dims:

>>> mod = snt.Reshape(output_shape=(-1, D), preserve_dims=2)
>>> assert mod(x).shape == [B, H, W*C, D]

>>> mod = snt.Reshape(output_shape=(-1, D), preserve_dims=3)
>>> assert mod(x).shape == [B, H, W, C, D]

>>> mod = snt.Reshape(output_shape=(-1, D), preserve_dims=4)
>>> assert mod(x).shape == [B, H, W, C, 1, D]

__init__(output_shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], preserve_dims: int = 1, name: Optional[str] = None)[source]

Constructs a Reshape module.

Parameters
• output_shape – Shape to reshape the input tensor to while preserving its first preserve_dims dimensions. When the special value -1 appears in output_shape the corresponding size is automatically inferred. Note that -1 can only appear once in output_shape. To flatten all non-batch dimensions use Flatten.

• preserve_dims – Number of leading dimensions that will not be reshaped.

• name – Name of the module.

Raises

ValueError – If preserve_dims is not positive.
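
The shape rules described above (preserved leading dimensions plus a single -1 wildcard) can be sketched without TensorFlow. The helper reshape_output_shape is hypothetical and only computes the resulting static shape; it does not cover unknown dimensions.

```python
import math


def reshape_output_shape(input_shape, output_shape, preserve_dims=1):
    """Hypothetical helper: static output shape of a batch reshape."""
    if preserve_dims <= 0:
        raise ValueError("preserve_dims must be positive")
    if len(input_shape) < preserve_dims:
        raise ValueError("input rank is less than preserve_dims")
    output_shape = list(output_shape)
    if output_shape.count(-1) > 1:
        raise ValueError("-1 can only appear once in output_shape")
    preserved = list(input_shape[:preserve_dims])
    remaining = math.prod(input_shape[preserve_dims:])
    known = math.prod(d for d in output_shape if d != -1)
    if -1 in output_shape:
        if known == 0 or remaining % known != 0:
            raise ValueError("output_shape is incompatible with the input shape")
        # Infer the wildcard from the remaining number of elements.
        trailing = [remaining // known if d == -1 else d for d in output_shape]
    else:
        if known != remaining:
            raise ValueError("output_shape is incompatible with the input shape")
        trailing = output_shape
    return preserved + trailing


B, H, W, C, D = range(1, 6)
in_shape = [B, H, W, C, D]
out_1 = reshape_output_shape(in_shape, (-1, D))
out_2 = reshape_output_shape(in_shape, (-1, D), preserve_dims=2)
out_4 = reshape_output_shape(in_shape, (-1, D), preserve_dims=4)
```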

__call__(inputs: tensorflow.python.framework.ops.Tensor) → tensorflow.python.framework.ops.Tensor[source]

Reshapes inputs.

Parameters

inputs – A tensor of shape [b_1, b_2, ..., b_preserve_dims, b_preserve_dims + 1, ...].

Returns

A tensor of shape [b_1, b_2, ..., b_preserve_dims, b_reshape_1, b_reshape_2, ...], with reshaping defined by the constructor output_shape parameter.

Raises

ValueError – If output_shape is incompatible with the shape of the inputs; or if output_shape contains more than one wildcard -1; or if the rank of the inputs is less than preserve_dims; or if the shape of the inputs contains unknown, non-preserved dimensions (except when the unknown dimension is the only non-preserved dimension and doesn’t actually need reshaping).

reversed(name: Optional[str] = None) → sonnet.src.reshape.Reshape[source]

Returns inverse batch reshape.

## flatten¶

sonnet.flatten(inputs: tensorflow.python.framework.ops.Tensor, name: str = 'flatten') → tensorflow.python.framework.ops.Tensor[source]

A shortcut for applying Flatten to the inputs.

## Flatten¶

class sonnet.Flatten(preserve_dims: int = 1, name: Optional[str] = None)[source]

Flattens the input Tensor, preserving the batch dimension(s).

Flatten reshapes input tensors to combine all trailing dimensions apart from the first. Additional leading dimensions can be preserved by setting the preserve_dims parameter.

See Reshape for more details.
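
As a shape-only illustration of the description above, the sketch below combines all trailing dimensions after the first preserve_dims into one. flatten_output_shape is a hypothetical helper and only handles inputs that have at least one trailing dimension to flatten.

```python
import math


def flatten_output_shape(input_shape, preserve_dims=1):
    """Hypothetical helper: static output shape after flattening trailing dims."""
    if len(input_shape) <= preserve_dims:
        raise ValueError("this sketch only handles rank > preserve_dims")
    leading = list(input_shape[:preserve_dims])
    return leading + [math.prod(input_shape[preserve_dims:])]


flat_default = flatten_output_shape([8, 28, 28, 3])
flat_two = flatten_output_shape([8, 28, 28, 3], preserve_dims=2)
```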

__init__(preserve_dims: int = 1, name: Optional[str] = None)[source]

Constructs a Flatten module.

Parameters
• preserve_dims – Number of leading dimensions that will not be reshaped.

• name – Name of the module.

## BatchApply¶

class sonnet.BatchApply(module: sonnet.src.base.Module, num_dims: int = 2, name: Optional[str] = None)[source]

Merges a number of leading dimensions of an input tensor to manipulate it.

Merges a number of leading dimensions of a tensor into a single dimension, connects the provided module, then splits the leading dimension of the result to match the input.

Input tensors whose rank is smaller than the number of dimensions to collapse (e.g. all scalar values, which are tensors of rank 0), are passed unaltered to the provided module.

This is useful for applying some module to each timestep of a Time x Batch x N tensor. If a module is hard coded to only support 2D (Batch x N) then the full 3D Tensor cannot be provided. BatchApply will ‘merge’ the first two dimensions of the sequence tensor by reshaping to a (Time * Batch) x N Tensor, and then the internal module can be applied. The result of that operation is reshaped such that its first dimensions are split to match the leading dimensions of the input.
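
The merge, apply, split sequence described above can be sketched with nested Python lists standing in for a Time x Batch x N tensor. The names batch_apply and row_sums are illustrative assumptions; the real module works on tensors and wraps a Sonnet module rather than a plain function.

```python
def batch_apply(fn, inputs):
    """Applies fn, which expects a 2D [rows][n] list, to a [time][batch][n] list."""
    time, batch = len(inputs), len(inputs[0])
    # Merge the two leading dimensions: (time * batch) x n.
    merged = [row for step in inputs for row in step]
    outputs = fn(merged)
    # Split the leading dimension back into time x batch.
    return [outputs[t * batch:(t + 1) * batch] for t in range(time)]


def row_sums(rows):
    """A stand-in for a module that only supports 2D (Batch x N) inputs."""
    return [[sum(row)] for row in rows]


sequence = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
result = batch_apply(row_sums, sequence)
```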

__init__(module: sonnet.src.base.Module, num_dims: int = 2, name: Optional[str] = None)[source]

Initializes the current module with the given name.

Subclasses should call this constructor before creating other modules or variables such that those modules are named correctly.

Parameters

name – An optional string name for the class. Must be a valid Python identifier. If name is not provided then the class name for the current instance is converted to lower_snake_case and used instead.

__call__(*args, **kwargs)[source]

Call self as a function.

# Embedding modules¶

## Embed¶

class sonnet.Embed(vocab_size: Optional[int] = None, embed_dim: Optional[int] = None, existing_vocab: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, None] = None, densify_gradients: bool = False, initializer: Optional[sonnet.src.initializers.Initializer] = None, trainable: bool = True, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Module for embedding tokens in a low-dimensional space.

__init__(vocab_size: Optional[int] = None, embed_dim: Optional[int] = None, existing_vocab: Union[numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable, None] = None, densify_gradients: bool = False, initializer: Optional[sonnet.src.initializers.Initializer] = None, trainable: bool = True, dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: Optional[str] = None)[source]

Constructs an Embed module.

Parameters
• vocab_size – Number of unique tokens to embed. If not provided, an existing vocabulary matrix from which vocab_size can be inferred must be provided as existing_vocab.

• embed_dim – Number of dimensions to assign to each embedding. If not specified, we use 6 * sqrt(sqrt(vocab_size)). If an existing vocabulary matrix initializes the module, this should not be provided as it will be inferred.

• existing_vocab – A [vocab_size, embed_dim] vocabulary matrix. Will be converted to a tf.float32 tensor. If provided, neither vocab_size nor embed_dim should be provided as they are inferred.

• densify_gradients – If True, we convert the embedding gradient from a tf.IndexedSlices to a regular tensor before sending it back to the parameter server. This avoids excess computation on the parameter server. Use this option for moderately sized embeddings, e.g., a vocabulary size on the order of up to thousands. For embeddings larger than these, e.g. a vocabulary size on the order of tens or hundreds of thousands, set this to False.

• initializer – Initializer for the embeddings. By default, embeddings are initialized via a truncated normal distribution.

• trainable – If True, the embeddings will be updated during training. If False, they are fixed to their initial values.

• dtype – The dtype to use for the embedding. Defaults to float32.

• name – Name for this module.

Raises

ValueError – If neither vocab_size nor existing_vocab is provided, or if existing_vocab is provided along with vocab_size, embed_dim or initializer (as these should be inferred).
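
For a sense of scale, the default embed_dim rule 6 * sqrt(sqrt(vocab_size)) can be evaluated directly. The helper name default_embed_dim and the rounding to the nearest integer are assumptions made for this sketch.

```python
import math


def default_embed_dim(vocab_size):
    """6 * sqrt(sqrt(vocab_size)), rounded to an integer (rounding is assumed)."""
    if vocab_size < 1:
        raise ValueError("vocab_size must be positive")
    return int(round(6.0 * math.sqrt(math.sqrt(vocab_size))))


small = default_embed_dim(16)
large = default_embed_dim(10000)
```

The rule grows slowly: a 625-fold larger vocabulary only needs a 5-fold larger embedding under this default.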

__call__(inputs)[source]

Call self as a function.

# Optimizers¶

Sonnet optimizers built for TensorFlow 2.

All optimizers implement the snt.Optimizer interface.

## Adam¶

class sonnet.optimizers.Adam(learning_rate: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.001, beta1: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.9, beta2: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.999, epsilon: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-08, name: Optional[str] = None)[source]

Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. See [8] for more details.

Note: default parameter values have been taken from the paper.

learning_rate

Step size (alpha in the paper).

beta1

Exponential decay rate for first moment estimate.

beta2

Exponential decay rate for second moment estimate.

epsilon

Small value to avoid zero denominator.

step

Step count.

m

Biased first moment estimate (a list with one value per parameter).

v

Biased second raw moment estimate (a list with one value per parameter).

__init__(learning_rate: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.001, beta1: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.9, beta2: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.999, epsilon: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-08, name: Optional[str] = None)[source]

Parameters
• learning_rate – Step size (alpha in the paper).

• beta1 – Exponential decay rate for first moment estimate.

• beta2 – Exponential decay rate for second moment estimate.

• epsilon – Small value to avoid zero denominator.

• name – Name of the module.

apply(updates: Sequence[Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.indexed_slices.IndexedSlices, None]], parameters: Sequence[tensorflow.python.ops.variables.Variable])[source]

Applies the Adam update rule for each update, parameter pair:

$\begin{array}{ll} m_t = \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot update \\ v_t = \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot update^2 \\ \hat{m}_t = m_t / (1 - \beta_1^t) \\ \hat{v}_t = v_t / (1 - \beta_2^t) \\ delta = \alpha \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon) \\ param_t = param_{t-1} - delta \\ \end{array}$
Parameters
Raises

ValueError – If updates and parameters are empty, have different lengths, or have inconsistent types.
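
The update rule above can be followed for a single scalar parameter. The function adam_step is a hypothetical sketch that mirrors the equations term by term, with t counting steps from 1; it is not the Sonnet implementation.

```python
import math


def adam_step(param, update, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1.0 - beta1) * update
    v = beta2 * v + (1.0 - beta2) * update ** 2
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second raw moment
    delta = learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param - delta, m, v


param, m, v = adam_step(param=1.0, update=0.5, m=0.0, v=0.0, t=1)
```

On the first step the bias correction makes the step size approximately equal to learning_rate, regardless of the magnitude of the update.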

## Momentum¶

class sonnet.optimizers.Momentum(learning_rate: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], momentum: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], use_nesterov: bool = False, name: Optional[str] = None)[source]

SGD with Momentum module.

learning_rate

Learning rate.

momentum

Momentum scalar.

use_nesterov

True if using Nesterov momentum.

accumulated_momentum

Accumulated momentum for each parameter.

__init__(learning_rate: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], momentum: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], use_nesterov: bool = False, name: Optional[str] = None)[source]

Constructs a Momentum module.

Parameters
• learning_rate – Learning rate.

• momentum – Momentum scalar.

• use_nesterov – Whether to use Nesterov momentum.

• name – Name of the module.

apply(updates: Sequence[Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.indexed_slices.IndexedSlices, None]], parameters: Sequence[tensorflow.python.ops.variables.Variable])[source]

By default it applies the momentum update rule for each update, parameter pair:

accum_t <- momentum * accum_{t-1} + update
parameter <- parameter - learning_rate * accum_t

And when using Nesterov momentum (use_nesterov=True) it applies:

accum_t <- momentum * accum_{t-1} + update
parameter <- parameter - (learning_rate * update + learning_rate * momentum * accum_t)

Parameters

• parameters – A list of parameters. A parameter is a tf.Variable.

Raises

ValueError – If updates and parameters are empty, have different lengths, or have inconsistent types.
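
Both variants of the momentum rule can be compared on a scalar parameter. The function momentum_step below is a hypothetical sketch of the two update rules given above.

```python
def momentum_step(param, update, accum, learning_rate, momentum, use_nesterov=False):
    """One momentum step for a scalar parameter; returns (param, accum)."""
    accum = momentum * accum + update
    if use_nesterov:
        param = param - (learning_rate * update + learning_rate * momentum * accum)
    else:
        param = param - learning_rate * accum
    return param, accum


plain_param, plain_accum = momentum_step(1.0, 1.0, 0.0, learning_rate=0.1, momentum=0.9)
nesterov_param, _ = momentum_step(
    1.0, 1.0, 0.0, learning_rate=0.1, momentum=0.9, use_nesterov=True)
```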

## RMSProp¶

class sonnet.optimizers.RMSProp(learning_rate: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], decay: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.9, momentum: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.0, epsilon: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-10, centered: bool = False, name: Optional[str] = None)[source]

RMSProp module.

Maintain a moving (discounted) average of the square of updates. Divides each update by the root of this average.

ms <- decay * ms + (1-decay) * update^2
mom <- momentum * mom + learning_rate * update / sqrt(ms + epsilon)
parameter <- parameter - mom

This implementation of RMSprop uses plain momentum, not Nesterov momentum.

The centered version additionally maintains a moving average of the gradients, and uses that average to estimate the variance:

mg <- decay * mg + (1-decay) * update
ms <- decay * ms + (1-decay) * update^2
mom <- momentum * mom + learning_rate * update / sqrt(ms - mg^2 + epsilon)
parameter <- parameter - mom
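
The plain and centered rules can be compared on a scalar parameter. The function rmsprop_step is a hypothetical sketch that follows the update rules above; the learning_rate default of 0.1 is chosen only for the example.

```python
import math


def rmsprop_step(param, update, ms, mom, mg=0.0, learning_rate=0.1,
                 decay=0.9, momentum=0.0, epsilon=1e-10, centered=False):
    """One RMSProp step for a scalar parameter; returns (param, ms, mom, mg)."""
    ms = decay * ms + (1.0 - decay) * update ** 2
    if centered:
        mg = decay * mg + (1.0 - decay) * update
        denominator = math.sqrt(ms - mg ** 2 + epsilon)  # variance estimate
    else:
        denominator = math.sqrt(ms + epsilon)  # uncentered second moment
    mom = momentum * mom + learning_rate * update / denominator
    return param - mom, ms, mom, mg


plain = rmsprop_step(param=1.0, update=2.0, ms=0.0, mom=0.0)
centered = rmsprop_step(param=1.0, update=2.0, ms=0.0, mom=0.0, centered=True)
```

In this example the centered variant divides by a smaller quantity (the variance estimate) and therefore takes a larger step.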

learning_rate

Learning rate.

decay

Discount factor for the moving averages ms and mg (the decay term in the update rules above).

momentum

Momentum scalar.

epsilon

Small value to avoid zero denominator.

centered

True if centered.

mom

Accumulated mom for each parameter.

ms

Accumulated ms for each parameter.

mg

Accumulated mg for each parameter.

__init__(learning_rate: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], decay: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.9, momentum: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.0, epsilon: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1e-10, centered: bool = False, name: Optional[str] = None)[source]

Constructs an RMSProp module.

Parameters
• learning_rate – Learning rate.

• decay – Discount factor for the moving averages ms and mg (the decay term in the update rules).

• momentum – Momentum scalar.

• epsilon – Small value to avoid zero denominator.

• centered – If True, gradients are normalized by the estimated variance of the gradient; if False, by the uncentered second moment. Setting this to True may help with training, but is slightly more expensive in terms of computation and memory. Defaults to False.

• name – Name for this module.

apply(updates: Sequence[Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.indexed_slices.IndexedSlices, None]], parameters: Sequence[tensorflow.python.ops.variables.Variable])[source]

Parameters

• parameters – A list of parameters.

Raises

ValueError – If updates and parameters are empty, have different lengths, or have inconsistent types.

## SGD¶

class sonnet.optimizers.SGD(learning_rate: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], name: Optional[str] = None)[source]

learning_rate

Learning rate.

__init__(learning_rate: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], name: Optional[str] = None)[source]

Constructs an SGD module.

Parameters
• learning_rate – Learning rate.

• name – Name of the module.

apply(updates: Sequence[Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.indexed_slices.IndexedSlices, None]], parameters: Sequence[tensorflow.python.ops.variables.Variable])[source]

Parameters
• updates – A list of updates to apply to parameters, typically gradients.

• parameters – A list of parameters.

Raises

ValueError – If updates and parameters are empty, have different lengths, or have inconsistent types.
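The update performed by plain stochastic gradient descent can be sketched without any framework. The helper `sgd_apply` below is hypothetical and works on Python floats rather than variables; leaving a parameter unchanged when its update is None is an assumption based on the None entry allowed by the type signature.

```python
def sgd_apply(updates, parameters, learning_rate):
    """Returns new parameter values after one gradient-descent step."""
    if not updates or len(updates) != len(parameters):
        raise ValueError(
            "updates and parameters must be non-empty and of equal length")
    new_parameters = []
    for update, parameter in zip(updates, parameters):
        if update is None:
            # Assumption: a missing update leaves the parameter unchanged.
            new_parameters.append(parameter)
        else:
            new_parameters.append(parameter - learning_rate * update)
    return new_parameters


new_values = sgd_apply([0.5, None], [1.0, 2.0], learning_rate=0.1)
```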

# Initializers¶

Initializers.

## Initializer¶

class sonnet.initializers.Initializer[source]

Initializer base class. All initializers must implement a __call__ method.

abstract __call__(shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], dtype: tensorflow.python.framework.dtypes.DType) → tensorflow.python.framework.ops.Tensor[source]

Returns a tensor of the given shape and dtype.

## Constant¶

class sonnet.initializers.Constant(value: Union[float, int])[source]

Initializer that generates tensors initialized to the given value.

__init__(value: Union[float, int])[source]

Constructs a constant initializer that fills tensors with the given value.

__call__(shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], dtype: tensorflow.python.framework.dtypes.DType) → tensorflow.python.framework.ops.Tensor[source]

Returns a tensor of the given shape and dtype.

## Identity¶

class sonnet.initializers.Identity(gain: float = 1.0)[source]

Initializer that generates the identity matrix.

Constructs a 2D identity matrix or batches of these.

__init__(gain: float = 1.0)[source]

Constructs an identity initializer.

Parameters

gain – Multiplicative factor to apply to the identity matrix.

__call__(shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], dtype: tensorflow.python.framework.dtypes.DType) → tensorflow.python.framework.ops.Tensor[source]

Returns a tensor of the given shape and dtype.

## Ones¶

class sonnet.initializers.Ones[source]

Initializer that generates tensors initialized to 1.

__call__(shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], dtype: tensorflow.python.framework.dtypes.DType) → tensorflow.python.framework.ops.Tensor[source]

Returns a tensor of the given shape and dtype.

## Orthogonal¶

class sonnet.initializers.Orthogonal(gain: float = 1.0, seed: Optional[int] = None)[source]

Initializer that generates an orthogonal matrix.

NOTE: Does not support 1D tensors.

The implementation is based on [9].

If the shape of the tensor to initialize is two-dimensional, it is initialized with an orthogonal matrix obtained from the QR decomposition of a matrix of random numbers drawn from a normal distribution. If the matrix has fewer rows than columns then the output will have orthogonal rows. Otherwise, the output will have orthogonal columns.

If the shape of the tensor to initialize is more than two-dimensional, a matrix of shape (shape[0] * ... * shape[n - 2], shape[n - 1]) is initialized, where n is the length of the shape vector. The matrix is subsequently reshaped to give a tensor of the desired shape.
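The two rules above (collapse the leading axes, then orthogonalise) can be illustrated in plain Python. This is only a sketch: it uses Gram-Schmidt in place of the QR decomposition, covers just the case with no more rows than columns, omits the gain factor, and the helper names are hypothetical.

```python
import math
import random


def flattened_shape(shape):
    """Collapses the leading axes: (s0, ..., s[n-2], s[n-1]) -> (s0*...*s[n-2], s[n-1])."""
    return (math.prod(shape[:-1]), shape[-1])


def orthonormal_rows(num_rows, num_cols, rng):
    """Gram-Schmidt on Gaussian rows; requires num_rows <= num_cols."""
    rows = []
    for _ in range(num_rows):
        v = [rng.gauss(0.0, 1.0) for _ in range(num_cols)]
        # Remove the components along every row accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(u, v))
            v = [b - dot * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        rows.append([b / norm for b in v])
    return rows


matrix_shape = flattened_shape((3, 3, 16, 32))
rows = orthonormal_rows(2, 3, random.Random(0))
```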

__init__(gain: float = 1.0, seed: Optional[int] = None)[source]

Constructs an orthogonal initializer.

Parameters
• gain – Multiplicative factor to apply to the orthogonal matrix.

• seed – int, the seed used in the generation of random numbers.

__call__(shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], dtype: tensorflow.python.framework.dtypes.DType) → tensorflow.python.framework.ops.Tensor[source]

Returns a tensor of the given shape and dtype.

## RandomNormal¶

class sonnet.initializers.RandomNormal(mean: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.0, stddev: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, seed: Optional[int] = None)[source]

Initializer that generates tensors with a normal distribution.

__init__(mean: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.0, stddev: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, seed: Optional[int] = None)[source]

Constructs a random normal initializer.

Parameters
• mean – A python scalar or a scalar tensor. Mean of the random values to generate.

• stddev – A python scalar or a scalar tensor. Standard deviation of the random values to generate.

• seed – The seed used in the generation of random numbers.

__call__(shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], dtype: tensorflow.python.framework.dtypes.DType) → tensorflow.python.framework.ops.Tensor[source]

Returns a tensor of the given shape and dtype.

## RandomUniform¶

class sonnet.initializers.RandomUniform(minval: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0, maxval: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1, seed: Optional[int] = None)[source]

Initializer that generates tensors with a uniform distribution.

The generated values follow a uniform distribution in the range [minval, maxval).

__init__(minval: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0, maxval: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1, seed: Optional[int] = None)[source]

Constructs a random uniform initializer.

Parameters
• minval – A python scalar or a scalar tensor. Lower bound of the range of random values to generate. Defaults to 0.

• maxval – A python scalar or a scalar tensor. Upper bound of the range of random values to generate. Defaults to 1.

• seed – The seed used in the generation of random numbers.

__call__(shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], dtype: tensorflow.python.framework.dtypes.DType)[source]

Returns a tensor of the given shape and dtype.

## TruncatedNormal¶

class sonnet.initializers.TruncatedNormal(mean: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.0, stddev: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, seed: Optional[int] = None)[source]

Initializer that generates a truncated normal distribution.

These values follow a normal distribution except that values more than two standard deviations from the mean are discarded and re-drawn. This is the recommended initializer for neural network weights and filters.
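The discard-and-re-draw rule can be sketched as simple rejection sampling. The function below is a hypothetical scalar illustration using the standard library, not the sampler TensorFlow uses internally.

```python
import random


def truncated_normal(mean, stddev, rng):
    """Draws from a normal distribution, re-drawing values beyond two stddevs."""
    while True:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            return value


rng = random.Random(0)
samples = [truncated_normal(0.0, 1.0, rng) for _ in range(1000)]
```

Every sample lies within two standard deviations of the mean, so the spread of the result is somewhat smaller than stddev.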

__init__(mean: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 0.0, stddev: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable] = 1.0, seed: Optional[int] = None)[source]

Constructs a truncated normal initializer.

Parameters
• mean – A python scalar or a scalar tensor. Mean of the random values to generate.

• stddev – A python scalar or a scalar tensor. Standard deviation of the random values to generate.

• seed – The seed used in the generation of random numbers.

__call__(shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], dtype: tensorflow.python.framework.dtypes.DType)[source]

Returns a tensor of the given shape and dtype.

## VarianceScaling¶

class sonnet.initializers.VarianceScaling(scale: float = 1.0, mode: str = 'fan_in', distribution: str = 'truncated_normal', seed: Optional[int] = None)[source]

Initializer capable of adapting its scale to the shape of weights tensors.

With distribution="truncated_normal" or "normal", samples are drawn from a distribution with a mean of zero and a standard deviation (after truncation, if used) stddev = sqrt(scale / n) where n is:

• Number of input units in the weight tensor, if mode = fan_in.

• Number of output units, if mode = fan_out.

• Average of the numbers of input and output units, if mode = fan_avg.

Note that for transposed convolution the mode selected should be reversed. For number of input units use fan_out and for number of output units fan_in.

With distribution=uniform, samples are drawn from a uniform distribution within [-limit, limit], with limit = sqrt(3 * scale / n).

The variance scaling initializer can be configured to generate other standard initializers using the scale, mode and distribution arguments. Here are some example configurations:

| Name | Parameters |
|---|---|
| glorot_uniform | scale=1.0, mode=fan_avg, distribution=uniform |
| glorot_normal | scale=1.0, mode=fan_avg, distribution=truncated_normal |
| lecun_uniform | scale=1.0, mode=fan_in, distribution=uniform |
| lecun_normal | scale=1.0, mode=fan_in, distribution=truncated_normal |
| he_uniform | scale=2.0, mode=fan_in, distribution=uniform |
| he_normal | scale=2.0, mode=fan_in, distribution=truncated_normal |
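As a worked check of the formulas above, the following sketch computes the standard deviation and the uniform limit for a given fan-in and fan-out. The helper `variance_scaling_params` is hypothetical, and the fan sizes are passed in directly rather than derived from a weight shape.

```python
import math


def variance_scaling_params(fan_in, fan_out, scale=1.0, mode="fan_in"):
    """Returns (stddev, limit) implied by scale and mode."""
    if mode == "fan_in":
        n = fan_in
    elif mode == "fan_out":
        n = fan_out
    elif mode == "fan_avg":
        n = (fan_in + fan_out) / 2.0
    else:
        raise ValueError("Invalid mode: {}".format(mode))
    stddev = math.sqrt(scale / n)       # used by the normal distributions
    limit = math.sqrt(3.0 * scale / n)  # used by the uniform distribution
    return stddev, limit


# glorot_uniform for a 4 -> 8 layer: n = (4 + 8) / 2 = 6.
_, glorot_limit = variance_scaling_params(4, 8, scale=1.0, mode="fan_avg")
# he_normal for the same layer: n = fan_in = 4.
he_stddev, _ = variance_scaling_params(4, 8, scale=2.0, mode="fan_in")
```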

__init__(scale: float = 1.0, mode: str = 'fan_in', distribution: str = 'truncated_normal', seed: Optional[int] = None)[source]

Constructs a variance scaling initializer.

Parameters
• scale – Scaling factor (positive float).

• mode – One of fan_in, fan_out, fan_avg.

• distribution – Random distribution to use. One of truncated_normal, normal (untruncated) and uniform.

• seed – int, the seed used in the generation of random numbers.

Raises

ValueError – In case of an invalid value for the scale, mode or distribution arguments.

__call__(shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], dtype: tensorflow.python.framework.dtypes.DType) → tensorflow.python.framework.ops.Tensor[source]

Returns a tensor of the given shape and dtype.

## Zeros¶

class sonnet.initializers.Zeros[source]

Initializer that generates tensors initialized to 0.

__call__(shape: Union[int, Sequence[int], tensorflow.python.framework.tensor_shape.TensorShape], dtype: tensorflow.python.framework.dtypes.DType) → tensorflow.python.framework.ops.Tensor[source]

Returns a tensor of the given shape and dtype.

# Regularizers¶

Regularizers.

## Regularizer¶

class sonnet.regularizers.Regularizer[source]

Base regularizer class.

abstract __call__(tensors: Sequence[tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor[source]

Apply a regularizer.

Parameters

tensors – A sequence of tensors to regularize.

Returns

Combined regularization loss for the given tensors.

## L1¶

class sonnet.regularizers.L1(scale: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable])[source]

L1 regularizer.

>>> reg = snt.regularizers.L1(0.01)
>>> reg([tf.constant([1.0, 2.0, 3.0])])
<tf.Tensor: ...>

__init__(scale: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable])[source]

Create an L1 regularizer.

Parameters

scale – A non-negative regularization factor.

Raises

ValueError – if scale is <0.

__call__(tensors: Sequence[tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor[source]

See base class.

## L2¶

class sonnet.regularizers.L2(scale: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable])[source]

L2 regularizer.

>>> reg = snt.regularizers.L2(0.01)
>>> reg([tf.constant([1.0, 2.0, 3.0])])
<tf.Tensor: ...>

__init__(scale: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable])[source]

Create an L2 regularizer.

Parameters

scale – float or scalar tensor; regularization factor.

Raises

ValueError – if scale is <0.

__call__(tensors: Sequence[tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor[source]

See base class.
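For the input used in the L1 and L2 doctests, the two penalties can be worked out by hand. The sketch below assumes the conventional definitions (scale times the sum of absolute values, and scale times the sum of squares); whether the library's L2 includes an additional factor of 1/2 is not stated here, so treat that part as an assumption. Tensors are represented as flat Python lists.

```python
def l1_penalty(tensors, scale):
    """scale * sum of absolute values over every entry of every tensor."""
    return scale * sum(abs(x) for tensor in tensors for x in tensor)


def l2_penalty(tensors, scale):
    """scale * sum of squares over every entry of every tensor."""
    return scale * sum(x * x for tensor in tensors for x in tensor)


l1_value = l1_penalty([[1.0, 2.0, 3.0]], scale=0.01)  # 0.01 * (1 + 2 + 3)
l2_value = l2_penalty([[1.0, 2.0, 3.0]], scale=0.01)  # 0.01 * (1 + 4 + 9)
```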

## OffDiagonalOrthogonal¶

class sonnet.regularizers.OffDiagonalOrthogonal(scale: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable])[source]

Off-diagonal orthogonal regularizer.

The implementation is based on https://arxiv.org/abs/1809.11096. Given a rank N >= 2 tensor, the regularizer computes the sum of off-diagonal entries of (W^T W)^2 where

• W is the input tensor reshaped to a matrix by collapsing the leading N - 1 axes into the first one;

• ^2 is the element-wise square.

NB: that is equivalent to computing the off-diagonal sum of (W^T W - I)^2, as off-diagonal entries of I are 0.

For example,

>>> t = tf.reshape(tf.range(8, dtype=tf.float32), [2, 2, 2])
>>> reg = snt.regularizers.OffDiagonalOrthogonal(0.01)
>>> reg([t])
<tf.Tensor: ...>


corresponds to computing

>>> w = tf.reshape(t, [-1, 2])
>>> w_gram_sq = tf.square(tf.matmul(tf.transpose(w), w))
>>> 0.01 * (tf.reduce_sum(w_gram_sq) - tf.linalg.trace(w_gram_sq))
<tf.Tensor: ...>
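The same quantity can be computed by hand for the doctest input. The sketch below works on nested Python lists; `off_diagonal_orthogonal` is a hypothetical helper that expects the tensor already reshaped to a matrix.

```python
def off_diagonal_orthogonal(w, scale):
    """scale * sum of squared off-diagonal entries of W^T W for a 2D list w."""
    num_cols = len(w[0])
    total = 0.0
    for i in range(num_cols):
        for j in range(num_cols):
            if i != j:
                # Entry (i, j) of the Gram matrix W^T W.
                gram_ij = sum(row[i] * row[j] for row in w)
                total += gram_ij ** 2
    return scale * total


# The [2, 2, 2] tensor holding 0..7, reshaped to [4, 2].
w = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
penalty = off_diagonal_orthogonal(w, scale=0.01)
```

Here both off-diagonal Gram entries equal 0*1 + 2*3 + 4*5 + 6*7 = 68, so the penalty is 0.01 * 2 * 68^2 = 92.48.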

__init__(scale: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable])[source]

Create an off-diagonal orthogonal regularizer.

Parameters

scale – A non-negative regularization factor.

Raises

ValueError – if scale is <0.

__call__(tensors: Sequence[tensorflow.python.framework.ops.Tensor]) → tensorflow.python.framework.ops.Tensor[source]

See base class.

## causal¶

sonnet.pad.causal(effective_kernel_size: int)[source]

Pre-padding such that output has no dependence on the future.

## create¶

sonnet.pad.create(padding: Union[Callable[[int], Sequence[int]], Sequence[Callable[[int], Sequence[int]]]], kernel: Union[int, Sequence[int]], rate: Union[int, Sequence[int]], n: int, channel_index: int)[source]

Parameters
• padding – callable or list of callables of length n. The callables take an integer representing the effective kernel size (kernel size when the rate is 1) and return a list of two integers representing the padding before and padding after for that dimension.

• kernel – int or list of ints of length n. The size of the kernel for each dimension. If it is an int it will be replicated for the non channel and batch dimensions.

• rate – int or list of ints of length n. The dilation rate for each dimension. If it is an int it will be replicated for the non channel and batch dimensions.

• n – the number of spatial dimensions.

• channel_index – the channel position of the input to which the padding will be applied.

Returns

A list of length n+2 containing the padding for each element. These are of the form [pad_before, pad_after].
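The shape of the returned list can be illustrated with a simplified sketch. It assumes a single padding callable and integer kernel and rate shared by all spatial dimensions, the usual dilation formula for the effective kernel size, and a boolean `channels_last` flag in place of channel_index; all names are hypothetical.

```python
def effective_kernel_size(kernel, rate):
    """Extent covered by a dilated kernel: (kernel - 1) * rate + 1."""
    return (kernel - 1) * rate + 1


def create_padding(padding_fn, kernel, rate, n, channels_last=True):
    """Builds n + 2 [pad_before, pad_after] pairs, one per input dimension."""
    spatial = [list(padding_fn(effective_kernel_size(kernel, rate)))
               for _ in range(n)]
    # Batch and channel dimensions are never padded.
    if channels_last:
        return [[0, 0]] + spatial + [[0, 0]]
    return [[0, 0], [0, 0]] + spatial


# Causal-style padding (all padding before) for a kernel of 3 with rate 2.
paddings = create_padding(lambda k: [k - 1, 0], kernel=3, rate=2, n=1)
```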

## full¶

sonnet.pad.full(effective_kernel_size: int)[source]

## reverse_causal¶

sonnet.pad.reverse_causal(effective_kernel_size: int)[source]

Post-padding such that output has no dependence on the past.

## same¶

sonnet.pad.same(effective_kernel_size: int)[source]

Pads such that the output size matches input size for stride=1.

## valid¶

sonnet.pad.valid(effective_kernel_size: int)[source]
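The padding functions in this section each map an effective kernel size to a [pad_before, pad_after] pair. The sketch below uses the conventional definitions of these schemes; full and valid have no description above, so their bodies in particular are assumptions, and the `_padding` names are hypothetical.

```python
def valid_padding(k):
    return [0, 0]                  # no padding at all


def same_padding(k):
    return [(k - 1) // 2, k // 2]  # total of k - 1, split around the centre


def full_padding(k):
    return [k - 1, k - 1]          # every partial overlap produces an output


def causal_padding(k):
    return [k - 1, 0]              # pad before only: no view of the future


def reverse_causal_padding(k):
    return [0, k - 1]              # pad after only: no view of the past


def output_length(input_length, padding, k):
    """Output length of a stride-1 convolution after padding."""
    before, after = padding
    return input_length + before + after - k + 1


same_length = output_length(10, same_padding(4), 4)
```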

# Distribution¶

Utilities for using Sonnet with TensorFlow Distribution Strategy.

## Replicator¶

class sonnet.distribute.Replicator(devices=None, cross_device_ops=None)[source]

Replicates input, parameters and compute over multiple accelerators.

Replicator is a TensorFlow “Distribution Strategy” implementing the programming model described in the TF-Replicator paper [10] and TensorFlow RFC [11]. Replicator enables data-parallel training across multiple accelerators on a single machine and supports both eager execution and tf.function.

To get started create a Replicator instance:

>>> replicator = snt.distribute.Replicator()


Replicator provides a scope inside which any new tf.Variables will be replicated across all local devices:

>>> with replicator.scope():
...    mod = snt.Linear(32)


Additionally replicator provides utility functions to apply a module in parallel on multiple devices. First we need to define some computation that runs on each GPU. The “replica context” object provides us a way to communicate between replicas (e.g. to perform an all_reduce):

>>> def forward():
...   # Compute a random output on each GPU.
...   x = tf.random.normal([8, 28 * 28])
...   y = mod(x)
...   # Synchronize the value of y between all GPUs.
...   ctx = tf.distribute.get_replica_context()
...   y = ctx.all_reduce("mean", y)
...   return y


Finally we use the run API to apply forward in parallel on all accelerator devices:

>>> per_replica_y = replicator.experimental_run_v2(forward)

scope()[source]

Returns a context manager selecting this Strategy as current.

Inside a with strategy.scope(): code block, this thread will use a variable creator set by strategy, and will enter its “cross-replica context”.

Returns

A context manager.

## TpuReplicator¶

class sonnet.distribute.TpuReplicator(tpu_cluster_resolver=None, device_assignment=None)[source]

Replicates input, parameters and compute over multiple TPUs.

TpuReplicator is a TensorFlow “Distribution Strategy” implementing the programming model described in the TF-Replicator paper [10] and TensorFlow RFC [11]. TpuReplicator enables data-parallel training across multiple TPUs on one or more machines and supports tf.function.

To get started create a TpuReplicator instance:

>>> replicator = snt.distribute.TpuReplicator()


This provides a scope inside which any new tf.Variables will be replicated across all TPU cores:

>>> with replicator.scope():
...    mod = snt.Linear(32)


Additionally replicator provides utility functions to apply a module in parallel on multiple devices. First we need to define some computation that runs on each TPU. The “replica context” object provides us a way to communicate between replicas:

>>> def forward():
...   # Compute a random output on each TPU.
...   x = tf.random.normal([8, 28 * 28])
...   y = mod(x)
...   # Synchronize the value of y between all TPUs.
...   ctx = tf.distribute.get_replica_context()
...   y = ctx.all_reduce("mean", y)
...   return y


Finally we use the run API to apply forward in parallel on all TPU devices. This must be run as part of a tf.function since TpuReplicator uses XLA to compile and replicate our function to run in parallel over all TPU cores:

>>> @tf.function(autograph=False)
... def all_forward():
...   return replicator.experimental_run_v2(forward)
>>> per_replica_y = all_forward()

scope()[source]

Returns a context manager selecting this Strategy as current.

Inside a with strategy.scope(): code block, this thread will use a variable creator set by strategy, and will enter its “cross-replica context”.

Returns

A context manager.

# Metrics¶

## Metric¶

class sonnet.Metric(name: Optional[str] = None)[source]

Metric base class.

abstract initialize(value)[source]

Creates any input dependent variables or state.

abstract update(value)[source]

Accumulates values.

property value

Returns the current value of the metric.

abstract reset()[source]

Resets the metric.

__call__(value)[source]

Updates the metric and returns the new value.

## Mean¶

class sonnet.Mean(name: Optional[str] = None)[source]

Calculates the element-wise mean of the given values.

__init__(name: Optional[str] = None)[source]

Initializes the current module with the given name.

Subclasses should call this constructor before creating other modules or variables such that those modules are named correctly.

Parameters

name – An optional string name for the class. Must be a valid Python identifier. If name is not provided then the class name for the current instance is converted to lower_snake_case and used instead.

initialize(value: tensorflow.python.framework.ops.Tensor)[source]

See base class.

update(value: tensorflow.python.framework.ops.Tensor)[source]

See base class.

property value

See base class.

reset()[source]

Resets the metric.
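The initialize / update / value / reset contract can be sketched with a framework-free running mean. `RunningMean` below is a hypothetical stand-in that operates on equally sized Python lists instead of tensors.

```python
class RunningMean:
    """Element-wise running mean over equally sized lists (hypothetical)."""

    def __init__(self):
        self.total = None
        self.count = 0

    def initialize(self, value):
        # Input-dependent state: one accumulator slot per element.
        self.total = [0.0] * len(value)
        self.count = 0

    def update(self, value):
        if self.total is None:
            self.initialize(value)
        self.total = [t + v for t, v in zip(self.total, value)]
        self.count += 1

    @property
    def value(self):
        return [t / self.count for t in self.total]

    def reset(self):
        self.total = [0.0] * len(self.total)
        self.count = 0

    def __call__(self, value):
        # Updates the metric and returns the new value.
        self.update(value)
        return self.value


mean = RunningMean()
mean([1.0, 10.0])
latest = mean([3.0, 30.0])
```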

## Sum¶

class sonnet.Sum(name: Optional[str] = None)[source]

Calculates the element-wise sum of the given values.

__init__(name: Optional[str] = None)[source]

Initializes the current module with the given name.

Subclasses should call this constructor before creating other modules or variables such that those modules are named correctly.

Parameters

name – An optional string name for the class. Must be a valid Python identifier. If name is not provided then the class name for the current instance is converted to lower_snake_case and used instead.

initialize(value: tensorflow.python.framework.ops.Tensor)[source]

See base class.

update(value: tensorflow.python.framework.ops.Tensor)[source]

See base class.

property value

See base class.

reset()[source]

See base class.

# Nets¶

Common network architectures implemented as Sonnet modules.

## MLP¶

class sonnet.nets.MLP(output_sizes: Iterable[int], w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, with_bias: bool = True, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, dropout_rate=None, activate_final: bool = False, name: Optional[str] = None)[source]

A multi-layer perceptron module.

__init__(output_sizes: Iterable[int], w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, with_bias: bool = True, activation: Callable[[tensorflow.python.framework.ops.Tensor], tensorflow.python.framework.ops.Tensor] = <function relu>, dropout_rate=None, activate_final: bool = False, name: Optional[str] = None)[source]

Constructs an MLP.

Parameters
• output_sizes – Sequence of layer sizes.

• w_init – Initializer for Linear weights.

• b_init – Initializer for Linear bias. Must be None if with_bias is False.

• with_bias – Whether or not to apply a bias in each layer.

• activation – Activation function to apply between linear layers. Defaults to ReLU.

• dropout_rate – Dropout rate to apply, a rate of None (the default) or 0 means no dropout will be applied.

• activate_final – Whether or not to activate the final layer of the MLP.

• name – Optional name for this module.

Raises

ValueError – If with_bias is False and b_init is not None.

__call__(inputs: tensorflow.python.framework.ops.Tensor, is_training=None) → tensorflow.python.framework.ops.Tensor[source]

Connects the module to some inputs.

Parameters
• inputs – A Tensor of shape [batch_size, input_size].

• is_training – A bool indicating if we are currently training. Defaults to None. Required if using dropout.

Returns

The output of the model, of size [batch_size, output_size].

reverse(activate_final: Optional[bool] = None, name: Optional[str] = None) → MLP[source]

Returns a new MLP which is the layer-wise reverse of this MLP.

NOTE: Computing the reverse of an MLP requires knowing the input size of each linear layer, so this method will fail if the module has not been called at least once. See snt.Deferred as a possible solution to this problem.

The contract of reverse is that the reversed module will accept the output of the parent module as input and produce an output which is the input size of the parent.

>>> mlp = snt.nets.MLP([1, 2, 3])
>>> y = mlp(tf.ones([1, 2]))
>>> rev = mlp.reverse()
>>> rev(y)
<tf.Tensor: ... shape=(1, 2), ...>

Parameters
• activate_final – Whether the final layer of the MLP should be activated.

• name – Optional name for the new module. The default name will be the name of the current module prefixed with "reversed_".

Returns

An MLP instance which is the reverse of the current instance. Note these instances do not share weights and, apart from being symmetric to each other, are not coupled in any way.
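The size bookkeeping behind the reverse contract can be sketched in a few lines: each reversed layer maps a layer's output size back to that layer's input size, last layer first. The helper `reversed_output_sizes` is hypothetical and only computes sizes; it builds no modules.

```python
def reversed_output_sizes(input_size, output_sizes):
    """Output sizes of the layer-wise reverse of an MLP."""
    # Input size of each layer: the network input, then every output but the last.
    layer_input_sizes = [input_size] + list(output_sizes[:-1])
    # The reverse produces those input sizes in the opposite order.
    return layer_input_sizes[::-1]


# The doctest above: MLP([1, 2, 3]) called on an input of size 2.
doctest_sizes = reversed_output_sizes(input_size=2, output_sizes=[1, 2, 3])
# A less symmetric case: 784 -> 256 -> 64 -> 10.
other_sizes = reversed_output_sizes(input_size=784, output_sizes=[256, 64, 10])
```

The last entry always equals the original input size, which is why the reversed module's output has shape (1, 2) in the doctest. The input size is only known after the first call, which is the reason for the NOTE above.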

## Cifar10ConvNet¶

class sonnet.nets.Cifar10ConvNet(num_classes: int = 10, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NHWC', output_channels: Sequence[int] = (64, 64, 128, 128, 128, 256, 256, 256, 512, 512, 512), strides: Sequence[int] = (1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1), name: Optional[str] = None)[source]

Convolutional network designed for Cifar10.

Approximately equivalent to “VGG, minus max pooling, plus BatchNorm”. For best results the input data should be scaled to be between -1 and 1 when using the standard initializers.

__init__(num_classes: int = 10, w_init: Optional[sonnet.src.initializers.Initializer] = None, b_init: Optional[sonnet.src.initializers.Initializer] = None, data_format: str = 'NHWC', output_channels: Sequence[int] = (64, 64, 128, 128, 128, 256, 256, 256, 512, 512, 512), strides: Sequence[int] = (1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1), name: Optional[str] = None)[source]

Initializes the current module with the given name.

Subclasses should call this constructor before creating other modules or variables such that those modules are named correctly.

Parameters

name – An optional string name for the class. Must be a valid Python identifier. If name is not provided then the class name for the current instance is converted to lower_snake_case and used instead.

__call__(inputs: tensorflow.python.framework.ops.Tensor, is_training: Union[bool, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], test_local_stats: bool = True) → Mapping[str, tensorflow.python.framework.ops.Tensor][source]

Connects the module to some inputs.

Parameters
• inputs – A Tensor of size [batch_size, input_height, input_width, input_channels], representing a batch of input images.

• is_training – Boolean to indicate to snt.BatchNorm if we are currently training.

• test_local_stats – Boolean to indicate to snt.BatchNorm if batch normalization should use local batch statistics at test time. By default True.

Returns

A dictionary containing two items:

• logits: The output logits of the network, this will be of size [batch_size, num_classes]

• activations: A list of tf.Tensor, the feature activations of the module. The order of the activations is preserved in the output list. The activations in the output list are those computed after the activation function is applied, if one is applied at that layer.

## ResNet¶

class sonnet.nets.ResNet(blocks_per_group_list: Sequence[int], num_classes: int, bn_config: Optional[Mapping[str, float]] = None, resnet_v2: bool = False, channels_per_group_list: Sequence[int] = (256, 512, 1024, 2048), name: Optional[str] = None)[source]

ResNet model.

__init__(blocks_per_group_list: Sequence[int], num_classes: int, bn_config: Optional[Mapping[str, float]] = None, resnet_v2: bool = False, channels_per_group_list: Sequence[int] = (256, 512, 1024, 2048), name: Optional[str] = None)[source]

Constructs a ResNet model.

Parameters
• blocks_per_group_list – A sequence of length 4 that indicates the number of blocks created in each group.

• num_classes – The number of classes to classify the inputs into.

• bn_config – A dictionary of two elements, decay_rate and eps to be passed on to the BatchNorm layers. By default the decay_rate is 0.9 and eps is 1e-5.

• resnet_v2 – Whether to use the v1 or v2 ResNet implementation. Defaults to False.

• channels_per_group_list – A sequence of length 4 that indicates the number of channels used for each block in each group.

• name – Name of the module.

__call__(inputs, is_training)[source]

Call self as a function.

## ResNet50¶

class sonnet.nets.ResNet50(num_classes: int, bn_config: Optional[Mapping[str, float]] = None, resnet_v2: bool = False, name: Optional[str] = None)[source]

ResNet50 module.

__init__(num_classes: int, bn_config: Optional[Mapping[str, float]] = None, resnet_v2: bool = False, name: Optional[str] = None)[source]

Constructs a ResNet model.

Parameters
• num_classes – The number of classes to classify the inputs into.

• bn_config – A dictionary of two elements, decay_rate and eps to be passed on to the BatchNorm layers.

• resnet_v2 – Whether to use the v1 or v2 ResNet implementation. Defaults to False.

• name – Name of the module.

## VectorQuantizer¶

class sonnet.nets.VectorQuantizer(embedding_dim: int, num_embeddings: int, commitment_cost: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: str = 'vector_quantizer')[source]

Sonnet module representing the VQ-VAE layer.

Implements the algorithm presented in ‘Neural Discrete Representation Learning’ by van den Oord et al. https://arxiv.org/abs/1711.00937

Input any tensor to be quantized. Last dimension will be used as space in which to quantize. All other dimensions will be flattened and will be seen as different examples to quantize.

The output tensor will have the same shape as the input.

For example a tensor with shape [16, 32, 32, 64] will be reshaped into [16384, 64] and all 16384 vectors (each of 64 dimensions) will be quantized independently.
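The flatten-then-quantize behaviour can be sketched with a nearest-neighbour lookup over a small codebook. This illustrates only the lookup step under the assumption of squared Euclidean distance; the loss terms and gradient handling of the actual module are not shown, and `quantize_vectors` is a hypothetical helper.

```python
def quantize_vectors(vectors, codebook):
    """Maps each vector to its nearest codebook entry (squared Euclidean distance)."""
    indices = []
    for v in vectors:
        distances = [sum((a - b) ** 2 for a, b in zip(v, e)) for e in codebook]
        indices.append(distances.index(min(distances)))
    return [codebook[i] for i in indices], indices


# A [16, 32, 32, 64] input flattens to 16 * 32 * 32 vectors of 64 dimensions.
num_vectors = 16 * 32 * 32

# Tiny example with num_embeddings=2 and embedding_dim=2.
codebook = [[0.0, 0.0], [1.0, 1.0]]
flat_inputs = [[0.1, -0.2], [0.9, 1.2], [0.4, 0.4]]
quantized, encoding_indices = quantize_vectors(flat_inputs, codebook)
```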

embedding_dim

integer representing the dimensionality of the tensors in the quantized space. Inputs to the modules must be in this format as well.

num_embeddings

integer, the number of vectors in the quantized space.

commitment_cost

scalar which controls the weighting of the loss terms (see equation 4 in the paper - this variable is Beta).

__init__(embedding_dim: int, num_embeddings: int, commitment_cost: Union[float, numpy.floating, numpy.ndarray, tensorflow.python.framework.ops.Tensor, tensorflow.python.ops.variables.Variable], dtype: tensorflow.python.framework.dtypes.DType = tf.float32, name: str = 'vector_quantizer')[source]

Initializes a VQ-VAE module.

Parameters
• embedding_dim – dimensionality of the tensors in the quantized space. Inputs to the modules must be in this format as well.

• num_embeddings – number of vectors in the quantized space.

• commitment_cost – scalar which controls the weighting of the loss terms (see equation 4 in the paper - this variable is Beta).

• dtype – dtype for the embeddings variable, defaults to tf.float32.

• name – name of the module.

__call__(inputs, is_training)[source]

Connects the module to some inputs.

Parameters
• inputs – Tensor, final dimension must be equal to embedding_dim. All other leading dimensions will be flattened and treated as a large batch.

• is_training – boolean, whether this connection is to training data.

Returns

A dict containing the following keys and values:

• quantize – Tensor containing the quantized version of the input.

• loss – Tensor containing the loss to optimize.

• perplexity – Tensor containing the perplexity of the encodings.

• encodings – Tensor containing the discrete encodings, i.e. which element of the quantized space each input element was mapped to.

• encoding_indices – Tensor containing the discrete encoding indices, i.e. which element of the quantized space each input element was mapped to.

quantize(encoding_indices)[source]

Returns embedding tensor for a batch of indices.

## VectorQuantizerEMA¶

class sonnet.nets.VectorQuantizerEMA(embedding_dim, num_embeddings, commitment_cost, decay, epsilon=1e-05, dtype=tf.float32, name='vector_quantizer_ema')[source]

Sonnet module representing the VQ-VAE layer.

Implements a slightly modified version of the algorithm presented in ‘Neural Discrete Representation Learning’ by van den Oord et al. https://arxiv.org/abs/1711.00937

The difference between VectorQuantizerEMA and VectorQuantizer is that this module uses exponential moving averages to update the embedding vectors instead of an auxiliary loss. This has the advantage that the embedding updates are independent of the choice of optimizer (SGD, RMSProp, Adam, K-Fac, …) used for the encoder, decoder and other parts of the architecture. For most experiments the EMA version trains faster than the non-EMA version.
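A rough sketch of what updating an embedding with exponential moving averages can look like for a single codebook entry follows. The state layout (`cluster_size`, `dw`) and helper names are assumptions made for illustration, and the epsilon-based smoothing of the cluster size is left out for brevity.

```python
def ema_update(average, new_value, decay):
    """Standard exponential moving average step."""
    return decay * average + (1.0 - decay) * new_value


def update_embedding(state, assigned_inputs, decay):
    """Updates one codebook entry from the inputs currently assigned to it."""
    # Moving average of how many inputs map to this entry.
    state["cluster_size"] = ema_update(
        state["cluster_size"], len(assigned_inputs), decay)
    # Moving average of the sum of those inputs, per dimension.
    dim = len(state["dw"])
    sums = [sum(v[d] for v in assigned_inputs) for d in range(dim)]
    state["dw"] = [ema_update(a, s, decay) for a, s in zip(state["dw"], sums)]
    # The embedding is the averaged sum divided by the averaged count.
    return [d / state["cluster_size"] for d in state["dw"]]


state = {"cluster_size": 1.0, "dw": [1.0, 1.0]}
embedding = update_embedding(state, [[2.0, 2.0], [4.0, 4.0]], decay=0.5)
```

No gradient is involved in this update, which is why it does not depend on the optimizer chosen for the rest of the model.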

The input can be any tensor. Its last dimension is used as the space in which to quantize. All other dimensions are flattened and treated as separate examples to quantize.

The output tensor will have the same shape as the input.

For example a tensor with shape [16, 32, 32, 64] will be reshaped into [16384, 64] and all 16384 vectors (each of 64 dimensions) will be quantized independently.
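The flattening and per-vector quantization described above can be sketched in plain Python. This is a minimal illustration, not the module's implementation: flatten_leading and nearest_code are hypothetical helpers, and squared Euclidean distance is assumed as the nearest-neighbour criterion.

```python
import math


def flatten_leading(shape):
    """Merges all leading dimensions, keeping the last one as the vector size."""
    *leading, last = shape
    return [math.prod(leading), last]


def nearest_code(vector, codebook):
    """Returns the index of the codebook row closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


# The shape from the example above: 16 * 32 * 32 vectors of 64 dimensions.
flat_shape = flatten_leading([16, 32, 32, 64])

# A tiny hypothetical codebook (3 codes, 2 dimensions) and three input vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
inputs = [[0.9, 1.2], [3.5, 0.1], [0.1, -0.2]]

# Each vector is quantized independently of the others.
indices = [nearest_code(v, codebook) for v in inputs]
quantized = [codebook[i] for i in indices]
```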

embedding_dim

integer representing the dimensionality of the tensors in the quantized space. Inputs to the modules must be in this format as well.

num_embeddings

integer, the number of vectors in the quantized space.

commitment_cost

scalar which controls the weighting of the loss terms (see equation 4 in the paper).

decay

float, decay for the moving averages.

epsilon

small float constant to avoid numerical instability.

__init__(embedding_dim, num_embeddings, commitment_cost, decay, epsilon=1e-05, dtype=tf.float32, name='vector_quantizer_ema')[source]

Initializes a VQ-VAE EMA module.

Parameters
• embedding_dim – integer representing the dimensionality of the tensors in the quantized space. Inputs to the modules must be in this format as well.

• num_embeddings – integer, the number of vectors in the quantized space.

• commitment_cost – scalar which controls the weighting of the loss terms (see equation 4 in the paper - this variable is Beta).

• decay – float between 0 and 1, controls the speed of the Exponential Moving Averages.

• epsilon – small constant to aid numerical stability, default 1e-5.

• dtype – dtype for the embeddings variable, defaults to tf.float32.

• name – name of the module.
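To give a feel for how decay and epsilon could enter an exponential-moving-average codebook update, here is a minimal sketch in plain Python. It follows the EMA scheme described in the paper's appendix. The function name, the argument names and the Laplace-style smoothing with epsilon are assumptions made for illustration, not a description of the module's actual code.

```python
def ema_codebook_update(cluster_size, ema_sum, counts, sums, decay, epsilon):
    """One EMA step for a codebook stored as one row per embedding.

    cluster_size: running count of inputs assigned to each code.
    ema_sum:      running sum of the inputs assigned to each code.
    counts, sums: the same statistics measured on the current batch.
    """
    k = len(cluster_size)
    # Blend the old statistics with the batch statistics.
    new_size = [decay * n + (1 - decay) * c
                for n, c in zip(cluster_size, counts)]
    new_sum = [[decay * m + (1 - decay) * s for m, s in zip(m_row, s_row)]
               for m_row, s_row in zip(ema_sum, sums)]
    # Assumed smoothing: keeps rarely used codes away from a zero denominator.
    total = sum(new_size)
    smoothed = [(n + epsilon) / (total + k * epsilon) * total for n in new_size]
    # Each embedding becomes the smoothed mean of the inputs assigned to it.
    embeddings = [[m / n for m in m_row]
                  for m_row, n in zip(new_sum, smoothed)]
    return new_size, new_sum, embeddings


new_size, new_sum, embeddings = ema_codebook_update(
    cluster_size=[1.0, 1.0],
    ema_sum=[[0.0], [2.0]],
    counts=[3, 0],
    sums=[[6.0], [0.0]],
    decay=0.5,
    epsilon=1e-5,
)
```

A decay close to 1 makes the codebook change slowly, while a smaller decay lets each batch move the embeddings further.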

__call__(inputs, is_training)[source]

Connects the module to some inputs.

Parameters
• inputs – Tensor, final dimension must be equal to embedding_dim. All other leading dimensions will be flattened and treated as a large batch.

• is_training – boolean, whether this connection is to training data. When this is set to False, the internal moving average statistics will not be updated.

Returns

A dict containing the following keys and values:

• quantize – Tensor containing the quantized version of the input.

• loss – Tensor containing the loss to optimize.

• perplexity – Tensor containing the perplexity of the encodings.

• encodings – Tensor containing the discrete encodings, i.e. which element of the quantized space each input element was mapped to.

• encoding_indices – Tensor containing the discrete encoding indices, i.e. which element of the quantized space each input element was mapped to.

Return type

dict

quantize(encoding_indices)[source]

Returns embedding tensor for a batch of indices.
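The perplexity entry returned by __call__ can be read as an effective number of codes in use. The sketch below computes the usual definition, the exponential of the entropy of the code-usage distribution, from a list of integer indices. The helper name codebook_perplexity is hypothetical, and the module may differ in details such as numerical stabilisation.

```python
import math
from collections import Counter


def codebook_perplexity(encoding_indices, num_embeddings):
    """exp(entropy) of the empirical distribution over codes."""
    counts = Counter(encoding_indices)
    total = len(encoding_indices)
    entropy = 0.0
    for code in range(num_embeddings):
        p = counts.get(code, 0) / total
        if p > 0:  # unused codes contribute nothing to the entropy
            entropy -= p * math.log(p)
    return math.exp(entropy)


# All four codes used equally often: perplexity approaches num_embeddings.
uniform = codebook_perplexity([0, 1, 2, 3], num_embeddings=4)

# Every input mapped to the same code: perplexity is 1.
collapsed = codebook_perplexity([2, 2, 2, 2], num_embeddings=4)
```

A value well below num_embeddings suggests that only part of the codebook is being used.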

# Mixed Precision¶

Sonnet mixed precision built for TensorFlow 2.

## modes¶

sonnet.mixed_precision.modes(valid_types)[source]

Decorate a function to cast inputs/outputs to different precision.

>>> snt.Linear.__call__ = snt.mixed_precision.modes(
...   [tf.float32, tf.float16])(snt.Linear.__call__)
>>> mod = snt.Linear(10)
>>> snt.mixed_precision.enable(tf.float16)
>>> y = mod(tf.ones([1, 1]))  # First call will be done in F32.
>>> y = mod(tf.ones([1, 1]))  # MatMul/Add will be done in F16.

Parameters

valid_types – Collection of types that the function being decorated is legal to run in.

Returns

A decorator that will cast the inputs and outputs of the decorated function according to the global mixed precision policy and the function's eligibility for mixed precision.

## enable¶

sonnet.mixed_precision.enable(dtype)[source]

Set the mixed precision mode.

Parameters

dtype – type to cast to.

## disable¶

sonnet.mixed_precision.disable()[source]

Disable mixed precision training.

## scope¶

sonnet.mixed_precision.scope(dtype: tensorflow.python.framework.dtypes.DType)[source]

Temporarily set the global mixed precision type to dtype.

The global type is reset to its original value when the context is exited.

>>> snt.mixed_precision.enable(tf.float32)
>>> snt.Linear.__call__ = snt.mixed_precision.modes(
...   [tf.float32, tf.float16])(snt.Linear.__call__)
>>> mod = snt.Linear(10)
>>> with snt.mixed_precision.scope(tf.float16):
...   y = mod(tf.ones([1, 1]))  # First call will be done in F32.
...   y = mod(tf.ones([1, 1]))  # MatMul/Add will be done in F16.
>>> y = mod(tf.ones([1, 1]))  # Outside the scope will be done in F32.

Parameters

dtype – type to set the mixed precision mode to.

Yields

Nothing. This is required for contextlib.contextmanager.
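The set-then-restore behaviour of a context manager that yields nothing can be sketched with contextlib alone. The names _mode, enable and current_mode below are hypothetical stand-ins, with strings in place of dtypes. The sketch shows only the general pattern, not Sonnet's implementation.

```python
import contextlib

_mode = None  # hypothetical module-level state standing in for the global type


def enable(dtype):
    """Sets the global mode."""
    global _mode
    _mode = dtype


def current_mode():
    """Returns the global mode."""
    return _mode


@contextlib.contextmanager
def scope(dtype):
    """Temporarily sets the global mode, restoring the old value on exit."""
    global _mode
    previous = _mode
    _mode = dtype
    try:
        yield  # nothing to hand to the `with` block
    finally:
        _mode = previous  # restored even if the block raises


enable("float32")
with scope("float16"):
    inside = current_mode()
after = current_mode()
```

The try/finally ensures the previous value comes back even when the body of the with block raises an exception.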

# References¶

1

Ashish Agarwal, David Berthelot, Tom Hennigan, Alex Passos, and Malcolm Reynolds. Stateful containers with tf.Module. TensorFlow Community RFCs, Google / DeepMind, 2019. URL: https://github.com/tensorflow/community/pull/56.

2

Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329, 2014. URL: https://arxiv.org/abs/1409.2329.

3

Rafal Jozefowicz, Wojciech Zaremba, and Ilya Sutskever. An empirical exploration of recurrent network architectures. In International Conference on Machine Learning, 2342–2350. 2015.

4

Haşim Sak, Andrew Senior, and Françoise Beaufays. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition. arXiv preprint arXiv:1402.1128, 2014. URL: https://arxiv.org/abs/1402.1128.

5

Yarin Gal and Zoubin Ghahramani. A theoretically grounded application of dropout in recurrent neural networks. In Advances in neural information processing systems, 1019–1027. 2016.

6

SHI Xingjian, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. Convolutional lstm network: a machine learning approach for precipitation nowcasting. In Advances in neural information processing systems, 802–810. 2015.

7

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014. URL: https://arxiv.org/abs/1412.3555.

8

Diederik P. Kingma and Jimmy Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. URL: https://arxiv.org/abs/1412.6980.

9

Andrew M Saxe, James L McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120, 2013. URL: https://arxiv.org/abs/1312.6120.

10

Peter Buchlovsky, David Budden, Dominik Grewe, Chris Jones, John Aslanides, Frederic Besse, Andy Brock, Aidan Clark, Sergio Gómez Colmenarejo, Aedan Pope, and others. TF-Replicator: Distributed machine learning for researchers. arXiv preprint arXiv:1902.00465, 2019. URL: https://arxiv.org/abs/1902.00465.

11

Peter Buchlovsky, Dominik Grewe, Priya Gupta, Tom Hennigan, Jonathan Hseu, Chris Jones, and Josh Levenberg. Distribution Strategy - Revised API. TensorFlow Community RFCs, Google / DeepMind, 2018. URL: https://github.com/tensorflow/community/pull/25.