LayerNorm

class LayerNorm(normalized_shape, eps=1e-05, affine=True, **kwargs)[source]

Applies Layer Normalization over a mini-batch of inputs. Refer to Layer Normalization for details.

\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

The mean and standard-deviation are calculated over the last dimensions, which must have the shape specified by normalized_shape. \(\gamma\) and \(\beta\) are learnable affine transform parameters of shape normalized_shape if affine is True. The standard-deviation is calculated via the biased estimator.
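
The formula above can be reproduced directly in NumPy. The following is a minimal sketch (plain NumPy only, with \(\gamma = 1\) and \(\beta = 0\), i.e. no affine transform) that normalizes each trailing \(4 \times 4\) slice using the biased variance estimator:

>>> import numpy as np
>>> x = np.arange(2 * 3 * 4 * 4).astype(np.float32).reshape(2, 3, 4, 4)
>>> mean = x.mean(axis=(-2, -1), keepdims=True)   # E[x] over the normalized dims
>>> var = x.var(axis=(-2, -1), keepdims=True)     # biased estimator (divides by N)
>>> y = (x - mean) / np.sqrt(var + 1e-5)          # gamma = 1, beta = 0
>>> y.shape
(2, 3, 4, 4)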

Note

Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane, Layer Normalization applies per-element scale and bias.
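
For example (a small sketch; it assumes the learnable parameters are exposed as the weight and bias attributes listed under Parameters below), with normalized_shape=(4, 4) each affine parameter has one entry per element of the normalized slice rather than a single scalar per channel:

>>> import megengine.module as M
>>> m = M.LayerNorm((4, 4))
>>> m.weight.numpy().shape, m.bias.numpy().shape
((4, 4), (4, 4))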

Parameters
  • normalized_shape (int or tuple) – input shape from an expected input of size \([*, normalized\_shape[0], normalized\_shape[1], ..., normalized\_shape[-1]]\). If it is a single integer, this module will normalize over the last dimension, which is expected to be of that specific size (see the sketch after this parameter list).

  • eps – a value added to the denominator for numerical stability. Default: 1e-5

  • affine – this module has learnable affine parameters (weight, bias) when affine is set to True. Default: True
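
As referenced in the normalized_shape entry above, the sketch below contrasts an int and a tuple value. It assumes only the semantics described in this list: an int normalizes each vector along the last dimension, while a tuple normalizes each trailing block of that shape (affine=False is used so no learnable parameters are involved):

>>> import numpy as np
>>> import megengine.module as M
>>> from megengine import Tensor
>>> inp = Tensor(np.random.rand(2, 3, 4, 4).astype(np.float32))
>>> out_int = M.LayerNorm(4, affine=False)(inp)         # normalizes each length-4 vector
>>> np.allclose(out_int.numpy().mean(axis=-1), 0, atol=1e-4)
True
>>> out_tuple = M.LayerNorm((4, 4), affine=False)(inp)  # normalizes each 4x4 plane
>>> np.allclose(out_tuple.numpy().mean(axis=(-2, -1)), 0, atol=1e-4)
True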

Shape:
  • Input: \((N, *)\) (2-D, 3-D, 4-D or 5-D tensor)

  • Output: \((N, *)\) (same shape as input)

Examples

>>> import numpy as np
>>> import megengine.module as M
>>> from megengine import Tensor
>>> inp = Tensor(np.arange(2 * 3 * 4 * 4).astype(np.float32).reshape(2, 3, 4, 4))
>>> m = M.LayerNorm((4, 4))
>>> out = m(inp)
>>> out.numpy().shape
(2, 3, 4, 4)
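
Continuing the example, a quick sanity check (a sketch, using affine=False so that no learnable parameters are applied): the output should have mean close to 0 and variance close to 1 over the normalized dimensions.

>>> m = M.LayerNorm((4, 4), affine=False)
>>> out = m(inp)
>>> np.allclose(out.numpy().mean(axis=(-2, -1)), 0, atol=1e-4)
True
>>> np.allclose(out.numpy().var(axis=(-2, -1)), 1, atol=1e-3)
True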