LayerNorm
- class LayerNorm(normalized_shape, eps=1e-05, affine=True, **kwargs)
Applies Layer Normalization over a mini-batch of inputs. Refer to the paper Layer Normalization for details.
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard deviation are calculated separately over the trailing dimensions, which must match the shape specified by normalized_shape. \(\gamma\) and \(\beta\) are learnable affine transform parameters of shape normalized_shape if affine is True. The standard deviation is calculated via the biased estimator. A plain-NumPy sketch of this computation follows the note below.
Note
Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane, Layer Normalization applies a per-element scale and bias.
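The computation can be sketched in plain NumPy (an illustrative reference, not the module's actual implementation; layer_norm_ref is a hypothetical name):

import numpy as np

def layer_norm_ref(x, normalized_shape, gamma, beta, eps=1e-5):
    # Reduce over the trailing axes named by normalized_shape.
    axes = tuple(range(x.ndim - len(normalized_shape), x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    # Biased estimator: numpy's default ddof=0 divides by N, not N - 1.
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Per-element scale and shift; gamma and beta have shape normalized_shape.
    return x_hat * gamma + beta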
- Parameters
normalized_shape (int or tuple) – input shape from an expected input of size \([*, normalized\_shape[0], normalized\_shape[1], ..., normalized\_shape[-1]]\). If it is a single integer, this module will normalize over the last dimension, which is expected to be of that specific size (see the illustration after this list).
eps – a value added to the denominator for numerical stability. Default: 1e-5
affine – when set to True, this module has learnable per-element affine parameters weight and bias. Default: True
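As an illustration of how normalized_shape selects the trailing axes (the input shapes here are assumptions for the example, not part of the API):

import megengine.module as M

# For an input of shape (N, C, H, W) = (N, 3, 4, 4):
m_last = M.LayerNorm(4)         # int: normalize over the last axis (W)
m_hw = M.LayerNorm((4, 4))      # tuple: normalize over (H, W)
m_chw = M.LayerNorm((3, 4, 4))  # normalize over all but the batch axis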
- Shape:
Input: \((N, *)\) (2-D, 3-D, 4-D or 5-D tensor)
Output: \((N, *)\) (same shape as input)
Examples
>>> import numpy as np
>>> import megengine.module as M
>>> from megengine import Tensor
>>> inp = Tensor(np.arange(2 * 3 * 4 * 4).astype(np.float32).reshape(2, 3, 4, 4))
>>> m = M.LayerNorm((4, 4))
>>> out = m(inp)
>>> out.numpy().shape
(2, 3, 4, 4)
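As an informal check, the output should have per-position mean near 0 and variance near 1 over the normalized axes, assuming weight and bias start at their identity values (ones and zeros), the usual LayerNorm initialization:

>>> np.allclose(out.numpy().mean(axis=(2, 3)), 0.0, atol=1e-4)
True
>>> np.allclose(out.numpy().var(axis=(2, 3)), 1.0, atol=1e-3)
True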