BatchNorm1d¶
- class BatchNorm1d(num_features, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True, freeze=False, **kwargs)[source]¶
Applies Batch Normalization over a 2D or 3D input.
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension over the mini-batches, and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the number of features or channels of the input). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
By default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.9. If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation time as well.
Because the Batch Normalization is done over the C dimension, computing statistics on (N, L) slices, it is common terminology to call this Temporal Batch Normalization.
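A minimal NumPy sketch of the training-mode computation described above, for a (N, C, L) input; this is an illustrative re-implementation for clarity, not the layer itself, and the function and variable names are chosen for this example only:

```python
import numpy as np

def batch_norm_1d_train(x, gamma, beta, eps=1e-5):
    """Illustrative training-mode batch norm for a (N, C, L) input.

    Statistics are computed per channel over the (N, L) slices,
    using the biased variance estimator, as stated above.
    """
    # Reduce over batch (axis 0) and length (axis 2); keep C for broadcasting.
    mean = x.mean(axis=(0, 2), keepdims=True)   # shape (1, C, 1)
    var = x.var(axis=(0, 2), keepdims=True)     # biased estimator
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta are per-channel vectors of size C.
    return x_hat * gamma.reshape(1, -1, 1) + beta.reshape(1, -1, 1)

N, C, L = 4, 3, 5
x = np.random.randn(N, C, L)
y = batch_norm_1d_train(x, gamma=np.ones(C), beta=np.zeros(C))
# Each channel of y now has approximately zero mean and unit variance.
print(y.mean(axis=(0, 2)), y.var(axis=(0, 2)))
```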
Note
The update formula for running_mean and running_var (taking running_mean as an example) is
\[\textrm{running_mean} = \textrm{momentum} \times \textrm{running_mean} + (1 - \textrm{momentum}) \times \textrm{batch_mean}\]
which could be defined differently in other frameworks. Most notably, a momentum of 0.1 in PyTorch is equivalent to a momentum of 0.9 here.
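A minimal numeric illustration of a single update step under the note above (the values 0.0 and 1.0 are arbitrary):

```python
# One update step with this convention (momentum = 0.9).
momentum = 0.9
running_mean, batch_mean = 0.0, 1.0
running_mean = momentum * running_mean + (1 - momentum) * batch_mean
print(running_mean)  # 0.1

# The same step expressed in PyTorch's convention uses momentum = 0.1:
# running_mean = (1 - 0.1) * running_mean + 0.1 * batch_mean
```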
- Shape:
Input: \((N, C)\) or \((N, C, L)\), where \(N\) is the batch size, \(C\) is the number of features or channels, and \(L\) is the sequence length.
Output: \((N, C)\) or \((N, C, L)\) (same shape as input).
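A minimal usage sketch based on the constructor signature above, assuming the class is exposed under MegEngine's module namespace; the import paths and tensor-creation call are assumptions and may differ in your installation:

```python
import numpy as np
import megengine as mge          # assumed framework import
import megengine.module as M     # assumed module namespace

bn = M.BatchNorm1d(num_features=3)   # gamma initialized to 1, beta to 0
x = mge.tensor(np.random.randn(4, 3, 5).astype("float32"))  # (N, C, L)
y = bn(x)
print(y.shape)  # same shape as the input: (4, 3, 5)
```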