LAMB
- class LAMB(params, lr, betas=(0.9, 0.999), eps=1e-08, bias_correction=True, weight_decay=0.0, always_adapt=False)
Implements the LAMB algorithm.
LAMB was proposed in “Large Batch Optimization for Deep Learning: Training BERT in 76 minutes”.
- Parameters
params (Union[Iterable[Parameter], dict]) – iterable of parameters to optimize or dicts defining parameter groups.
lr (float) – learning rate.
betas (Tuple[float, float]) – coefficients used for computing running averages of the gradient and its square. Default: (0.9, 0.999)
eps (float) – term added to the denominator to improve numerical stability. Default: 1e-8
bias_correction (bool) – enables bias correction by 1 - beta ** step. Default: True
weight_decay (float) – weight decay (L2 penalty). Default: 0.0
always_adapt (bool) – apply the adaptive learning rate to parameters whose weight decay is 0.0. Default: False
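
To show how these parameters interact, here is a minimal sketch of one LAMB update on a single flat parameter vector, written in plain Python. It follows the update rule from the paper and is not taken from this class's source. The function name lamb_step, the list-based state, and the exact placement of eps and weight_decay are assumptions made for illustration, so the real implementation may differ in detail.

```python
import math

def lamb_step(param, grad, exp_avg, exp_avg_sq, step, lr,
              betas=(0.9, 0.999), eps=1e-8, bias_correction=True,
              weight_decay=0.0, always_adapt=False):
    """One illustrative LAMB step on flat lists of floats (hypothetical helper)."""
    beta1, beta2 = betas

    # Running averages of the gradient and of its square.
    exp_avg = [beta1 * m + (1 - beta1) * g for m, g in zip(exp_avg, grad)]
    exp_avg_sq = [beta2 * v + (1 - beta2) * g * g for v, g in zip(exp_avg_sq, grad)]

    # Bias correction by 1 - beta ** step (step counts from 1).
    if bias_correction:
        c1 = 1 - beta1 ** step
        c2 = 1 - beta2 ** step
    else:
        c1 = c2 = 1.0

    # Adam-style direction plus the weight decay term (assumed placement).
    update = [
        (m / c1) / (math.sqrt(v / c2) + eps) + weight_decay * p
        for m, v, p in zip(exp_avg, exp_avg_sq, param)
    ]

    # Layer-wise trust ratio ||w|| / ||update||. It is skipped for parameters
    # with zero weight decay unless always_adapt is set.
    trust_ratio = 1.0
    if weight_decay != 0.0 or always_adapt:
        w_norm = math.sqrt(sum(p * p for p in param))
        u_norm = math.sqrt(sum(u * u for u in update))
        if w_norm > 0 and u_norm > 0:
            trust_ratio = w_norm / u_norm

    new_param = [p - lr * trust_ratio * u for p, u in zip(param, update)]
    return new_param, exp_avg, exp_avg_sq


# First step from zero state, with and without always_adapt.
w, g, zeros = [3.0, 4.0], [1.0, 1.0], [0.0, 0.0]
plain, _, _ = lamb_step(w, g, zeros, zeros, step=1, lr=0.1)
adapted, _, _ = lamb_step(w, g, zeros, zeros, step=1, lr=0.1, always_adapt=True)
```

In this sketch, with weight_decay=0.0 and always_adapt=False, the step reduces to a bias-corrected Adam update. Setting always_adapt=True rescales the same direction by the ratio of the parameter norm to the update norm.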