AdamW
- class AdamW(params, lr, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01)
Implements the AdamW algorithm proposed in “Decoupled Weight Decay Regularization”.
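This optimizer combines Adam with weight decay regularization to reduce overfitting. Unlike Adam with an L2 penalty, the weight decay is decoupled from the gradient-based update: parameters are shrunk directly instead of having the decay term added to the gradient, where Adam's adaptive scaling would otherwise rescale it.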
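The update rule described above can be sketched in plain Python as follows. This is a minimal, illustrative sketch of the algorithm, not this class's implementation: the function name `adamw_step` and its list-based state layout are hypothetical, and applying the decay as `lr * weight_decay * p` before the Adam step is an assumed (though common) convention. The library may differ in such details.

```python
import math
from typing import List, Tuple


def adamw_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-8,
    weight_decay: float = 1e-2,
) -> None:
    """Hypothetical helper: apply one AdamW step in place. `t` is the 1-based step count."""
    beta1, beta2 = betas
    for i, g in enumerate(grads):
        # Decoupled weight decay (assumed lr-scaled): shrink the weight directly, independent of g.
        params[i] -= lr * weight_decay * params[i]
        # Running averages of the gradient and its square.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Adaptive gradient step; eps keeps the denominator away from zero.
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


# Usage: one step on a single weight.
p, m, v = [1.0], [0.0], [0.0]
adamw_step(p, [0.5], m, v, t=1, lr=0.1)
# p[0] is now about 0.899: 0.001 from the decay, about lr from the Adam step.
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to the sign of the gradient, so the gradient step has magnitude close to `lr` whatever the gradient's scale, while the decay removes exactly `lr * weight_decay` of the weight.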
- Parameters
params (Union[Iterable[Parameter], dict]) – Iterable of parameters to optimize or dicts defining parameter groups.
lr (float) – Learning rate.
betas (Tuple[float, float], optional) – Coefficients used for computing running averages of the gradient and its square. Default: (0.9, 0.999).
eps (float, optional) – Term added to the denominator to improve numerical stability. Default: 1e-8.
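weight_decay (float, optional) – Weight decay coefficient. The decay is applied directly to the parameters, decoupled from the gradient-based update, rather than being added to the gradient as an L2 penalty. Default: 1e-2.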
- Returns
An instance of the AdamW optimizer.
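To illustrate why decoupled weight decay is not the same as an L2 penalty under Adam, the sketch below compares one step on a single scalar weight whose loss gradient is zero. All helper names are hypothetical, bias correction follows the Adam paper, and scaling the decay by `lr` is an assumed convention; none of this is this class's API.

```python
import math


def adam_step_scalar(p, g, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical helper: one bias-corrected Adam step on a scalar; returns (p, m, v)."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return p - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def l2_penalty_step(p, g, m, v, t, lr, weight_decay):
    # Adam + L2: the penalty gradient is folded into g and then normalized
    # by the adaptive denominator like any other gradient.
    return adam_step_scalar(p, g + weight_decay * p, m, v, t, lr)


def decoupled_step(p, g, m, v, t, lr, weight_decay):
    # AdamW-style: shrink the weight directly, then take a plain Adam step on g.
    p = p - lr * weight_decay * p
    return adam_step_scalar(p, g, m, v, t, lr)


p_l2, _, _ = l2_penalty_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1, weight_decay=0.01)
p_dw, _, _ = decoupled_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1, weight_decay=0.01)
# p_l2 is about 0.9: the tiny penalty gradient is normalized into a full-size step.
# p_dw is about 0.999: the weight shrinks by exactly lr * weight_decay of its value.
```

With the L2 penalty, the strength of the regularization depends on the gradient history through Adam's normalization; with decoupled decay, every weight shrinks by the same fraction per step, which is the behavior the `weight_decay` parameter is meant to control.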