megengine.optimizer.adadelta.Adadelta
Bases: megengine.optimizer.optimizer.Optimizer
Implements Adadelta algorithm.
It has been proposed in “ADADELTA: An Adaptive Learning Rate Method”.
params (Union[Iterable[Parameter], dict]) – iterable of parameters to optimize or dicts defining parameter groups.
lr (float) – coefficient that scales delta before it is applied to the parameters. Default: 1.0
rho (float) – coefficient used for computing a running average of squared gradients. Default: 0.9
eps (float) – term added to the denominator to improve numerical stability. Default: 1e-6
weight_decay (float) – weight decay (L2 penalty). Default: 0
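A minimal construction sketch follows; the M.Linear toy model and the top-level megengine.optimizer import are illustrative assumptions, while the hyper-parameter values mirror the defaults documented above:

    import megengine.module as M
    import megengine.optimizer as optim

    model = M.Linear(16, 4)  # any Module's parameters work here

    opt = optim.Adadelta(
        model.parameters(),
        lr=1.0,          # scales delta before it is applied
        rho=0.9,         # running average of squared gradients
        eps=1e-6,        # numerical-stability term
        weight_decay=0,  # L2 penalty
    )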
megengine.optimizer.adagrad.Adagrad
Implements Adagrad algorithm.
It has been proposed in “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization”.
lr (float) – coefficient that scales delta before it is applied to the parameters. Default: 1e-2
lr_decay (float) – learning rate decay. Default: 0
eps (float) – term added to the denominator to improve numerical stability. Default: 1e-10
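Constructed the same way; the toy model and top-level import are again assumptions, and the values are the documented defaults:

    import megengine.module as M
    import megengine.optimizer as optim

    model = M.Linear(16, 4)
    opt = optim.Adagrad(
        model.parameters(),
        lr=1e-2,     # base step size
        lr_decay=0,  # learning rate decay
        eps=1e-10,   # numerical-stability term
    )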
megengine.optimizer.adam.Adam
Implements Adam algorithm proposed in “Adam: A Method for Stochastic Optimization”.
lr (float) – learning rate.
betas (Tuple[float, float]) – coefficients used for computing running averages of gradient and its square. Default: (0.9, 0.999)
eps (float) – term added to the denominator to improve numerical stability. Default: 1e-8
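A construction sketch; lr has no documented default here, so 1e-3 is purely illustrative, and the toy model and top-level import are assumptions:

    import megengine.module as M
    import megengine.optimizer as optim

    model = M.Linear(16, 4)
    opt = optim.Adam(
        model.parameters(),
        lr=1e-3,             # illustrative value; no default is documented
        betas=(0.9, 0.999),  # running averages of gradient and its square
        eps=1e-8,            # numerical-stability term
    )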
megengine.optimizer.lr_scheduler.LRScheduler
Bases: object
Base class for all learning rate based schedulers.
optimizer (Optimizer) – wrapped optimizer.
current_epoch (int) – the index of current epoch. Default: -1
get_lr
Compute current learning rate for the scheduler.
load_state_dict
Loads the scheduler's state.
state_dict (dict) – scheduler state.
state_dict
Returns the state of the scheduler as a dict. It contains an entry for every variable in self.__dict__ which is not the optimizer.
step
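A toy subclass sketch; overriding get_lr and reading self.optimizer.param_groups follow the common scheduler pattern and are assumptions rather than behaviour documented here:

    from megengine.optimizer.lr_scheduler import LRScheduler

    class ConstantLR(LRScheduler):
        """Toy scheduler that leaves each parameter group's lr unchanged."""

        def get_lr(self):
            # Assumed convention: return one learning rate per parameter
            # group, read back from the wrapped optimizer.
            return [group["lr"] for group in self.optimizer.param_groups]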
megengine.optimizer.multi_step_lr.MultiStepLR
Bases: megengine.optimizer.lr_scheduler.LRScheduler
Decays the learning rate of each parameter group by gamma once the number of epoch reaches one of the milestones.
milestones (Iterable[int]) – list of epoch indices which should be increasing.
gamma (float) – multiplicative factor of learning rate decay. Default: 0.1
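A usage sketch; the SGD optimizer, its lr value, and calling step() once per epoch are assumptions about the surrounding training loop:

    import megengine.module as M
    import megengine.optimizer as optim
    from megengine.optimizer.multi_step_lr import MultiStepLR

    model = M.Linear(16, 4)
    opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    # lr is multiplied by gamma=0.1 when the epoch count reaches 30, then 60.
    scheduler = MultiStepLR(opt, milestones=[30, 60], gamma=0.1)

    for epoch in range(90):
        # ... run one training epoch with opt ...
        scheduler.step()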
megengine.optimizer.optimizer.Optimizer
Base class for all optimizers.
params (Union[Iterable[Parameter], dict]) – specifies what Tensors should be optimized.
defaults (dict) – a dict of default parameters of Optimizer, like learning rate or momentum.
add_param_group
Add a param group to param_groups of the Optimizer.
This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses (see the sketch below).
param_group (dict) – specifies what tensors should be optimized along with group.
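A fine-tuning sketch along those lines; the "params" and "lr" keys follow the usual param-group dict convention and are assumptions, since only add_param_group itself is documented here:

    import megengine.module as M
    import megengine.optimizer as optim

    backbone = M.Linear(32, 16)  # pretend pre-trained, frozen at first
    head = M.Linear(16, 4)

    opt = optim.SGD(head.parameters(), lr=0.1, momentum=0.9)

    # Later in training, unfreeze the backbone with its own, smaller lr.
    opt.add_param_group({"params": list(backbone.parameters()), "lr": 0.01})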
backward
bcast_param
clear_grad
Set the grad attribute to None for all parameters.
load_state_dict
Loads the optimizer state.
state (dict) – optimizer state. Should be an object returned from a call to state_dict().
state_dict
Export the optimizer state.
Return type: Dict
Returns: optimizer state. Can be loaded by load_state_dict().
step
Performs a single optimization step.
zero_grad
Deprecated since version 1.0: use clear_grad instead.
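Putting clear_grad and step together in one iteration; computing gradients through megengine.autodiff.GradManager (rather than the backward method listed above) is an assumption about the surrounding workflow, as is the squared-error toy loss:

    import numpy as np
    import megengine as mge
    import megengine.module as M
    import megengine.optimizer as optim
    from megengine.autodiff import GradManager

    model = M.Linear(16, 4)
    opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    gm = GradManager()
    gm.attach(model.parameters())

    data = mge.tensor(np.random.randn(8, 16).astype("float32"))
    target = mge.tensor(np.random.randn(8, 4).astype("float32"))

    with gm:  # record the forward pass
        loss = ((model(data) - target) ** 2).mean()
        gm.backward(loss)  # populate parameter grads

    opt.step()        # performs a single optimization step
    opt.clear_grad()  # resets grad attributes to None for the next iteration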
megengine.optimizer.sgd.SGD
Implements stochastic gradient descent.
Nesterov momentum is based on the formula from “On the importance of initialization and momentum in deep learning”.
momentum (float) – momentum factor. Default: 0.0
weight_decay (float) – weight decay (L2 penalty). Default: 0.0
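A construction sketch; lr has no documented default on this page, so 0.01 is illustrative, and the toy model and top-level import are assumptions:

    import megengine.module as M
    import megengine.optimizer as optim

    model = M.Linear(16, 4)
    opt = optim.SGD(
        model.parameters(),
        lr=0.01,            # illustrative value; no default is documented here
        momentum=0.9,       # momentum factor
        weight_decay=1e-4,  # L2 penalty
    )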