megengine.optimizer package

megengine.optimizer.adadelta

class megengine.optimizer.adadelta.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0.0)[source]

Bases: megengine.optimizer.optimizer.Optimizer

Implements Adadelta algorithm.

It has been proposed in “ADADELTA: An Adaptive Learning Rate Method”.

Parameters
  • params (Union[Iterable[Parameter], dict]) – iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – coefficient that scales delta before it is applied to the parameters. Default: 1.0

  • rho (float) – coefficient used for computing a running average of squared gradients. Default: 0.9

  • eps (float) – term added to the denominator to improve numerical stability. Default: 1e-6

  • weight_decay (float) – weight decay (L2 penalty). Default: 0
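
Example (a minimal, hypothetical sketch; the Linear module and the hyperparameter values are illustrative, not prescribed by this API):

    import megengine.module as M
    from megengine.optimizer.adadelta import Adadelta

    model = M.Linear(16, 4)
    # Construct Adadelta with explicit rho/eps; the defaults would work equally well.
    opt = Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-6)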

megengine.optimizer.adagrad

class megengine.optimizer.adagrad.Adagrad(params, lr=0.01, lr_decay=0.0, eps=1e-10, weight_decay=0.0)[source]

Bases: megengine.optimizer.optimizer.Optimizer

Implements Adagrad algorithm.

It has been proposed in “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization”.

Parameters
  • params (Union[Iterable[Parameter], dict]) – iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – coefficient that scales delta before it is applied to the parameters. Default: 1e-2

  • lr_decay (float) – learning rate decay. Default: 0

  • eps (float) – term added to the denominator to improve numerical stability. Default: 1e-10

  • weight_decay (float) – weight decay (L2 penalty). Default: 0
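
Example (a hedged sketch; the module and the lr_decay/weight_decay values are illustrative, shown only to demonstrate the keywords):

    import megengine.module as M
    from megengine.optimizer.adagrad import Adagrad

    model = M.Linear(16, 4)
    opt = Adagrad(model.parameters(), lr=1e-2, lr_decay=1e-4, weight_decay=1e-5)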

megengine.optimizer.adam

class megengine.optimizer.adam.Adam(params, lr, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0)[source]

Bases: megengine.optimizer.optimizer.Optimizer

Implements Adam algorithm proposed in “Adam: A Method for Stochastic Optimization”.

Parameters
  • params (Union[Iterable[Parameter], dict]) – iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – learning rate.

  • betas (Tuple[float, float]) – coefficients used for computing running averages of gradient and its square. Default: (0.9, 0.999)

  • eps (float) – term added to the denominator to improve numerical stability. Default: 1e-8

  • weight_decay (float) – weight decay (L2 penalty). Default: 0
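
Example (a sketch; note that betas is passed as a single (beta1, beta2) tuple, and the values below are illustrative):

    import megengine.module as M
    from megengine.optimizer.adam import Adam

    model = M.Linear(16, 4)
    opt = Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-4)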

megengine.optimizer.lr_scheduler

class megengine.optimizer.lr_scheduler.LRScheduler(optimizer, current_epoch=-1)[source]

Bases: object

Base class for all learning rate based schedulers.

Parameters
  • optimizer (Optimizer) – wrapped optimizer.

  • current_epoch (int) – the index of the current epoch. Default: -1

get_lr()[source]

Compute current learning rate for the scheduler.

load_state_dict(state_dict)[source]

Loads the scheduler's state.

Parameters

state_dict (dict) – scheduler state.

state_dict()[source]

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

step(epoch=None)[source]
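
Example (a hedged sketch of a user-defined scheduler; it assumes, as in similar frameworks, that the base class records the initial learning rate of each parameter group in self.base_lrs and the epoch counter in self.current_epoch; check the source if your version names these attributes differently):

    import megengine.module as M
    from megengine.optimizer.lr_scheduler import LRScheduler
    from megengine.optimizer.sgd import SGD

    class HalvingLR(LRScheduler):
        """Hypothetical scheduler: halve every base lr every 10 epochs."""

        def get_lr(self):
            factor = 0.5 ** (self.current_epoch // 10)
            return [base_lr * factor for base_lr in self.base_lrs]

    opt = SGD(M.Linear(16, 4).parameters(), lr=0.1)
    scheduler = HalvingLR(opt)
    # scheduler.step() is then called once per epoch in the training loop.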

megengine.optimizer.multi_step_lr

class megengine.optimizer.multi_step_lr.MultiStepLR(optimizer, milestones, gamma=0.1, current_epoch=-1)[source]

Bases: megengine.optimizer.lr_scheduler.LRScheduler

Decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones.

Parameters
  • optimizer (Optimizer) – wrapped optimizer.

  • milestones (Iterable[int]) – list of epoch indices; must be increasing.

  • gamma (float) – multiplicative factor of learning rate decay. Default: 0.1

  • current_epoch (int) – the index of the current epoch. Default: -1
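
Example (a sketch; the optimizer, milestones, and loop are illustrative): with milestones [30, 80] and gamma 0.1, an initial learning rate of 0.1 drops to 0.01 after epoch 30 and to 0.001 after epoch 80.

    import megengine.module as M
    from megengine.optimizer.multi_step_lr import MultiStepLR
    from megengine.optimizer.sgd import SGD

    opt = SGD(M.Linear(16, 4).parameters(), lr=0.1)
    scheduler = MultiStepLR(opt, milestones=[30, 80], gamma=0.1)

    for epoch in range(100):
        # ... run one epoch of training with opt ...
        scheduler.step()  # advance the schedule once per epoch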

get_lr()[source]

Compute current learning rate for the scheduler.

load_state_dict(state_dict)[source]

Loads the scheduler's state.

Parameters

state_dict (dict) – scheduler state.

state_dict()[source]

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.

megengine.optimizer.optimizer

class megengine.optimizer.optimizer.Optimizer(params, defaults)[source]

Bases: object

Base class for all optimizers.

Parameters
  • params (Union[Iterable[Parameter], dict]) – specifies what Tensors should be optimized.

  • defaults (dict) – a dict of default parameters of Optimizer, like learning rate or momentum.

add_param_group(param_group)[source]

Add a param group to param_groups of the Optimizer.

This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses (see the sketch below).

Parameters

param_group (dict) – specifies what tensors should be optimized along with group-specific optimization options.
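
Example (a hedged fine-tuning sketch; backbone and head are hypothetical modules, and the "params"/"lr" keys follow the usual parameter-group dict convention):

    import megengine.module as M
    from megengine.optimizer.sgd import SGD

    backbone = M.Linear(32, 16)   # pre-trained part, kept frozen at first
    head = M.Linear(16, 4)        # new part, trained from the start

    opt = SGD(head.parameters(), lr=0.1)
    # Later in training, unfreeze the backbone with a smaller learning rate.
    opt.add_param_group({"params": list(backbone.parameters()), "lr": 0.01})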

backward(loss)[source]

bcast_param()[source]

clear_grad()[source]

Set the grad attribute to None for all parameters.

load_state_dict(state)[source]

Loads the optimizer state.

Parameters

state (dict) – optimizer state. Should be an object returned from a call to state_dict().

state_dict()[source]

Export the optimizer state.

Return type

Dict

Returns

optimizer state. Can be loaded by load_state_dict().
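
Example (a checkpointing sketch; megengine.save/megengine.load and the file name are one possible choice, not mandated by this API):

    import megengine as mge
    import megengine.module as M
    from megengine.optimizer.sgd import SGD

    opt = SGD(M.Linear(16, 4).parameters(), lr=0.1, momentum=0.9)

    mge.save(opt.state_dict(), "optimizer.pkl")      # export the state
    opt.load_state_dict(mge.load("optimizer.pkl"))   # restore it later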

step()[source]

Performs a single optimization step.
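
Example (a hedged sketch of one optimization step; gradients are recorded with megengine.autodiff.GradManager, and the model, data, and squared-error loss are illustrative):

    import numpy as np
    import megengine as mge
    import megengine.functional as F
    import megengine.module as M
    from megengine.autodiff import GradManager
    from megengine.optimizer.sgd import SGD

    model = M.Linear(16, 4)
    opt = SGD(model.parameters(), lr=0.1)
    gm = GradManager().attach(model.parameters())

    x = mge.tensor(np.random.randn(8, 16).astype("float32"))
    y = mge.tensor(np.random.randn(8, 4).astype("float32"))

    with gm:
        loss = F.mean((model(x) - y) ** 2)  # simple squared-error loss
        gm.backward(loss)                   # accumulate gradients
    opt.step()        # apply the accumulated gradients
    opt.clear_grad()  # reset gradients before the next iteration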

zero_grad()[source]

Deprecated since version 1.0: use clear_grad() instead.

megengine.optimizer.sgd

class megengine.optimizer.sgd.SGD(params, lr, momentum=0.0, weight_decay=0.0)[source]

Bases: megengine.optimizer.optimizer.Optimizer

Implements stochastic gradient descent.

Nesterov momentum is based on the formula from “On the importance of initialization and momentum in deep learning” .

Parameters
  • params (Union[Iterable[Parameter], dict]) – iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – learning rate.

  • momentum (float) – momentum factor. Default: 0.0

  • weight_decay (float) – weight decay (L2 penalty). Default: 0.0
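
Example (a sketch; the momentum and weight_decay values are illustrative):

    import megengine.module as M
    from megengine.optimizer.sgd import SGD

    model = M.Linear(16, 4)
    opt = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)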