Use Optimizer to optimize parameters
MegEngine's ``optimizer`` module implements a large number of optimization algorithms. ``Optimizer`` is the abstract base class of all optimizers and specifies the interfaces they must provide; common optimizer implementations such as ``SGD`` and ``Adam`` are also provided. These optimizers update parameters based on their gradient information, following the strategy defined by each algorithm.
Taking the ``SGD`` optimizer as an example, the basic process of optimizing the parameters of a neural network model is as follows:
from megengine.autodiff import GradManager
import megengine.optimizer as optim

model = MyModel()
gm = GradManager().attach(model.parameters())
optimizer = optim.SGD(model.parameters(), lr=0.01)  # lr may vary with different models

for data, label in dataset:
    with gm:
        pred = model(data)
        loss = loss_fn(pred, label)
        gm.backward(loss)
    optimizer.step().clear_grad()
We need to construct an optimizer and pass in the ``Parameter`` objects (or an iterable of them) that need to be optimized;
By executing the ``step`` method, the parameters are updated once based on their gradient information;
By executing the ``clear_grad`` method, the gradients of the parameters are cleared (the two calls can also be written separately, as shown below).
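The chained call ``optimizer.step().clear_grad()`` in the example above works because ``step`` returns the optimizer itself, as the chained form suggests; writing the two calls separately is equivalent:

optimizer.step()        # apply one parameter update based on the current gradients
optimizer.clear_grad()  # reset the gradients for the next iteration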
Why do I need to clear the gradient manually?
When the gradient manager executes the ``backward`` method, it accumulates the newly computed gradients onto the existing ones instead of replacing them. Therefore, before a new round of gradient computation, the gradients obtained in the previous round usually need to be cleared. Since the user controls when the gradients are cleared, they can be accumulated flexibly.
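For example, here is a minimal sketch of gradient accumulation, reusing the hypothetical ``model``, ``dataset`` and ``loss_fn`` from the example above (the interval ``accum_steps`` is an illustrative choice): gradients from several mini-batches are accumulated before a single parameter update.

accum_steps = 4  # illustrative accumulation interval

for step_idx, (data, label) in enumerate(dataset):
    with gm:
        pred = model(data)
        # scale the loss so the accumulated gradient matches one large batch
        loss = loss_fn(pred, label) / accum_steps
        gm.backward(loss)  # gradients are added onto the existing ones
    if (step_idx + 1) % accum_steps == 0:
        optimizer.step()        # update once with the accumulated gradients
        optimizer.clear_grad()  # clear only now, keeping gradients in between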
Optimizer state dictionary
The ``Optimizer`` constructor can also accept a dictionary containing the default hyper-parameters of the optimizer (such as the learning rate, momentum, weight decay coefficient, etc.). This information can be obtained and loaded through ``state_dict`` and ``load_state_dict``.
Subclasses can customize these parameters in their implementation. Taking ``SGD`` as an example:
>>> model = megengine.module.Linear(3, 2)
>>> optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)
>>> optimizer.state_dict()
{'param_groups': [{'lr': 0.001,
   'momentum': 0.9,
   'weight_decay': 0.0001,
   'params': [0, 1]}],
 'state': {0: {'momentum_buffer': array([0., 0.], dtype=float32)},
  1: {'momentum_buffer': array([[0., 0., 0.],
         [0., 0., 0.]], dtype=float32)}}}
Most ``Optimizer`` state dictionaries store statistics about parameter gradients (such as running means and variances). This information needs to be saved and loaded when model training is paused and resumed, to ensure that the state stays consistent before and after.
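As a brief sketch of this workflow (the checkpoint file name is illustrative; ``megengine.save`` and ``megengine.load`` serialize Python objects), the optimizer state can be saved alongside the model state and restored when training resumes:

import megengine

# pause training: save both state dictionaries into one checkpoint
megengine.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
    "checkpoint.pkl",  # illustrative file name
)

# resume training: rebuild model/optimizer, then load the saved states
checkpoint = megengine.load("checkpoint.pkl")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])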
See also
Through ``load_state_dict`` we can load an ``Optimizer`` state dictionary, which is often used when saving and resuming the model training process.
``Module`` also has a state dictionary that can be saved and loaded; refer to Use Module to define the model structure. For best practices on saving and loading during model training, please refer to Save and Load Models (S&L).