Use Optimizer to optimize parameters
MegEngine's ``optimizer`` module implements a large number of optimization algorithms. ``Optimizer`` is the abstract base class of all optimizers and specifies the interfaces they must provide; common optimizer implementations such as ``SGD`` and ``Adam`` are also provided. These optimizers update parameters based on their gradient information, following the strategy defined by each algorithm.
Taking the ``SGD`` optimizer as an example, the basic process of optimizing the parameters of a neural network model is as follows:
from megengine.autodiff import GradManager
import megengine.optimizer as optim

model = MyModel()
gm = GradManager().attach(model.parameters())
optimizer = optim.SGD(model.parameters(), lr=0.01)  # lr may vary with different models

for data, label in dataset:
    with gm:
        pred = model(data)
        loss = loss_fn(pred, label)
        gm.backward(loss)
    optimizer.step().clear_grad()
We need to construct an optimizer and pass in the ``Parameter`` objects (or an iterable of them) that need to be optimized;
By executing the ``step`` method, the parameters are updated once based on their gradient information;
By executing the ``clear_grad`` method, the gradients of the parameters are cleared (the two calls can also be written separately, as shown below).
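The chained call ``optimizer.step().clear_grad()`` in the example above works because ``step`` returns the optimizer itself, as the chained form suggests; writing the two calls separately is equivalent:

optimizer.step()        # apply one parameter update based on the current gradients
optimizer.clear_grad()  # reset the gradients for the next iteration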
Why do I need to clear the gradient manually?
When the gradient manager executes the ``backward`` method, it accumulates the newly computed gradients onto the existing ones instead of replacing them. Therefore, before a new round of gradient computation, the gradients obtained in the previous round usually need to be cleared. Since the user controls when the gradients are cleared, they can be accumulated flexibly.
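For example, here is a minimal sketch of gradient accumulation, reusing the hypothetical ``model``, ``dataset`` and ``loss_fn`` from the example above (the interval ``accum_steps`` is an illustrative choice): gradients from several mini-batches are accumulated before a single parameter update.

accum_steps = 4  # illustrative accumulation interval

for step_idx, (data, label) in enumerate(dataset):
    with gm:
        pred = model(data)
        # scale the loss so the accumulated gradient matches one large batch
        loss = loss_fn(pred, label) / accum_steps
        gm.backward(loss)  # gradients are added onto the existing ones
    if (step_idx + 1) % accum_steps == 0:
        optimizer.step()        # update once with the accumulated gradients
        optimizer.clear_grad()  # clear only now, keeping gradients in between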
Optimizer state dictionary
The ``Optimizer`` constructor can also accept a dictionary containing the default hyper-parameters of the optimizer (such as the learning rate, momentum, weight decay coefficient, etc.). This information can be obtained and loaded through ``state_dict`` and ``load_state_dict``.
Subclasses can customize these parameters in their implementation. Taking ``SGD`` as an example:
>>> model = megengine.module.Linear(3, 2)
>>> optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)
>>> optimizer.state_dict()
{'param_groups': [{'lr': 0.001,
   'momentum': 0.9,
   'weight_decay': 0.0001,
   'params': [0, 1]}],
 'state': {0: {'momentum_buffer': array([0., 0.], dtype=float32)},
  1: {'momentum_buffer': array([[0., 0., 0.],
         [0., 0., 0.]], dtype=float32)}}}
Most ``Optimizer`` state dictionaries store statistics about parameter gradients (such as running means and variances). This information needs to be saved and loaded when model training is paused and resumed, to ensure that the state stays consistent before and after.
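As a brief sketch of this workflow (the checkpoint file name is illustrative; ``megengine.save`` and ``megengine.load`` serialize Python objects), the optimizer state can be saved alongside the model state and restored when training resumes:

import megengine

# pause training: save both state dictionaries into one checkpoint
megengine.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
    "checkpoint.pkl",  # illustrative file name
)

# resume training: rebuild model/optimizer, then load the saved states
checkpoint = megengine.load("checkpoint.pkl")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])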
See also
Through ``load_state_dict`` we can load an ``Optimizer`` state dictionary, which is often used when saving and resuming the model training process.
``Module`` also has a state dictionary that can be saved and loaded; refer to Use Module to define the model structure. For best practices on saving and loading during model training, please refer to Save and Load Models (S&L).