Use Module to define the model structure#
The neural network model is composed of various layers, or modules, that perform operations on input data.
The picture above shows the classic AlexNet model structure, which includes classic modules such as the convolutional layer conv and the fully connected layer fc.
An abstraction of this structure is provided in the megengine.module namespace. Common neural network module interfaces such as Conv2d are implemented there, which makes it convenient for users to quickly build model structures. All modules are subclasses of Module; please refer to Module base class concept and interface introduction. In addition, a Sequential container is provided, which is helpful when defining complex structures.
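For instance, a plain stack of layers can be written with Sequential without defining a new class (a minimal sketch; the layer configuration here is purely illustrative):

import megengine.module as M

simple_net = M.Sequential(
    M.Conv2d(1, 10, kernel_size=5),
    M.ReLU(),
    M.MaxPool2d(2, 2),
)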
Warning

The capitalized "Module" in MegEngine refers to the base class that is frequently used when designing model structures. It needs to be distinguished from the lowercase "module" concept in Python, where the latter refers to a file that can be imported. The statement import megengine.module as M actually imports the file module named module.py (usually abbreviated as M).
See also

This chapter mainly introduces the Float32 Module used by default and the parameter initialization module init. The QAT Module and Quantized Module used in quantized models will be introduced in Quantization.
Basic usage example#
The following code demonstrates how to use the basic components of Module to quickly design a convolutional neural network structure:
All network structures are derived from the base class M.Module;
In the constructor, you must first call super().__init__(), and then declare all the layers/modules to be used;
In the forward function, define how the model will run, from input to output.
import numpy as np

import megengine
import megengine.functional as F
import megengine.module as M


class ConvNet(M.Module):
    def __init__(self):
        # this is the place where you instantiate all your modules
        # you can later access them using the same names you've given them here
        super().__init__()
        self.conv1 = M.Conv2d(1, 10, 5)
        self.pool1 = M.MaxPool2d(2, 2)
        self.conv2 = M.Conv2d(10, 20, 5)
        self.pool2 = M.MaxPool2d(2, 2)
        self.fc1 = M.Linear(320, 50)
        self.fc2 = M.Linear(50, 10)

    # it's the forward function that defines the network structure
    # we're accepting only a single input here, but if you want,
    # feel free to use more
    def forward(self, input):
        x = self.pool1(F.relu(self.conv1(input)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = F.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x
Pay attention to the following points:

ConvNet is itself a Module, just like Conv2d and Linear, which means it can be used as a substructure of other modules. This flexible nesting mechanism between Modules allows users to design very complex model structures in a relatively simple way.
In the process of defining the model, any Python code can be used to organize the model structure. Conditional and loop control flow statements are completely legal and are handled well by the automatic differentiation mechanism. You can even create a loop in the forward pass, where the same modules are reused, as sketched below.
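For instance, the following sketch (a hypothetical RecurrentBlock, illustrative only) nests a Linear module as a substructure and reuses it in a loop in the forward pass:

class RecurrentBlock(M.Module):
    def __init__(self, num_steps=3):
        super().__init__()
        self.num_steps = num_steps
        self.fc = M.Linear(10, 10)

    def forward(self, x):
        # plain Python control flow: the same sub-Module is applied repeatedly
        for _ in range(self.num_steps):
            x = F.relu(self.fc(x))
        return x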
Let's create an instance and try it out:
>>> net = ConvNet()
>>> net
ConvNet(
(conv1): Conv2d(1, 10, kernel_size=(5, 5))
(pool1): MaxPool2d(kernel_size=2, stride=2, padding=0)
(conv2): Conv2d(10, 20, kernel_size=(5, 5))
(pool2): MaxPool2d(kernel_size=2, stride=2, padding=0)
(fc1): Linear(in_features=320, out_features=50, bias=True)
(fc2): Linear(in_features=50, out_features=10, bias=True)
)
Note

All Modules only support mini-batches of samples as input, not single samples. For example, Conv2d takes as input a 4-dimensional Tensor of shape nSamples x nChannels x Height x Width. If you have a single sample, you need to use expand_dims to add a dimension, as shown below.
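For example (a minimal sketch), a single sample of shape 1 x 28 x 28 can be turned into a batch of size 1:

>>> single = megengine.Tensor(np.random.randn(1, 28, 28))
>>> batched = F.expand_dims(single, 0)
>>> batched.shape
(1, 1, 28, 28)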
We create a mini-batch containing a single sample (i.e. batch_size=1) and send it to ConvNet:
>>> input = megengine.Tensor(np.random.randn(1, 1, 28, 28))
>>> out = net(input)
>>> out.shape
(1, 10)
The output of ConvNet is a Tensor. We can use it together with the target label to calculate the loss, and then use automatic differentiation to complete the backpropagation process. However, by default no Tensor requires gradients, so before that we need a gradient manager to bind the parameters of the Module and record gradient information during the forward computation, as sketched below. To understand this process, please refer to Basic principles and use of Autodiff.
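A minimal sketch of that workflow, using megengine.autodiff.GradManager (here label is an assumed target Tensor, and the loss function name may vary between versions):

from megengine.autodiff import GradManager

gm = GradManager().attach(net.parameters())   # bind the Module parameters
with gm:                                      # record the forward computation
    pred = net(input)
    loss = F.loss.cross_entropy(pred, label)  # label: hypothetical target Tensor
    gm.backward(loss)                         # backpropagation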
More usage scenarios#
See also
The Module interface provides many useful attributes and methods, which can be conveniently used in different situations, such as (a short sketch follows the list):

Use .parameters() to easily obtain an iterator over the parameters, which can be attached to a gradient manager for automatic differentiation;
Each Module has its own name, and the names and the corresponding Modules can be obtained through .named_modules();
Use .state_dict() and .load_state_dict() to get and load state information…
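A minimal sketch of these utilities, continuing with the net created above (outputs omitted; exact method availability may vary between versions):

>>> params = list(net.parameters())                    # all trainable parameters
>>> names = [name for name, _ in net.named_modules()]  # names of sub-Modules
>>> state = net.state_dict()                           # parameter/buffer values
>>> net.load_state_dict(state)                         # restore a saved state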
For more information, please refer to Module base class concept and interface introduction.