megengine.module.activation.
LeakyReLU
Bases: megengine.module.module.Module
megengine.module.module.Module
Applies the element-wise function:
\(\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} \times \min(0, x)\)
or
\(\text{LeakyReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \text{negative\_slope} \times x & \text{otherwise} \end{cases}\)
Examples:
import numpy as np
import megengine as mge
import megengine.module as M

data = mge.tensor(np.array([-8, -12, 6, 10]).astype(np.float32))
leakyrelu = M.LeakyReLU(0.01)
output = leakyrelu(data)
print(output.numpy())
Outputs:
[-0.08 -0.12 6. 10. ]
forward
PReLU
Here \(a\) is a learnable parameter: \(\text{PReLU}(x) = \max(0, x) + a \times \min(0, x)\). When called without arguments, PReLU() uses a single parameter \(a\) across all input channels. If called with PReLU(num_of_channels), each input channel has its own \(a\).
num_parameters (int) – number of \(a\) to learn. Only two values are legitimate: 1, or the number of channels at input. Default: 1
int
init (float) – the initial value of \(a\). Default: 0.25
float
import numpy as np
import megengine as mge
import megengine.module as M

data = mge.tensor(np.array([-1.2, -3.7, 2.7]).astype(np.float32))
prelu = M.PReLU()
output = prelu(data)
print(output.numpy())
[-0.3 -0.925 2.7 ]
ReLU
import numpy as np
import megengine as mge
import megengine.module as M

data = mge.tensor(np.array([-2, -1, 0, 1, 2]).astype(np.float32))
relu = M.ReLU()
output = relu(data)
with np.printoptions(precision=6):
    print(output.numpy())
[0. 0. 0. 1. 2.]
Sigmoid
import numpy as np
import megengine as mge
import megengine.module as M

data = mge.tensor(np.array([-2, -1, 0, 1, 2]).astype(np.float32))
sigmoid = M.Sigmoid()
output = sigmoid(data)
with np.printoptions(precision=6):
    print(output.numpy())
[0.119203 0.268941 0.5 0.731059 0.880797]
Softmax
Applies a softmax function. Softmax is defined as:
\(\text{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\)
It is applied to all elements along axis, and rescales elements so that they stay in the range [0, 1] and sum to 1.
axis – Along which axis softmax will be applied. By default, softmax will apply along the highest ranked axis.
import numpy as np
import megengine as mge
import megengine.module as M

data = mge.tensor(np.array([-2, -1, 0, 1, 2]).astype(np.float32))
softmax = M.Softmax()
output = softmax(data)
with np.printoptions(precision=6):
    print(output.numpy())
[0.011656 0.031685 0.086129 0.234122 0.636409]
megengine.module.adaptive_pooling.
AdaptiveAvgPool2d
Bases: megengine.module.adaptive_pooling._AdaptivePoolNd
megengine.module.adaptive_pooling._AdaptivePoolNd
Applies a 2D average pooling over an input.
For instance, given an input of the size \((N, C, H, W)\) and an output shape \((OH, OW)\), this layer generates the output of the size \((N, C, OH, OW)\) by averaging over windows whose kernel_size and stride are inferred from the input and output shapes, as described below.
kernel_size and stride can be inferred from the input shape and output shape:
padding: (0, 0)
stride: (floor(IH / OH), floor(IW / OW))
kernel_size: (IH - (OH - 1) * stride_h, IW - (OW - 1) * stride_w)
kernel_size
stride
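As a worked example of the inference rules above: for the 4x4 input and (2, 2) output shape used below, stride = (floor(4 / 2), floor(4 / 2)) = (2, 2) and kernel_size = (4 - (2 - 1) * 2, 4 - (2 - 1) * 2) = (2, 2), i.e. non-overlapping 2x2 windows.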
import numpy as np
import megengine as mge
import megengine.module as M

m = M.AdaptiveAvgPool2d((2, 2))
inp = mge.tensor(np.arange(0, 16).astype("float32").reshape(1, 1, 4, 4))
oup = m(inp)
print(oup.numpy())
[[[[ 2.5 4.5] [10.5 12.5]]]]
AdaptiveMaxPool2d
Applies a 2D max adaptive pooling over an input.
import numpy as np
import megengine as mge
import megengine.module as M

m = M.AdaptiveMaxPool2d((2, 2))
inp = mge.tensor(np.arange(0, 16).astype("float32").reshape(1, 1, 4, 4))
oup = m(inp)
print(oup.numpy())
[[[[ 5. 7.] [13. 15.]]]]
megengine.module.batch_matmul_activation.
BatchMatMulActivation
Batched MatMul with activation (only relu supported), no transpose anywhere.
reset_parameters
None
megengine.module.batchnorm.
BatchNorm1d
Bases: megengine.module.batchnorm._BatchNorm
megengine.module.batchnorm._BatchNorm
Applies Batch Normalization over a 2D/3D tensor.
Refer to BatchNorm2d for more information.
BatchNorm2d
Applies Batch Normalization over a 4D tensor.
The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors.
By default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.9.
momentum
If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation instead.
track_running_stats
False
Note
This momentum argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = \text{momentum} \times \hat{x} + (1 - \text{momentum}) \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.
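A minimal NumPy sketch of this update rule (the variable names here are illustrative, not MegEngine internals):

import numpy as np

momentum = 0.9
running_mean = np.zeros(4, dtype="float32")    # estimated statistic \hat{x}
batch = np.random.rand(8, 4).astype("float32")
batch_mean = batch.mean(axis=0)                # new observed value x_t

# running statistic update as described in the note above
running_mean = momentum * running_mean + (1 - momentum) * batch_mean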
Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it’s common terminology to call this Spatial Batch Normalization.
num_features (int) – usually \(C\) from an input of shape \((N, C, H, W)\) or the highest ranked dimension of an input less than 4D.
eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
momentum (float) – the value used for the running_mean and running_var computation. Default: 0.9
running_mean
running_var
affine (bool) – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats (bool) – when set to True, this module tracks the running mean and variance. When set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
freeze (bool) – when set to True, this module does not update the running mean and variance, and uses the running mean and variance instead of the batch mean and batch variance to normalize the input. The parameter takes effect only when the module is initialized with track_running_stats as True and the module is in training mode. Default: False
import numpy as np
import megengine as mge
import megengine.module as M

# With Learnable Parameters
m = M.BatchNorm2d(4)
inp = mge.tensor(np.random.rand(1, 4, 3, 3).astype("float32"))
oup = m(inp)
print(m.weight.numpy().flatten(), m.bias.numpy().flatten())

# Without Learnable Parameters
m = M.BatchNorm2d(4, affine=False)
oup = m(inp)
print(m.weight, m.bias)
[1. 1. 1. 1.] [0. 0. 0. 0.] None None
SyncBatchNorm
Applies Synchronization Batch Normalization.
megengine.module.concat.
Concat
A Module to do functional concat. Could be replaced with QATModule version Concat using quantize_qat().
Module
QATModule
quantize_qat()
megengine.module.conv.
Conv1d
Bases: megengine.module.conv._ConvNd
megengine.module.conv._ConvNd
Applies a 1D convolution over an input tensor.
For instance, given an input of the size \((N, C_{\text{in}}, H)\), this layer generates an output of the size \((N, C_{\text{out}}, H_{\text{out}})\) through the process described below:
\(\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\)
where \(\star\) is the valid 1D cross-correlation operator, \(N\) is batch size, \(C\) denotes number of channels, and \(H\) is length of 1D data element.
When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also known as depthwise convolution.
In other words, for an input of size \((N, C_{in}, H_{in})\), a depthwise convolution with a depthwise multiplier K, can be constructed by arguments \((in\_channels=C_{in}, out\_channels=C_{in} \times K, ..., groups=C_{in})\).
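A minimal sketch of the depthwise case described above (parameter values are chosen for illustration only):

import numpy as np
import megengine as mge
import megengine.module as M

# depthwise Conv1d: groups == in_channels, out_channels == K * in_channels (here K = 2)
m = M.Conv1d(in_channels=4, out_channels=8, kernel_size=3, groups=4)
inp = mge.tensor(np.random.rand(1, 4, 10).astype("float32"))
oup = m(inp)
print(oup.numpy().shape)  # expected: (1, 8, 8)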
in_channels (int) – number of input channels.
out_channels (int) – number of output channels.
kernel_size (int) – size of weight on the spatial dimension. Default: 1
stride (int) – stride of the 1D convolution operation. Default: 1
padding (int) – size of the paddings added to the input on both sides of its spatial dimensions. Only zero-padding is supported. Default: 0
dilation (int) – dilation of the 1D convolution operation. Default: 1
groups (int) – number of groups into which the input and output channels are divided, so as to perform a “grouped convolution”. When groups is not 1, in_channels and out_channels must be divisible by groups, and there would be an extra dimension at the beginning of the weight’s shape. Specifically, the shape of weight would be (groups, out_channels // groups, in_channels // groups, *kernel_size); see the weight-shape sketch after the example below.
groups
in_channels
out_channels
bias (bool) – whether to add a bias onto the result of convolution. Default: True
bool
conv_mode (str) – Supports CROSS_CORRELATION. Default: CROSS_CORRELATION
str
compute_mode (str) – When set to “DEFAULT”, no special requirements will be placed on the precision of intermediate results. When set to “FLOAT32”, float32 would be used for the accumulator and intermediate results, but this is only effective when input and output are of float16 dtype.
import numpy as np
import megengine as mge
import megengine.module as M

m = M.Conv1d(in_channels=3, out_channels=1, kernel_size=3)
inp = mge.tensor(np.arange(0, 24).astype("float32").reshape(2, 3, 4))
oup = m(inp)
print(oup.numpy().shape)
(2, 1, 2)
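As referenced in the groups description above, a short sketch of the grouped-convolution weight shape (assuming the module exposes its weight as a .weight attribute, as the BatchNorm2d example elsewhere in this document does):

import megengine.module as M

# groups=2: weight shape is (groups, out_channels // groups, in_channels // groups, kernel_size)
m = M.Conv1d(in_channels=4, out_channels=6, kernel_size=3, groups=2)
print(m.weight.shape)  # expected: (2, 3, 2, 3)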
calc_conv
Conv2d
Applies a 2D convolution over an input tensor.
For instance, given an input of the size \((N, C_{\text{in}}, H, W)\), this layer generates an output of the size \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) through the process described below:
\(\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\)
where \(\star\) is the valid 2D cross-correlation operator, \(N\) is batch size, \(C\) denotes number of channels, \(H\) is height of input planes in pixels, and \(W\) is width in pixels.
In general, output feature maps’ shapes can be inferred as follows:
input: \((N, C_{\text{in}}, H_{\text{in}}, W_{\text{in}})\)
output: \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\), where
\(H_{\text{out}} = \lfloor \frac{H_{\text{in}} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \rfloor\)
\(W_{\text{out}} = \lfloor \frac{W_{\text{in}} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \rfloor\)
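A small helper that evaluates these formulas (the function name is hypothetical, not part of MegEngine):

import math

def conv2d_out_shape(h_in, w_in, kernel_size, stride=1, padding=0, dilation=1):
    # evaluates the H_out / W_out formulas above for square kernels and symmetric settings
    h_out = math.floor((h_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)
    w_out = math.floor((w_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)
    return h_out, w_out

print(conv2d_out_shape(4, 4, kernel_size=3))  # (2, 2), matching the Conv2d example below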
In other words, for an input of size \((N, C_{in}, H_{in}, W_{in})\), a depthwise convolution with a depthwise multiplier K, can be constructed by arguments \((in\_channels=C_{in}, out\_channels=C_{in} \times K, ..., groups=C_{in})\).
kernel_size (Union[int, Tuple[int, int]]) – size of weight on spatial dimensions. If kernel_size is an int, the actual kernel size would be (kernel_size, kernel_size). Default: 1
Union
Tuple
stride (Union[int, Tuple[int, int]]) – stride of the 2D convolution operation. Default: 1
padding (Union[int, Tuple[int, int]]) – size of the paddings added to the input on both sides of its spatial dimensions. Only zero-padding is supported. Default: 0
dilation (Union[int, Tuple[int, int]]) – dilation of the 2D convolution operation. Default: 1
import numpy as np
import megengine as mge
import megengine.module as M

m = M.Conv2d(in_channels=3, out_channels=1, kernel_size=3)
inp = mge.tensor(np.arange(0, 96).astype("float32").reshape(2, 3, 4, 4))
oup = m(inp)
print(oup.numpy().shape)
(2, 1, 2, 2)
ConvRelu2d
Bases: megengine.module.conv.Conv2d
megengine.module.conv.Conv2d
A fused Module including Conv2d and relu. Could be replaced with QATModule version ConvRelu2d using quantize_qat().
ConvTranspose2d
Applies a 2D transposed convolution over an input tensor.
This module is also known as a deconvolution or a fractionally-strided convolution. ConvTranspose2d can be seen as the gradient of Conv2d operation with respect to its input.
Convolution usually reduces the size of input, while transposed convolution works the opposite way, transforming a smaller input to a larger output while preserving the connectivity pattern.
(kernel_size, kernel_size)
groups (int) – number of groups into which the input and output channels are divided, so as to perform a “grouped convolution”. When groups is not 1, in_channels and out_channels must be divisible by groups, and there would be an extra dimension at the beginning of the weight’s shape. Specifically, the shape of weight would be (groups, out_channels // groups, in_channels // groups, *kernel_size). Default: 1
(groups, out_channels // groups, in_channels // groups, *kernel_size)
bias (bool) – whether to add a bias onto the result of convolution. Default: True
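A minimal usage sketch (shapes chosen for illustration; with zero padding, the output spatial size is (H_in - 1) * stride + kernel_size):

import numpy as np
import megengine as mge
import megengine.module as M

m = M.ConvTranspose2d(in_channels=3, out_channels=1, kernel_size=3, stride=2)
inp = mge.tensor(np.random.rand(1, 3, 4, 4).astype("float32"))
oup = m(inp)
print(oup.numpy().shape)  # expected: (1, 1, 9, 9)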
LocalConv2d
Applies a spatial convolution with untied kernels over a grouped, channeled 4D input tensor. It is also known as the locally connected layer.
input_height (int) – the height of the input images.
input_width (int) – the width of the input images.
groups (int) – number of groups into which the input and output channels are divided, so as to perform a “grouped convolution”. When groups is not 1, in_channels and out_channels must be divisible by groups. The shape of weight is (groups, output_height, output_width, in_channels // groups, *kernel_size, out_channels // groups).
megengine.module.conv_bn.
ConvBn2d
Bases: megengine.module.conv_bn._ConvBnActivation2d
megengine.module.conv_bn._ConvBnActivation2d
A fused Module including Conv2d, BatchNorm2d. Could be replaced with QATModule version ConvBn2d using quantize_qat().
ConvBnRelu2d
A fused Module including Conv2d, BatchNorm2d and relu. Could be replaced with QATModule version ConvBnRelu2d using quantize_qat().
megengine.module.dropout.
Dropout
Randomly sets input elements to zeros with probability \(drop\_prob\) during training. Commonly used in large networks to prevent overfitting. Note that dropout is performed only during training; the output tensor is also rescaled (multiplied) by \(\frac{1}{1 - drop\_prob}\). During inference, Dropout is equal to Identity.
Identity
drop_prob – The probability to drop (set to zero) each single element
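A minimal sketch of the train/eval behaviour described above:

import numpy as np
import megengine as mge
import megengine.module as M

m = M.Dropout(drop_prob=0.5)
data = mge.tensor(np.ones(8).astype("float32"))

m.train()   # surviving elements are rescaled by 1 / (1 - drop_prob) = 2
print(m(data).numpy())
m.eval()    # Dropout acts as Identity
print(m(data).numpy())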
megengine.module.elemwise.
Elemwise
A Module to perform elemwise operations. Could be replaced with QATModule version Elemwise using quantize_qat().
method –
the elemwise method; supports the strings listed below (a usage sketch follows the list). It performs the normal elemwise operation for float inputs.
”ADD”: a + b
”FUSE_ADD_RELU”: max(x+y, 0)
”MUL”: x * y
”MIN”: min(x, y)
”MAX”: max(x, y)
”SUB”: x - y
”TRUE_DIV”: x / y
”FUSE_ADD_SIGMOID”: sigmoid(x + y)
”FUSE_ADD_TANH”: tanh(x + y)
”RELU”: x > 0 ? x : 0
”ABS”: x > 0 ? x : -x
”SIGMOID”: sigmoid(x)
”EXP”: exp(x)
”TANH”: tanh(x)
”FUSE_MUL_ADD3”: x * y + z
”FAST_TANH”: x * (27. + x * x) / (27. + 9. * x * x)
”NEGATE”: -x
”ACOS”: acos(x)
”ASIN”: asin(x)
”CEIL”: ceil(x)
”COS”: cos(x)
”EXPM1”: expm1(x)
”FLOOR”: floor(x)
”LOG”: log(x)
”LOG1P”: log1p(x)
”SIN”: sin(x)
”ROUND”: round(x)
”ERF”: erf(x)
”ERFINV”: erfinv(x)
”ERFC”: erfc(x)
”ERFCINV”: erfcinv(x)
”ABS_GRAD”: abs_grad
”FLOOR_DIV”: floor_div
”MOD”: mod
”SIGMOID_GRAD”: sigmoid_grad
”SWITCH_GT0”: switch_gt0
”TANH_GRAD”: tanh_grad
”LT”: less
”LEQ”: leq
”EQ”: equal
”POW”: pow
”LOG_SUM_EXP”: log_sum_exp
”FAST_TANH_GRAD”: fast_tanh_grad
”ATAN2”: atan2
”COND_LEQ_MOV”: cond_leq_mov
”H_SWISH”: h_swish
”FUSE_ADD_H_SWISH”: h_swish(x+y)
”H_SWISH_GRAD”: h_swish_grad
”AND”: bool binary: x && y
”OR”: bool binary: x || y
”XOR”: bool binary: x ^ y
”NOT”: bool unary: ~x
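A usage sketch for one of the methods above (the method choice is arbitrary):

import numpy as np
import megengine as mge
import megengine.module as M

add_relu = M.Elemwise("FUSE_ADD_RELU")
x = mge.tensor(np.array([-2.0, 1.0], dtype=np.float32))
y = mge.tensor(np.array([1.0, 2.0], dtype=np.float32))
print(add_relu(x, y).numpy())  # expected: [0. 3.], i.e. max(x + y, 0)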
megengine.module.embedding.
Embedding
A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings. The indices should be less than num_embeddings.
num_embeddings (int) – size of embedding dictionary.
embedding_dim (int) – size of each embedding vector.
padding_idx (Optional[int]) – should be set to None, not supported now.
Optional
max_norm (Optional[float]) – should be set to None, not supported now.
norm_type (Optional[float]) – should be set to None, not supported now.
initial_weight (Optional[Parameter]) – the learnable weights of the module of shape (num_embeddings, embedding_dim).
Parameter
import numpy as np
import megengine as mge
import megengine.module as M

weight = mge.tensor(np.array([(1.2, 2.3, 3.4, 4.5, 5.6)], dtype=np.float32))
data = mge.tensor(np.array([(0, 0)], dtype=np.int32))

embedding = M.Embedding(1, 5, initial_weight=weight)
output = embedding(data)
with np.printoptions(precision=6):
    print(output.numpy())
[[[1.2 2.3 3.4 4.5 5.6] [1.2 2.3 3.4 4.5 5.6]]]
from_pretrained
Creates Embedding instance from given 2-dimensional FloatTensor.
embeddings (Parameter) – tensor contained weight for the embedding.
freeze (Optional[bool]) – if True, the weight does not get updated during the learning process. Default: True.
True
padding_idx (Optional[int]) – should be set to None, not supported now.
max_norm (Optional[float]) – should be set to None, not supported now.
norm_type (Optional[float]) – should be set to None, not supported now.
import numpy as np
import megengine as mge
import megengine.module as M

weight = mge.tensor(np.array([(1.2, 2.3, 3.4, 4.5, 5.6)], dtype=np.float32))
data = mge.tensor(np.array([(0, 0)], dtype=np.int32))

embedding = M.Embedding.from_pretrained(weight, freeze=False)
output = embedding(data)
print(output.numpy())
megengine.module.identity.
Identity
A placeholder identity operator that will ignore any argument.
megengine.module.init.
calculate_correct_fan
Calculates fan_in / fan_out value for given weight tensor, depending on given mode.
mode
See calculate_fan_in_and_fan_out() for details.
calculate_fan_in_and_fan_out()
tensor (Tensor) – weight tensor in NCHW format.
Tensor
NCHW
mode (str) – “fan_in” or “fan_out”.
calculate_fan_in_and_fan_out
Calculates fan_in / fan_out value for given weight tensor. This function assumes input tensor is stored in NCHW format.
Tuple[float, float]
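A short sketch of what this returns for a Conv2d-style NCHW weight (values follow the standard fan definitions; treat as illustrative):

import numpy as np
import megengine as mge
import megengine.module as M

# weight of shape (out_channels, in_channels, kh, kw) = (8, 3, 3, 3)
w = mge.Parameter(np.zeros((8, 3, 3, 3), dtype="float32"))
fan_in, fan_out = M.init.calculate_fan_in_and_fan_out(w)
print(fan_in, fan_out)  # expected: 27.0 72.0, i.e. 3*3*3 and 8*3*3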
calculate_gain
Returns a recommended gain value (see the table below) for the given nonlinearity function.

nonlinearity          gain
Linear / Identity     \(1\)
Conv{1,2,3}D          \(1\)
Tanh                  \(\frac{5}{3}\)
ReLU                  \(\sqrt{2}\)
Leaky Relu            \(\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}\)
nonlinearity (str) – name of the non-linear function.
param (Union[int, float, None]) – optional parameter for leaky_relu. Only effective when nonlinearity is “leaky_relu”.
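A quick sketch (the printed values are what the table above implies):

import megengine.module as M

print(M.init.calculate_gain("relu"))              # sqrt(2) ~= 1.414
print(M.init.calculate_gain("tanh"))              # 5 / 3 ~= 1.667
print(M.init.calculate_gain("leaky_relu", 0.01))  # sqrt(2 / (1 + 0.01**2)) ~= 1.414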
fill_
Fills the given tensor with value val.
tensor
val
tensor (Tensor) – tensor to be initialized.
val (Union[float, int]) – value to be filled throughout the tensor.
msra_normal_
Fills tensor with random values sampled from \(\mathcal{N}(0, \text{std}^2)\), where
\(\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}\)
Detailed information can be retrieved from Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification
tensor (Tensor) – tensor to be initialized
a (float) – optional parameter for calculating gain for leaky_relu. See calculate_gain() for details.
calculate_gain()
mode (str) – “fan_in” or “fan_out”, used to calculate the fan value, which together with \(gain\) determines \(std\). See calculate_fan_in_and_fan_out() for details.
nonlinearity (str) – name of the non-linear function used to calculate \(gain\). See calculate_gain() for details.
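An in-place initialization sketch (argument names follow the parameter list above; treat the exact values as illustrative):

import numpy as np
import megengine as mge
import megengine.module as M

w = mge.Parameter(np.zeros((8, 3, 3, 3), dtype="float32"))
# fill w in place from N(0, std^2) with std = gain / sqrt(fan_in)
M.init.msra_normal_(w, mode="fan_in", nonlinearity="leaky_relu")
print(w.numpy().std())  # should be close to the std implied by the formula above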
msra_uniform_
Fills tensor with random values sampled from \(\mathcal{U}(-\text{bound}, \text{bound})\), where
\(\text{bound} = \sqrt{3} \times \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}\)
mode (str) – “fan_in” or “fan_out”, used to calculate the fan value, which together with \(gain\) determines \(bound\). See calculate_fan_in_and_fan_out() for details.
normal_
Fills the given tensor with random value sampled from normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\).
mean (float) – mean of the normal distribution.
std (float) – standard deviation of the normal distribution.
ones_
Fills the given tensor with the scalar value 1.
uniform_
Fills the given tensor with random value sampled from uniform distribution \(\mathcal{U}(\text{a}, \text{b})\).
a (float) – lower bound of the sampling interval.
b (float) – upper bound of the sampling interval.
xavier_normal_
Fills tensor with random values sampled from \(\mathcal{N}(0, \text{std}^2)\), where
\(\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}\)
Also known as Glorot initialization. Detailed information can be retrieved from Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010).
gain (float) – scaling factor for \(std\).
xavier_uniform_
Fills tensor with random values sampled from \(\mathcal{U}(-a, a)\), where
\(a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}\)
gain (float) – scaling factor for \(a\).
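A similar in-place sketch for the Xavier initializers (illustrative only):

import numpy as np
import megengine as mge
import megengine.module as M

w = mge.Parameter(np.zeros((256, 128), dtype="float32"))
M.init.xavier_uniform_(w, gain=1.0)
# samples lie in [-a, a] with a = gain * sqrt(6 / (fan_in + fan_out))
print(w.numpy().min(), w.numpy().max())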
zeros_
Fills the given tensor with scalar value 0.
megengine.module.linear.
Linear
Applies a linear transformation to the input. For instance, if the input is \(x\), then the output \(y\) is:
\(y = x W^T + b\)
where \(y_i = \sum_j W_{ij} x_j + b_i\).
in_features (int) – size of each input sample.
out_features (int) – size of each output sample.
bias (bool) – if it’s False, the layer will not learn an additional bias. Default: True
bias
import numpy as np
import megengine as mge
import megengine.module as M

m = M.Linear(in_features=3, out_features=1)
inp = mge.tensor(np.arange(0, 6).astype("float32").reshape(2, 3))
oup = m(inp)
print(oup.numpy().shape)
(2, 1)
megengine.module.module.
Module
Bases: object
object
Base Module class.
apply
Applies function fn to all the modules within this module, including itself.
fn
fn (Callable[[Module], Any]) – the function to be applied on modules.
Callable
Any
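A small sketch of using apply to act on every submodule (the initializer choice is arbitrary):

import megengine.module as M

net = M.Sequential(M.Linear(4, 8), M.Linear(8, 2))

def init_linear(m):
    # fn receives every module in the tree, including net itself
    if isinstance(m, M.Linear):
        M.init.zeros_(m.bias)

net.apply(init_linear)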
buffers
Returns an iterable for the buffers of the module.
Buffer is defined to be Tensor excluding Parameter.
recursive (bool) – if True, returns all buffers within this module, else only returns buffers that are direct attributes of this module.
Iterable[Tensor]
Iterable
children
Returns an iterable for all the submodules that are direct attributes of this module.
Iterable[Module]
disable_quantize
Sets module’s quantize_disabled attribute and returns the module. Could be used as a decorator.
module
quantize_disabled
eval
Sets training mode of all the modules within this module (including itself) to False. See train() for details.
train()
load_state_dict
Loads a given dictionary created by state_dict() into this module. If strict is True, the keys in the given dictionary must exactly match the keys returned by this module’s state_dict().
state_dict()
strict
Users can also pass a closure: Function[key: str, var: Tensor] -> Optional[np.ndarray] as a state_dict, in order to handle complex situations. For example, load everything except for the final linear classifier:
Function[key: str, var: Tensor] -> Optional[np.ndarray]
state_dict = {...}  # Dict[str, np.ndarray]
model.load_state_dict({
    k: None if k.startswith('fc') else v
    for k, v in state_dict.items()
}, strict=False)
Here returning None means skipping parameter k.
k
To prevent shape mismatch (e.g. load PyTorch weights), we can reshape before loading:
state_dict = {...}

def reshape_accordingly(k, v):
    return state_dict[k].reshape(v.shape)

model.load_state_dict(reshape_accordingly)
We can also perform inplace re-initialization or pruning:
def reinit_and_pruning(k, v):
    if 'bias' in k:
        M.init.zeros_(v)
    if 'conv' in k:
        return v.numpy() * (np.abs(v.numpy()) > 1e-3).astype("float32")

model.load_state_dict(reinit_and_pruning, strict=False)
modules
Returns an iterable for all the modules within this module, including itself.
named_buffers
Returns an iterable for key buffer pairs of the module, where key is the dotted path from this module to the buffer.
key
prefix (Optional[str]) – prefix prepended to the keys.
Iterable[Tuple[str, Tensor]]
named_children
Returns an iterable of key-submodule pairs for all the submodules that are direct attributes of this module, where ‘key’ is the attribute name of submodules.
Iterable[Tuple[str, Module]]
named_modules
Returns an iterable of key-module pairs for all the modules within this module, including itself, where ‘key’ is the dotted path from this module to the submodules.
prefix (Optional[str]) – prefix prepended to the path.
named_parameters
Returns an iterable for key Parameter pairs of the module, where key is the dotted path from this module to the Parameter.
recursive (bool) – if True, returns all Parameter within this module, else only returns Parameter that are direct attributes of this module.
Iterable[Tuple[str, Parameter]]
parameters
Returns an iterable for the Parameter of the module.
recursive (bool) – If True, returns all Parameter within this module, else only returns Parameter that are direct attributes of this module.
Iterable[Parameter]
register_forward_hook
Registers a hook to handle forward results. hook should be a function that receives module, inputs and outputs, then returns modified outputs or None.
This method returns a handler with a remove() interface to delete the hook.
remove()
HookHandler
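A minimal hook sketch (the hook body is illustrative):

import numpy as np
import megengine as mge
import megengine.module as M

net = M.Linear(4, 2)

def print_shape_hook(module, inputs, outputs):
    # receives module, inputs and outputs; returning None keeps outputs unchanged
    print(outputs.shape)

handle = net.register_forward_hook(print_shape_hook)
net(mge.tensor(np.zeros((3, 4), dtype="float32")))
handle.remove()  # delete the hook via the returned handler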
register_forward_pre_hook
Registers a hook to handle forward inputs. hook should be a function that receives module and inputs, then returns modified inputs or None.
This method returns a handler with a remove() interface to delete the hook.
hook (Callable) – a function that receives module and inputs, then returns modified inputs or None.
HookHandler
replace_param
Replaces the module’s parameters with params, used by ParamPack to speed up multi-machine training.
params
ParamPack
Deprecated since version 1.0.
state_dict
train
Sets training mode of all the modules within this module (including itself) to mode. This effectively sets the training attributes of those modules to mode, but only has effect on certain modules (e.g. BatchNorm2d, Dropout, Observer)
training
Observer
mode (bool) – the training mode to be set on modules.
recursive (bool) – whether to recursively call submodules’ train().
zero_grad
Sets all parameters’ grads to zero
megengine.module.normalization.
GroupNorm
Simple implementation of GroupNorm. Only supports 4D tensors for now. Reference: https://arxiv.org/pdf/1803.08494.pdf.
InstanceNorm
Simple implementation of InstanceNorm. Only supports 4D tensors for now. Reference: https://arxiv.org/abs/1607.08022. Note that InstanceNorm is equivalent to GroupNorm with num_groups=num_channels.
LayerNorm
Simple implementation of LayerNorm. Only supports 4D tensors for now. Reference: https://arxiv.org/pdf/1803.08494.pdf. Note that LayerNorm is equivalent to GroupNorm with num_groups=1.
megengine.module.pooling.
AvgPool2d
Bases: megengine.module.pooling._PoolNd
megengine.module.pooling._PoolNd
Applies a 2D average pooling over an input.
For instance, given an input of the size \((N, C, H, W)\) and kernel_size \((kH, kW)\), this layer generates the output of the size \((N, C, H_{out}, W_{out})\) through a process described as:
\(out(N_i, C_j, h, w) = \frac{1}{kH \times kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n)\)
If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
padding
kernel_size (Union[int, Tuple[int, int]]) – the size of the window.
stride (Union[int, Tuple[int, int], None]) – the stride of the window. Default value is kernel_size.
padding (Union[int, Tuple[int, int]]) – implicit zero padding to be added on both sides.
import numpy as np
import megengine as mge
import megengine.module as M

m = M.AvgPool2d(kernel_size=3, stride=1, padding=0)
inp = mge.tensor(np.arange(0, 16).astype("float32").reshape(1, 1, 4, 4))
oup = m(inp)
print(oup.numpy())
[[[[ 5. 6.] [ 9. 10.]]]]
MaxPool2d
Applies a 2D max pooling over an input.
kernel_size (Union[int, Tuple[int, int]]) – the size of the window to take a max over.
stride (Union[int, Tuple[int, int], None]) – the stride of the window. Default value is kernel_size.
import numpy as np
import megengine as mge
import megengine.module as M

m = M.MaxPool2d(kernel_size=3, stride=1, padding=0)
inp = mge.tensor(np.arange(0, 16).astype("float32").reshape(1, 1, 4, 4))
oup = m(inp)
print(oup.numpy())
[[[[10. 11.] [14. 15.]]]]
megengine.module.quant_dequant.
DequantStub
A helper Module simply returning input. Could be replaced with QATModule version DequantStub using quantize_qat().
QuantStub
A helper Module simply returning input. Could be replaced with QATModule version QuantStub using quantize_qat().
megengine.module.sequential.
Sequential
A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in.
To make it easier to understand, here is a small example:
import numpy as np
import megengine as mge
import megengine.module as M
import megengine.functional as F
from collections import OrderedDict

batch_size = 64
data = mge.tensor(np.zeros((batch_size, 28 * 28)), dtype=np.float32)
label = mge.tensor(np.zeros(batch_size,), dtype=np.int32)

net0 = M.Sequential(
    M.Linear(28 * 28, 320),
    M.Linear(320, 10)
)
pred0 = net0(data)

modules = OrderedDict()
modules["fc0"] = M.Linear(28 * 28, 320)
modules["fc1"] = M.Linear(320, 10)
net1 = M.Sequential(modules)
pred1 = net1(data)
layer_values