megengine.module package

megengine.module.activation

class megengine.module.activation.LeakyReLU(negative_slope=0.01)[source]

Bases: megengine.module.module.Module

Applies the element-wise function:

\[\text{LeakyReLU}(x) = \max(0,x) + negative\_slope \times \min(0,x)\]

or

\[\begin{split}\text{LeakyReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ negative\_slope \times x, & \text{ otherwise } \end{cases}\end{split}\]

Examples:

import numpy as np
import megengine as mge
import megengine.module as M
data = mge.tensor(np.array([-8, -12, 6, 10]).astype(np.float32))

leakyrelu = M.LeakyReLU(0.01)
output = leakyrelu(data)
print(output.numpy())

Outputs:

[-0.08 -0.12  6.   10.  ]
forward(inputs)[source]
class megengine.module.activation.PReLU(num_parameters=1, init=0.25)[source]

Bases: megengine.module.module.Module

Applies the element-wise function:

\[\text{PReLU}(x) = \max(0,x) + a * \min(0,x)\]

or

\[\begin{split}\text{PReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ ax, & \text{ otherwise } \end{cases}\end{split}\]

Here \(a\) is a learnable parameter. When called without arguments, PReLU() uses a single parameter \(a\) across all input channels. If called with PReLU(num_of_channels), each input channel will have its own \(a\).

Parameters
  • num_parameters (int) – number of \(a\) to learn; only two values are legitimate: 1, or the number of channels of the input. Default: 1

  • init (float) – the initial value of \(a\). Default: 0.25

Examples:

import numpy as np
import megengine as mge
import megengine.module as M
data = mge.tensor(np.array([-1.2, -3.7, 2.7]).astype(np.float32))
prelu = M.PReLU()
output = prelu(data)
print(output.numpy())

Outputs:

[-0.3   -0.925  2.7  ]
forward(inputs)[source]
class megengine.module.activation.ReLU[source]

Bases: megengine.module.module.Module

Applies the element-wise function:

\[\text{ReLU}(x) = \max(x, 0)\]

Examples:

import numpy as np
import megengine as mge
import megengine.module as M
data = mge.tensor(np.array([-2,-1,0,1,2,]).astype(np.float32))
relu = M.ReLU()
output = relu(data)
with np.printoptions(precision=6):
    print(output.numpy())

Outputs:

[0. 0. 0. 1. 2.]
forward(x)[source]
class megengine.module.activation.Sigmoid[source]

Bases: megengine.module.module.Module

Applies the element-wise function:

\[\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}\]

Examples:

import numpy as np
import megengine as mge
import megengine.module as M

data = mge.tensor(np.array([-2,-1,0,1,2,]).astype(np.float32))
sigmoid = M.Sigmoid()
output = sigmoid(data)
with np.printoptions(precision=6):
    print(output.numpy())

Outputs:

[0.119203 0.268941 0.5      0.731059 0.880797]
forward(inputs)[source]
class megengine.module.activation.Softmax(axis=None)[source]

Bases: megengine.module.module.Module

Applies a softmax function. Softmax is defined as:

\[\text{Softmax}(x_{i}) = \frac{exp(x_i)}{\sum_j exp(x_j)}\]

It is applied to all elements along axis, and rescales elements so that they stay in the range [0, 1] and sum to 1.

Parameters

axis – Along which axis softmax will be applied. By default, softmax will apply along the highest ranked axis.

Examples:

import numpy as np
import megengine as mge
import megengine.module as M

data = mge.tensor(np.array([-2,-1,0,1,2]).astype(np.float32))
softmax = M.Softmax()
output = softmax(data)
with np.printoptions(precision=6):
    print(output.numpy())

Outputs:

[0.011656 0.031685 0.086129 0.234122 0.636409]
forward(inputs)[source]

megengine.module.adaptive_pooling

class megengine.module.adaptive_pooling.AdaptiveAvgPool2d(oshp)[source]

Bases: megengine.module.adaptive_pooling._AdaptivePoolNd

Applies a 2D average pooling over an input.

For instance, given an input of the size \((N, C, H, W)\) and an output shape \((OH, OW)\), this layer generates the output of the size \((N, C, OH, OW)\) through a process described as:

\[out(N_i, C_j, h, w) = \frac{1}{kH * kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n)\]

kernel_size and stride can be inferred from the input shape and the output shape:

  • padding: (0, 0)

  • stride: (floor(IH / OH), floor(IW / OW))

  • kernel_size: (IH - (OH - 1) * stride_h, IW - (OW - 1) * stride_w)

Examples:

import numpy as np
import megengine as mge
import megengine.module as M

m = M.AdaptiveAvgPool2d((2, 2))
inp = mge.tensor(np.arange(0, 16).astype("float32").reshape(1, 1, 4, 4))
oup = m(inp)
print(oup.numpy())

Outputs:

[[[[ 2.5  4.5]
   [10.5 12.5]]]]
forward(inp)[source]
class megengine.module.adaptive_pooling.AdaptiveMaxPool2d(oshp)[source]

Bases: megengine.module.adaptive_pooling._AdaptivePoolNd

Applies a 2D max adaptive pooling over an input.

For instance, given an input of the size \((N, C, H, W)\) and an output shape \((OH, OW)\), this layer generates the output of the size \((N, C, OH, OW)\) through a process described as:

\[\begin{aligned} out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \text{input}(N_i, C_j, \text{stride[0]} \times h + m, \text{stride[1]} \times w + n) \end{aligned}\]

kernel_size and stride can be inferred from the input shape and the output shape:

  • padding: (0, 0)

  • stride: (floor(IH / OH), floor(IW / OW))

  • kernel_size: (IH - (OH - 1) * stride_h, IW - (OW - 1) * stride_w)

Examples:

import numpy as np
import megengine as mge
import megengine.module as M

m = M.AdaptiveMaxPool2d((2, 2))
inp = mge.tensor(np.arange(0, 16).astype("float32").reshape(1, 1, 4, 4))
oup = m(inp)
print(oup.numpy())

Outputs:

[[[[ 5.  7.]
   [13. 15.]]]]
forward(inp)[source]

megengine.module.batch_matmul_activation

class megengine.module.batch_matmul_activation.BatchMatMulActivation(batch, in_features, out_features, bias=True, nonlinear_mode='IDENTITY', **kwargs)[source]

Bases: megengine.module.module.Module

Batched MatMul with activation (only ReLU supported), no transpose anywhere.

forward(x)[source]
reset_parameters()[source]
Return type

None

megengine.module.batchnorm

class megengine.module.batchnorm.BatchNorm1d(num_features, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True, freeze=False)[source]

Bases: megengine.module.batchnorm._BatchNorm

Applies Batch Normalization over a 2D/3D tensor.

Refer to BatchNorm2d for more information.
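
A minimal usage sketch (illustrative, not from the original reference; only the output shape is printed since the input and parameters are random):

import numpy as np
import megengine as mge
import megengine.module as M

m = M.BatchNorm1d(4)
inp = mge.tensor(np.random.rand(2, 4, 8).astype("float32"))
oup = m(inp)
print(oup.numpy().shape)

Outputs:

(2, 4, 8)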

class megengine.module.batchnorm.BatchNorm2d(num_features, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True, freeze=False)[source]

Bases: megengine.module.batchnorm._BatchNorm

Applies Batch Normalization over a 4D tensor.

\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors.

By default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.9.

If track_running_stats is set to False, this layer will not keep running estimates, and batch statistics are used during evaluation instead.

Note

This momentum argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = \text{momentum} \times \hat{x} + (1 - \text{momentum}) \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.

Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it’s common terminology to call this Spatial Batch Normalization.

Parameters
  • num_features (int) – usually \(C\) from an input of shape \((N, C, H, W)\) or the highest ranked dimension of an input less than 4D.

  • eps (float) – a value added to the denominator for numerical stability. Default: 1e-5

  • momentum (float) – the value used for the running_mean and running_var computation. Default: 0.9

  • affine (bool) – a boolean value that when set to True, this module has learnable affine parameters. Default: True

  • track_running_stats (bool) – when set to True, this module tracks the running mean and variance. When set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True

  • freeze (bool) – when set to True, this module does not update the running mean and variance, and uses the running mean and variance instead of the batch mean and batch variance to normalize the input. The parameter takes effect only when the module is initialized with track_running_stats as True and the module is in training mode. Default: False

Examples:

import numpy as np
import megengine as mge
import megengine.module as M

# With Learnable Parameters
m = M.BatchNorm2d(4)
inp = mge.tensor(np.random.rand(1, 4, 3, 3).astype("float32"))
oup = m(inp)
print(m.weight.numpy().flatten(), m.bias.numpy().flatten())
# Without Learnable Parameters
m = M.BatchNorm2d(4, affine=False)
oup = m(inp)
print(m.weight, m.bias)

Outputs:

[1. 1. 1. 1.] [0. 0. 0. 0.]
None None
class megengine.module.batchnorm.SyncBatchNorm(num_features, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True, freeze=False, group=<megengine.distributed.group.Group object>)[source]

Bases: megengine.module.batchnorm._BatchNorm

Applies Synchronization Batch Normalization.

forward(inp)[source]

megengine.module.concat

class megengine.module.concat.Concat[source]

Bases: megengine.module.module.Module

A Module to do functional concat. Could be replaced with QATModule version Concat using quantize_qat().
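
A minimal usage sketch (illustrative, not from the original reference):

import numpy as np
import megengine as mge
import megengine.module as M

x = mge.tensor(np.ones((2, 3)).astype("float32"))
y = mge.tensor(np.zeros((2, 3)).astype("float32"))
concat = M.Concat()
out = concat([x, y])  # concatenates along axis 0 by default
print(out.numpy().shape)

Outputs:

(4, 3)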

forward(inps, axis=0)[source]

megengine.module.conv

class megengine.module.conv.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, conv_mode='CROSS_CORRELATION', compute_mode='DEFAULT')[source]

Bases: megengine.module.conv._ConvNd

Applies a 1D convolution over an input tensor.

For instance, given an input of the size \((N, C_{\text{in}}, H)\), this layer generates an output of the size \((N, C_{\text{out}}, H_{\text{out}})\) through the process described as below:

\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]

where \(\star\) is the valid 1D cross-correlation operator, \(N\) is batch size, \(C\) denotes number of channels, and \(H\) is length of 1D data element.

When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also known as depthwise convolution.

In other words, for an input of size \((N, C_{in}, H_{in})\), a depthwise convolution with a depthwise multiplier K, can be constructed by arguments \((in\_channels=C_{in}, out\_channels=C_{in} \times K, ..., groups=C_{in})\).

Parameters
  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • kernel_size (int) – size of weight on the spatial dimension. Default: 1

  • stride (int) – stride of the 1D convolution operation. Default: 1

  • padding (int) – size of the paddings added to the input on both sides of its spatial dimensions. Only zero-padding is supported. Default: 0

  • dilation (int) – dilation of the 1D convolution operation. Default: 1

  • groups (int) – number of groups into which the input and output channels are divided, so as to perform a “grouped convolution”. When groups is not 1, in_channels and out_channels must be divisible by groups, and there would be an extra dimension at the beginning of the weight’s shape. Specifically, the shape of weight would be (groups, out_channels // groups, in_channels // groups, *kernel_size).

  • bias (bool) – whether to add a bias onto the result of convolution. Default: True

  • conv_mode (str) – Supports CROSS_CORRELATION. Default: CROSS_CORRELATION

  • compute_mode (str) – When set to “DEFAULT”, no special requirements will be placed on the precision of intermediate results. When set to “FLOAT32”, “Float32” would be used for accumulator and intermediate result, but only effective when input and output are of float16 dtype.

Examples:

import numpy as np
import megengine as mge
import megengine.module as M

m = M.Conv1d(in_channels=3, out_channels=1, kernel_size=3)
inp = mge.tensor(np.arange(0, 24).astype("float32").reshape(2, 3, 4))
oup = m(inp)
print(oup.numpy().shape)

Outputs:

(2, 1, 2)
calc_conv(inp, weight, bias)[source]
forward(inp)[source]
class megengine.module.conv.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, conv_mode='CROSS_CORRELATION', compute_mode='DEFAULT')[source]

Bases: megengine.module.conv._ConvNd

Applies a 2D convolution over an input tensor.

For instance, given an input of the size \((N, C_{\text{in}}, H, W)\), this layer generates an output of the size \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) through the process described as below:

\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]

where \(\star\) is the valid 2D cross-correlation operator, \(N\) is batch size, \(C\) denotes number of channels, \(H\) is height of input planes in pixels, and \(W\) is width in pixels.

In general, output feature maps’ shapes can be inferred as follows:

input: \((N, C_{\text{in}}, H_{\text{in}}, W_{\text{in}})\)

output: \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\), where

\[\text{H}_{out} = \lfloor \frac{\text{H}_{in} + 2 * \text{padding[0]} - \text{dilation[0]} * (\text{kernel_size[0]} - 1)}{\text{stride[0]}} + 1 \rfloor\]
\[\text{W}_{out} = \lfloor \frac{\text{W}_{in} + 2 * \text{padding[1]} - \text{dilation[1]} * (\text{kernel_size[1]} - 1)}{\text{stride[1]}} + 1 \rfloor\]

When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also known as depthwise convolution.

In other words, for an input of size \((N, C_{in}, H_{in}, W_{in})\), a depthwise convolution with a depthwise multiplier K, can be constructed by arguments \((in\_channels=C_{in}, out\_channels=C_{in} \times K, ..., groups=C_{in})\).

Parameters
  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • kernel_size (Union[int, Tuple[int, int]]) – size of weight on spatial dimensions. If kernel_size is an int, the actual kernel size would be (kernel_size, kernel_size). Default: 1

  • stride (Union[int, Tuple[int, int]]) – stride of the 2D convolution operation. Default: 1

  • padding (Union[int, Tuple[int, int]]) – size of the paddings added to the input on both sides of its spatial dimensions. Only zero-padding is supported. Default: 0

  • dilation (Union[int, Tuple[int, int]]) – dilation of the 2D convolution operation. Default: 1

  • groups (int) – number of groups into which the input and output channels are divided, so as to perform a “grouped convolution”. When groups is not 1, in_channels and out_channels must be divisible by groups, and there would be an extra dimension at the beginning of the weight’s shape. Specifically, the shape of weight would be (groups, out_channels // groups, in_channels // groups, *kernel_size).

  • bias (bool) – whether to add a bias onto the result of convolution. Default: True

  • conv_mode (str) – Supports CROSS_CORRELATION. Default: CROSS_CORRELATION

  • compute_mode (str) – When set to “DEFAULT”, no special requirements will be placed on the precision of intermediate results. When set to “FLOAT32”, “Float32” would be used for accumulator and intermediate result, but only effective when input and output are of float16 dtype.

Examples:

import numpy as np
import megengine as mge
import megengine.module as M

m = M.Conv2d(in_channels=3, out_channels=1, kernel_size=3)
inp = mge.tensor(np.arange(0, 96).astype("float32").reshape(2, 3, 4, 4))
oup = m(inp)
print(oup.numpy().shape)

Outputs:

(2, 1, 2, 2)
calc_conv(inp, weight, bias)[source]
forward(inp)[source]
class megengine.module.conv.ConvRelu2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, conv_mode='CROSS_CORRELATION', compute_mode='DEFAULT')[source]

Bases: megengine.module.conv.Conv2d

A fused Module including Conv2d and relu. Could be replaced with QATModule version ConvRelu2d using quantize_qat().

forward(inp)[source]
class megengine.module.conv.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, conv_mode='CROSS_CORRELATION', compute_mode='DEFAULT')[source]

Bases: megengine.module.conv._ConvNd

Applies a 2D transposed convolution over an input tensor.

This module is also known as a deconvolution or a fractionally-strided convolution. ConvTranspose2d can be seen as the gradient of Conv2d operation with respect to its input.

Convolution usually reduces the size of input, while transposed convolution works the opposite way, transforming a smaller input to a larger output while preserving the connectivity pattern.

Parameters
  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • kernel_size (Union[int, Tuple[int, int]]) – size of weight on spatial dimensions. If kernel_size is an int, the actual kernel size would be (kernel_size, kernel_size). Default: 1

  • stride (Union[int, Tuple[int, int]]) – stride of the 2D convolution operation. Default: 1

  • padding (Union[int, Tuple[int, int]]) – size of the paddings added to the input on both sides of its spatial dimensions. Only zero-padding is supported. Default: 0

  • dilation (Union[int, Tuple[int, int]]) – dilation of the 2D convolution operation. Default: 1

  • groups (int) – number of groups into which the input and output channels are divided, so as to perform a “grouped convolution”. When groups is not 1, in_channels and out_channels must be divisible by groups, and there would be an extra dimension at the beginning of the weight’s shape. Specifically, the shape of weight would be (groups, out_channels // groups, in_channels // groups, *kernel_size). Default: 1

  • bias (bool) – whether to add a bias onto the result of convolution. Default: True

  • conv_mode (str) – Supports CROSS_CORRELATION. Default: CROSS_CORRELATION

  • compute_mode (str) – When set to “DEFAULT”, no special requirements will be placed on the precision of intermediate results. When set to “FLOAT32”, “Float32” would be used for accumulator and intermediate result, but only effective when input and output are of float16 dtype.
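
A minimal usage sketch (illustrative, not from the original reference; only the output shape is printed, which for dilation 1 follows \((H_{in} - 1) \times stride - 2 \times padding + kernel\_size\) per spatial dimension):

import numpy as np
import megengine as mge
import megengine.module as M

m = M.ConvTranspose2d(in_channels=3, out_channels=1, kernel_size=3)
inp = mge.tensor(np.arange(0, 96).astype("float32").reshape(2, 3, 4, 4))
oup = m(inp)
print(oup.numpy().shape)

Outputs:

(2, 1, 6, 6)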

forward(inp)[source]
class megengine.module.conv.LocalConv2d(in_channels, out_channels, input_height, input_width, kernel_size, stride=1, padding=0, dilation=1, groups=1, conv_mode='CROSS_CORRELATION')[source]

Bases: megengine.module.conv.Conv2d

Applies a spatial convolution with untied kernels over a grouped, channeled 4D input tensor. It is also known as the locally connected layer.

Parameters
  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • input_height (int) – the height of the input images.

  • input_width (int) – the width of the input images.

  • kernel_size (Union[int, Tuple[int, int]]) – size of weight on spatial dimensions. If kernel_size is an int, the actual kernel size would be (kernel_size, kernel_size). Default: 1

  • stride (Union[int, Tuple[int, int]]) – stride of the 2D convolution operation. Default: 1

  • padding (Union[int, Tuple[int, int]]) – size of the paddings added to the input on both sides of its spatial dimensions. Only zero-padding is supported. Default: 0

  • groups (int) – number of groups into which the input and output channels are divided, so as to perform a “grouped convolution”. When groups is not 1, in_channels and out_channels must be divisible by groups. The shape of weight is (groups, output_height, output_width, in_channels // groups, *kernel_size, out_channels // groups).
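
A minimal usage sketch (illustrative, not from the original reference; input_height and input_width are assumed to match the actual input, and only the output shape is printed):

import numpy as np
import megengine as mge
import megengine.module as M

m = M.LocalConv2d(in_channels=3, out_channels=1, input_height=4, input_width=4, kernel_size=3)
inp = mge.tensor(np.arange(0, 96).astype("float32").reshape(2, 3, 4, 4))
oup = m(inp)
print(oup.numpy().shape)

Outputs:

(2, 1, 2, 2)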

forward(inp)[source]

megengine.module.conv_bn

class megengine.module.conv_bn.ConvBn2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, conv_mode='CROSS_CORRELATION', compute_mode='DEFAULT', eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)[source]

Bases: megengine.module.conv_bn._ConvBnActivation2d

A fused Module including Conv2d, BatchNorm2d. Could be replaced with QATModule version ConvBn2d using quantize_qat().
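
A minimal usage sketch (illustrative, not from the original reference; only the output shape is printed since the parameters are randomly initialized):

import numpy as np
import megengine as mge
import megengine.module as M

m = M.ConvBn2d(in_channels=3, out_channels=4, kernel_size=3)
inp = mge.tensor(np.random.rand(2, 3, 8, 8).astype("float32"))
oup = m(inp)
print(oup.numpy().shape)

Outputs:

(2, 4, 6, 6)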

forward(inp)[source]
class megengine.module.conv_bn.ConvBnRelu2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, conv_mode='CROSS_CORRELATION', compute_mode='DEFAULT', eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)[source]

Bases: megengine.module.conv_bn._ConvBnActivation2d

A fused Module including Conv2d, BatchNorm2d and relu. Could be replaced with QATModule version ConvBnRelu2d using quantize_qat().

forward(inp)[source]

megengine.module.dropout

class megengine.module.dropout.Dropout(drop_prob=0.0)[source]

Bases: megengine.module.module.Module

Randomly sets input elements to zeros with the probability \(drop\_prob\) during training. Commonly used in large networks to prevent overfitting. Note that we perform dropout only during training, and the output tensor is rescaled (multiplied) by \(\frac{1}{1 - drop\_prob}\). During inference, Dropout is equal to Identity.

Parameters

drop_prob – The probability to drop (set to zero) each single element
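
A minimal usage sketch (illustrative, not from the original reference): in eval mode Dropout acts as Identity, so the output is deterministic; in training mode elements would be zeroed at random and the rest rescaled.

import numpy as np
import megengine as mge
import megengine.module as M

data = mge.tensor(np.ones(4).astype("float32"))
dropout = M.Dropout(drop_prob=0.2)
dropout.eval()           # inference mode: Dropout behaves like Identity
output = dropout(data)
print(output.numpy())

Outputs:

[1. 1. 1. 1.]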

forward(inputs)[source]

megengine.module.elemwise

class megengine.module.elemwise.Elemwise(method)[source]

Bases: megengine.module.module.Module

A Module that applies an elemwise operator. Could be replaced with QATModule version Elemwise using quantize_qat().

Parameters

method

the elemwise method; the following strings are supported. For float inputs it performs the normal elemwise operator. A usage sketch is given after the list below.

  • ”ADD”: a + b

  • ”FUSE_ADD_RELU”: max(x+y, 0)

  • ”MUL”: x * y

  • ”MIN”: min(x, y)

  • ”MAX”: max(x, y)

  • ”SUB”: x - y

  • ”TRUE_DIV”: x / y

  • ”FUSE_ADD_SIGMOID”: sigmoid(x + y)

  • ”FUSE_ADD_TANH”: tanh(x + y)

  • ”RELU”: x > 0 ? x : 0

  • ”ABS”: x > 0 ? x : -x

  • ”SIGMOID”: sigmoid(x)

  • ”EXP”: exp(x)

  • ”TANH”: tanh(x)

  • ”FUSE_MUL_ADD3”: x * y + z

  • ”FAST_TANH”: x * (27. + x * x) / (27. + 9. * x * x)

  • ”NEGATE”: -x

  • ”ACOS”: acos(x)

  • ”ASIN”: asin(x)

  • ”CEIL”: ceil(x)

  • ”COS”: cos(x)

  • ”EXPM1”: expm1(x)

  • ”FLOOR”: floor(x)

  • ”LOG”: log(x)

  • ”LOG1P”: log1p(x)

  • ”SIN”: sin(x)

  • ”ROUND”: round(x)

  • ”ERF”: erf(x)

  • ”ERFINV”: erfinv(x)

  • ”ERFC”: erfc(x)

  • ”ERFCINV”: erfcinv(x)

  • ”ABS_GRAD”: abs_grad

  • ”FLOOR_DIV”: floor_div

  • ”MOD”: mod

  • ”SIGMOID_GRAD”: sigmoid_grad

  • ”SWITCH_GT0”: switch_gt0

  • ”TANH_GRAD”: tanh_grad

  • ”LT”: less

  • ”LEQ”: leq

  • ”EQ”: equal

  • ”POW”: pow

  • ”LOG_SUM_EXP”: log_sum_exp

  • ”FAST_TANH_GRAD”: fast_tanh_grad

  • ”ATAN2”: atan2

  • ”COND_LEQ_MOV”: cond_leq_mov

  • ”H_SWISH”: h_swish

  • ”FUSE_ADD_H_SWISH”: h_swish(x+y)

  • ”H_SWISH_GRAD”: h_swish_grad

  • ”AND”: bool binary: x && y

  • ”OR”: bool binary: x || y

  • ”XOR”: bool binary: x ^ y

  • ”NOT”: bool unary: ~x
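
A minimal usage sketch (illustrative, not from the original reference), using the ”ADD” method:

import numpy as np
import megengine as mge
import megengine.module as M

x = mge.tensor(np.array([-1.0, 0.0, 2.0]).astype("float32"))
y = mge.tensor(np.array([1.0, 1.0, 1.0]).astype("float32"))
add = M.Elemwise("ADD")
out = add(x, y)
print(out.numpy())

Outputs:

[0. 1. 3.]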

forward(*inps)[source]

megengine.module.embedding

class megengine.module.embedding.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=None, initial_weight=None, freeze=False)[source]

Bases: megengine.module.module.Module

A simple lookup table that stores embeddings of a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings. The indices should be less than num_embeddings.

Parameters
  • num_embeddings (int) – size of embedding dictionary.

  • embedding_dim (int) – size of each embedding vector.

  • padding_idx (Optional[int]) – should be set to None, not supported now.

  • max_norm (Optional[float]) – should be set to None, not supported now.

  • norm_type (Optional[float]) – should be set to None, not supported now.

  • initial_weight (Optional[Parameter]) – the learnable weights of the module of shape (num_embeddings, embedding_dim).

Examples:

import numpy as np
import megengine as mge
import megengine.module as M
weight = mge.tensor(np.array([(1.2,2.3,3.4,4.5,5.6)], dtype=np.float32))
data = mge.tensor(np.array([(0,0)], dtype=np.int32))

embedding = M.Embedding(1, 5, initial_weight=weight)
output = embedding(data)
with np.printoptions(precision=6):
    print(output.numpy())

Outputs:

[[[1.2 2.3 3.4 4.5 5.6]
  [1.2 2.3 3.4 4.5 5.6]]]
forward(inputs)[source]
classmethod from_pretrained(embeddings, freeze=True, padding_idx=None, max_norm=None, norm_type=None)[source]

Creates Embedding instance from given 2-dimensional FloatTensor.

Parameters
  • embeddings (Parameter) – tensor contained weight for the embedding.

  • freeze (Optional[bool]) – if True, the weight does not get updated during the learning process. Default: True.

  • padding_idx (Optional[int]) – should be set to None, not supported now.

  • max_norm (Optional[float]) – should be set to None, not supported now.

  • norm_type (Optional[float]) – should be set to None, not supported now.

Examples:

import numpy as np
import megengine as mge
import megengine.module as M
weight = mge.tensor(np.array([(1.2,2.3,3.4,4.5,5.6)], dtype=np.float32))
data = mge.tensor(np.array([(0,0)], dtype=np.int32))

embedding = M.Embedding.from_pretrained(weight, freeze=False)
output = embedding(data)
print(output.numpy())

Outputs:

[[[1.2 2.3 3.4 4.5 5.6]
  [1.2 2.3 3.4 4.5 5.6]]]
reset_parameters()[source]
Return type

None

megengine.module.identity

class megengine.module.identity.Identity[source]

Bases: megengine.module.module.Module

A placeholder identity operator that will ignore any argument.
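
A minimal usage sketch (illustrative, not from the original reference):

import numpy as np
import megengine as mge
import megengine.module as M

identity = M.Identity()
data = mge.tensor(np.array([1.0, 2.0, 3.0]).astype("float32"))
out = identity(data)
print(out.numpy())

Outputs:

[1. 2. 3.]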

forward(x)[source]

megengine.module.init

megengine.module.init.calculate_correct_fan(tensor, mode)[source]

Calculates fan_in / fan_out value for given weight tensor, depending on given mode.

See calculate_fan_in_and_fan_out() for details.

Parameters
  • tensor (Tensor) – weight tensor in NCHW format.

  • mode (str) – “fan_in” or “fan_out”.

Return type

float

megengine.module.init.calculate_fan_in_and_fan_out(tensor)[source]

Calculates fan_in / fan_out value for given weight tensor. This function assumes input tensor is stored in NCHW format.

Parameters

tensor (Tensor) – weight tensor in NCHW format.

Return type

Tuple[float, float]

megengine.module.init.calculate_gain(nonlinearity, param=None)[source]

Returns a recommended gain value (see the table below) for the given nonlinearity function.

nonlinearity        gain
Linear / Identity   \(1\)
Conv{1,2,3}D        \(1\)
Sigmoid             \(1\)
Tanh                \(\frac{5}{3}\)
ReLU                \(\sqrt{2}\)
Leaky ReLU          \(\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}\)

Parameters
  • nonlinearity (str) – name of the non-linear function.

  • param (Union[int, float, None]) – optional parameter for leaky_relu. Only effective when nonlinearity is “leaky_relu”.

Return type

float
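
A minimal usage sketch (illustrative, not from the original reference; the printed value follows the Leaky ReLU row of the table above, roughly \(\sqrt{2}\) for a small negative slope):

import megengine.module.init as init

gain = init.calculate_gain("leaky_relu", 0.01)
print(round(gain, 4))

Outputs:

1.4141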

megengine.module.init.fill_(tensor, val)[source]

Fills the given tensor with value val.

Parameters
  • tensor (Tensor) – tensor to be initialized.

  • val (Union[float, int]) – value to be filled throughout the tensor.

Return type

None

megengine.module.init.msra_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')[source]

Fills tensor with random values sampled from \(\mathcal{N}(0, \text{std}^2)\) where

\[\text{std} = \sqrt{\frac{2}{(1 + a^2) \times \text{fan_in}}}\]

Detailed information can be retrieved from Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification

Parameters
  • tensor (Tensor) – tensor to be initialized

  • a (float) – optional parameter for calculating gain for leaky_relu. See calculate_gain() for details.

  • mode (str) – “fan_in” or “fan_out”, used to select the fan value that determines \(\text{std}\). See calculate_fan_in_and_fan_out() for details.

  • nonlinearity (str) – name of the non-linear function used to calculate \(gain\). See calculate_gain() for details.

Return type

None

megengine.module.init.msra_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')[source]

Fills tensor with random values sampled from \(\mathcal{U}(-\text{bound}, \text{bound})\) where

\[\text{bound} = \sqrt{\frac{6}{(1 + a^2) \times \text{fan_in}}}\]

Detailed information can be retrieved from Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification

Parameters
  • tensor (Tensor) – tensor to be initialized.

  • a (float) – optional parameter for calculating gain for leaky_relu. See calculate_gain() for details.

  • mode (str) – “fan_in” or “fan_out”, used to select the fan value that determines \(\text{bound}\). See calculate_fan_in_and_fan_out() for details.

  • nonlinearity (str) – name of the non-linear function used to calculate \(gain\). See calculate_gain() for details.

Return type

None

megengine.module.init.normal_(tensor, mean=0.0, std=1.0)[source]

Fills the given tensor with random value sampled from normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\).

Parameters
  • tensor (Tensor) – tensor to be initialized.

  • mean (float) – mean of the normal distribution.

  • std (float) – standard deviation of the normal distribution.

Return type

None

megengine.module.init.ones_(tensor)[source]

Fills the given tensor with the scalar value 1.

Parameters

tensor (Tensor) – tensor to be initialized.

Return type

None

megengine.module.init.uniform_(tensor, a=0.0, b=1.0)[source]

Fills the given tensor with random value sampled from uniform distribution \(\mathcal{U}(\text{a}, \text{b})\).

Parameters
  • tensor (Tensor) – tensor to be initialized.

  • a (float) – lower bound of the sampling interval.

  • b (float) – upper bound of the sampling interval.

Return type

None

megengine.module.init.xavier_normal_(tensor, gain=1.0)[source]

Fills tensor with random values sampled from \(\mathcal{N}(0, \text{std}^2)\) where

\[\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan_in} + \text{fan_out}}}\]

Also known as Glorot initialization. Detailed information can be retrieved from Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010).

Parameters
  • tensor (Tensor) – tensor to be initialized.

  • gain (float) – scaling factor for \(std\).

Return type

None

megengine.module.init.xavier_uniform_(tensor, gain=1.0)[source]

Fills tensor with random values sampled from \(\mathcal{U}(-a, a)\) where

\[a = \text{gain} \times \sqrt{\frac{6}{\text{fan_in} + \text{fan_out}}}\]

Also known as Glorot initialization. Detailed information can be retrieved from Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010).

Parameters
  • tensor (Tensor) – tensor to be initialized.

  • gain (float) – scaling factor for \(a\).

Return type

None
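
A minimal usage sketch (illustrative, not from the original reference; it assumes the default bias=True so m.bias exists, and only shapes are printed since the values are random):

import megengine.module as M
from megengine.module import init

m = M.Linear(3, 4)
init.xavier_uniform_(m.weight)  # in-place re-initialization of the weight
init.zeros_(m.bias)             # in-place re-initialization of the bias
print(m.weight.numpy().shape, m.bias.numpy().shape)

Outputs:

(4, 3) (4,)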

megengine.module.init.zeros_(tensor)[source]

Fills the given tensor with scalar value 0.

Parameters

tensor (Tensor) – tensor to be initialized.

Return type

None

megengine.module.linear

class megengine.module.linear.Linear(in_features, out_features, bias=True, **kwargs)[source]

Bases: megengine.module.module.Module

Applies a linear transformation to the input. For instance, if input is x, then output y is:

\[y = xW^T + b\]

where \(y_i= \sum_j W_{ij} x_j + b_i\)

Parameters
  • in_features (int) – size of each input sample.

  • out_features (int) – size of each output sample.

  • bias (bool) – if it’s False, the layer will not learn an additional bias. Default: True

Examples:

import numpy as np
import megengine as mge
import megengine.module as M

m = M.Linear(in_features=3, out_features=1)
inp = mge.tensor(np.arange(0, 6).astype("float32").reshape(2, 3))
oup = m(inp)
print(oup.numpy().shape)

Outputs:

(2, 1)
forward(x)[source]
reset_parameters()[source]
Return type

None

megengine.module.module

class megengine.module.module.Module[source]

Bases: object

Base Module class.

apply(fn)[source]

Applies function fn to all the modules within this module, including itself.

Parameters

fn (Callable[[Module], Any]) – the function to be applied on modules.

Return type

None
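
A minimal usage sketch (illustrative, not from the original reference), using apply to zero the biases of every Linear submodule:

import megengine.module as M
from megengine.module import init

def zero_linear_bias(m):
    # fn receives every module in the tree, including the container itself
    if isinstance(m, M.Linear):
        init.zeros_(m.bias)

fc = M.Linear(4, 8)
net = M.Sequential(fc, M.Linear(8, 2))
net.apply(zero_linear_bias)
print(fc.bias.numpy())

Outputs:

[0. 0. 0. 0. 0. 0. 0. 0.]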

buffers(recursive=True, **kwargs)[source]

Returns an iterable for the buffers of the module.

Buffer is defined to be Tensor excluding Parameter.

Parameters

recursive (bool) – if True, returns all buffers within this module, else only returns buffers that are direct attributes of this module.

Return type

Iterable[Tensor]

children(**kwargs)[source]

Returns an iterable for all the submodules that are direct attributes of this module.

Return type

Iterable[Module]

disable_quantize(value=True)[source]

Sets the module’s quantize_disabled attribute and returns the module. Could be used as a decorator.

eval()[source]

Sets training mode of all the modules within this module (including itself) to False. See train() for details.

Return type

None

abstract forward(inputs)[source]
load_state_dict(state_dict, strict=True)[source]

Loads a given dictionary created by state_dict() into this module. If strict is True, the keys of state_dict() must exactly match the keys returned by state_dict().

Users can also pass a closure: Function[key: str, var: Tensor] -> Optional[np.ndarray] as a state_dict, in order to handle complex situations. For example, load everything except for the final linear classifier:

state_dict = {...}  #  Dict[str, np.ndarray]
model.load_state_dict({
    k: None if k.startswith('fc') else v
    for k, v in state_dict.items()
}, strict=False)

Here returning None means skipping parameter k.

To prevent shape mismatch (e.g. load PyTorch weights), we can reshape before loading:

state_dict = {...}
def reshape_accordingly(k, v):
    return state_dict[k].reshape(v.shape)
model.load_state_dict(reshape_accordingly)

We can also perform inplace re-initialization or pruning:

def reinit_and_pruning(k, v):
    if 'bias' in k:
        M.init.zeros_(v)  # re-initialize biases in place
    if 'conv' in k:
        # prune small convolution weights by zeroing entries below the threshold
        return v.numpy() * (np.abs(v.numpy()) > 1e-3).astype("float32")
model.load_state_dict(reinit_and_pruning, strict=False)
modules(**kwargs)[source]

Returns an iterable for all the modules within this module, including itself.

Return type

Iterable[Module]

named_buffers(prefix=None, recursive=True, **kwargs)[source]

Returns an iterable for key buffer pairs of the module, where key is the dotted path from this module to the buffer.

Buffer is defined to be Tensor excluding Parameter.

Parameters
  • prefix (Optional[str]) – prefix prepended to the keys.

  • recursive (bool) – if True, returns all buffers within this module, else only returns buffers that are direct attributes of this module.

Return type

Iterable[Tuple[str, Tensor]]

named_children(**kwargs)[source]

Returns an iterable of key-submodule pairs for all the submodules that are direct attributes of this module, where ‘key’ is the attribute name of submodules.

Return type

Iterable[Tuple[str, Module]]

named_modules(prefix=None, **kwargs)[source]

Returns an iterable of key-module pairs for all the modules within this module, including itself, where ‘key’ is the dotted path from this module to the submodules.

Parameters

prefix (Optional[str]) – prefix prepended to the path.

Return type

Iterable[Tuple[str, Module]]

named_parameters(prefix=None, recursive=True, **kwargs)[source]

Returns an iterable for key Parameter pairs of the module, where key is the dotted path from this module to the Parameter.

Parameters
  • prefix (Optional[str]) – prefix prepended to the keys.

  • recursive (bool) – if True, returns all Parameter within this module, else only returns Parameter that are direct attributes of this module.

Return type

Iterable[Tuple[str, Parameter]]

parameters(recursive=True, **kwargs)[source]

Returns an iterable for the Parameter of the module.

Parameters

recursive (bool) – If True, returns all Parameter within this module, else only returns Parameter that are direct attributes of this module.

Return type

Iterable[Parameter]

register_forward_hook(hook)[source]

Registers a hook to handle forward results. hook should be a function that receives module, inputs and outputs, then returns modified outputs or None.

This method returns a handler with a remove() interface to delete the hook.

Return type

HookHandler
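
A minimal usage sketch (illustrative, not from the original reference; it assumes the hooked module returns a single Tensor):

import numpy as np
import megengine as mge
import megengine.module as M

def print_shape(module, inputs, outputs):
    print(outputs.numpy().shape)
    return None  # returning None keeps the outputs unchanged

relu = M.ReLU()
handle = relu.register_forward_hook(print_shape)
relu(mge.tensor(np.zeros((2, 3)).astype("float32")))  # prints (2, 3)
handle.remove()  # the hook no longer fires after removal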

register_forward_pre_hook(hook)[source]

Registers a hook to handle forward inputs. hook should be a function that receives module and inputs, then returns modified inputs or None.

Parameters

hook (Callable) – a function that receives module and inputs, then returns modified inputs or None.

Return type

HookHandler

Returns

a handler with a remove() interface to delete the hook.

replace_param(params, start_pos, seen=None)[source]

Replaces the module’s parameters with params, used by ParamPack to speed up multi-machine training.

Deprecated since version 1.0.

state_dict(rst=None, prefix='', keep_var=False)[source]
train(mode=True, recursive=True)[source]

Sets training mode of all the modules within this module (including itself) to mode. This effectively sets the training attributes of those modules to mode, but only has effect on certain modules (e.g. BatchNorm2d, Dropout, Observer).

Parameters
  • mode (bool) – the training mode to be set on modules.

  • recursive (bool) – whether to recursively call submodules’ train().

Return type

None

zero_grad()[source]

Sets all parameters’ grads to zero.

Deprecated since version 1.0.

Return type

None

megengine.module.normalization

class megengine.module.normalization.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True)[source]

Bases: megengine.module.module.Module

Simple implementation of GroupNorm. Only supports 4D tensors now. Reference: https://arxiv.org/pdf/1803.08494.pdf.
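
A minimal usage sketch (illustrative, not from the original reference; only the output shape is printed since the input is random):

import numpy as np
import megengine as mge
import megengine.module as M

m = M.GroupNorm(num_groups=2, num_channels=4)
inp = mge.tensor(np.random.rand(2, 4, 8, 8).astype("float32"))
oup = m(inp)
print(oup.numpy().shape)

Outputs:

(2, 4, 8, 8)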

forward(x)[source]
reset_parameters()[source]
class megengine.module.normalization.InstanceNorm(num_channels, eps=1e-05, affine=True)[source]

Bases: megengine.module.module.Module

Simple implementation of InstanceNorm. Only supports 4D tensors now. Reference: https://arxiv.org/abs/1607.08022. Note that InstanceNorm is equivalent to GroupNorm with num_groups=num_channels.

forward(x)[source]
reset_parameters()[source]
class megengine.module.normalization.LayerNorm(num_channels, eps=1e-05, affine=True)[source]

Bases: megengine.module.module.Module

Simple implementation of LayerNorm. Only supports 4D tensors now. Reference: https://arxiv.org/pdf/1803.08494.pdf. Note that LayerNorm is equivalent to GroupNorm with num_groups=1.

forward(x)[source]
reset_parameters()[source]

megengine.module.pooling

class megengine.module.pooling.AvgPool2d(kernel_size, stride=None, padding=0)[source]

Bases: megengine.module.pooling._PoolNd

Applies a 2D average pooling over an input.

For instance, given an input of the size \((N, C, H, W)\) and kernel_size \((kH, kW)\), this layer generates the output of the size \((N, C, H_{out}, W_{out})\) through a process described as:

\[out(N_i, C_j, h, w) = \frac{1}{kH * kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n)\]

If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.

Parameters
  • kernel_size (Union[int, Tuple[int, int]]) – the size of the window.

  • stride (Union[int, Tuple[int, int], None]) – the stride of the window. Default value is kernel_size.

  • padding (Union[int, Tuple[int, int]]) – implicit zero padding to be added on both sides.

Examples:

import numpy as np
import megengine as mge
import megengine.module as M

m = M.AvgPool2d(kernel_size=3, stride=1, padding=0)
inp = mge.tensor(np.arange(0, 16).astype("float32").reshape(1, 1, 4, 4))
oup = m(inp)
print(oup.numpy())

Outputs:

[[[[ 5.  6.]
   [ 9. 10.]]]]
forward(inp)[source]
class megengine.module.pooling.MaxPool2d(kernel_size, stride=None, padding=0)[source]

Bases: megengine.module.pooling._PoolNd

Applies a 2D max pooling over an input.

For instance, given an input of the size \((N, C, H, W)\) and kernel_size \((kH, kW)\), this layer generates the output of the size \((N, C, H_{out}, W_{out})\) through a process described as:

\[\begin{aligned} out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \text{input}(N_i, C_j, \text{stride[0]} \times h + m, \text{stride[1]} \times w + n) \end{aligned}\]

If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.

Parameters
  • kernel_size (Union[int, Tuple[int, int]]) – the size of the window to take a max over.

  • stride (Union[int, Tuple[int, int], None]) – the stride of the window. Default value is kernel_size.

  • padding (Union[int, Tuple[int, int]]) – implicit zero padding to be added on both sides.

Examples:

import numpy as np
import megengine as mge
import megengine.module as M

m = M.MaxPool2d(kernel_size=3, stride=1, padding=0)
inp = mge.tensor(np.arange(0, 16).astype("float32").reshape(1, 1, 4, 4))
oup = m(inp)
print(oup.numpy())

Outputs:

[[[[10. 11.]
   [14. 15.]]]]
forward(inp)[source]

megengine.module.quant_dequant

class megengine.module.quant_dequant.DequantStub[source]

Bases: megengine.module.module.Module

A helper Module simply returning input. Could be replaced with QATModule version DequantStub using quantize_qat().

forward(inp)[source]
class megengine.module.quant_dequant.QuantStub[source]

Bases: megengine.module.module.Module

A helper Module simply returning input. Could be replaced with QATModule version QuantStub using quantize_qat().

forward(inp)[source]

megengine.module.sequential

class megengine.module.sequential.Sequential(*args)[source]

Bases: megengine.module.module.Module

A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in.

To make it easier to understand, here is a small example:

Examples:

import numpy as np
import megengine as mge
import megengine.module as M
import megengine.functional as F
from collections import OrderedDict

batch_size = 64
data = mge.tensor(np.zeros((batch_size, 28 * 28)), dtype=np.float32)
label = mge.tensor(np.zeros(batch_size,), dtype=np.int32)

net0 = M.Sequential(
        M.Linear(28 * 28, 320),
        M.Linear(320, 10)
    )
pred0 = net0(data)

modules = OrderedDict()
modules["fc0"] = M.Linear(28 * 28, 320)
modules["fc1"] = M.Linear(320, 10)
net1 = M.Sequential(modules)
pred1 = net1(data)
forward(inp)[source]
property layer_values