Conv2d
- class Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, conv_mode='cross_correlation', compute_mode='default', padding_mode='zeros', **kwargs)
Applies a 2D convolution over an input tensor.
For instance, given an input of size \((N, C_{\text{in}}, H, W)\), this layer produces an output of size \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\), computed as follows:
\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]
where \(\star\) is the valid 2D cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, \(H\) is the height of the input planes in pixels, and \(W\) is the width in pixels.
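The formula above can be sketched as a direct loop in plain Python. This is only an illustration, assuming stride 1, no padding, no dilation and groups=1; the helper name `conv2d_valid` and the nested-list layout are hypothetical and are not part of the library.

```python
def conv2d_valid(inp, weight, bias):
    """Valid 2D cross-correlation, one term per loop level of the formula.

    inp:    nested lists shaped (N, C_in, H, W)
    weight: nested lists shaped (C_out, C_in, KH, KW)
    bias:   list of length C_out
    """
    n, c_in = len(inp), len(inp[0])
    h, w = len(inp[0][0]), len(inp[0][0][0])
    c_out = len(weight)
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    h_out, w_out = h - kh + 1, w - kw + 1  # "valid": the kernel stays inside the input

    out = [[[[0.0] * w_out for _ in range(h_out)] for _ in range(c_out)]
           for _ in range(n)]
    for i in range(n):                      # batch index N_i
        for j in range(c_out):              # output channel C_out_j
            for y in range(h_out):
                for x in range(w_out):
                    acc = bias[j]
                    for k in range(c_in):   # sum over input channels
                        for dy in range(kh):
                            for dx in range(kw):
                                acc += weight[j][k][dy][dx] * inp[i][k][y + dy][x + dx]
                    out[i][j][y][x] = acc
    return out


# One image, one input channel, 3x3 input, 2x2 all-ones kernel, bias 1.
image = [[[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]]
kernel = [[[[1, 1], [1, 1]]]]
result = conv2d_valid(image, kernel, bias=[1.0])
print(result[0][0])  # [[13.0, 17.0], [25.0, 29.0]]
```

Note that the kernel is not flipped, which is what distinguishes cross-correlation from a mathematical convolution.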
In general, the shape of the output feature maps can be inferred as follows:
input: \((N, C_{\text{in}}, H_{\text{in}}, W_{\text{in}})\)
output: \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) where
\[H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2 * \text{padding[0]} - \text{dilation[0]} * (\text{kernel_size[0]} - 1) - 1}{\text{stride[0]}} + 1 \right\rfloor\]
\[W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} + 2 * \text{padding[1]} - \text{dilation[1]} * (\text{kernel_size[1]} - 1) - 1}{\text{stride[1]}} + 1 \right\rfloor\]
When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also known as depthwise convolution.
In other words, for an input of size \((N, C_{\text{in}}, H_{\text{in}}, W_{\text{in}})\), a depthwise convolution with depthwise multiplier K can be constructed with the arguments \((in\_channels=C_{\text{in}}, out\_channels=C_{\text{in}} \times K, ..., groups=C_{\text{in}})\).
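As a small sketch of the depthwise rule above, the helper below derives the constructor arguments from the input channel count and the multiplier K. The function name `depthwise_conv2d_kwargs` is hypothetical and only illustrates the arithmetic; it does not call the library.

```python
def depthwise_conv2d_kwargs(in_channels, multiplier, kernel_size):
    """Build Conv2d keyword arguments for a depthwise convolution.

    groups equals in_channels, and out_channels is in_channels * K.
    """
    if multiplier < 1:
        raise ValueError("the depthwise multiplier K must be a positive integer")
    return {
        "in_channels": in_channels,
        "out_channels": in_channels * multiplier,
        "kernel_size": kernel_size,
        "groups": in_channels,
    }


kwargs = depthwise_conv2d_kwargs(in_channels=32, multiplier=2, kernel_size=3)
print(kwargs)
# {'in_channels': 32, 'out_channels': 64, 'kernel_size': 3, 'groups': 32}
```

With the library imported as in the Examples section, these arguments would presumably be passed as `M.Conv2d(**kwargs)`.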
- Parameters
in_channels (int) – number of input channels.
out_channels (int) – number of output channels.
kernel_size (Union[int, Tuple[int, int]]) – size of weight on spatial dimensions. If kernel_size is an int, the actual kernel size would be (kernel_size, kernel_size).
stride (Union[int, Tuple[int, int]]) – stride of the 2D convolution operation. Default: 1.
padding (Union[int, Tuple[int, int]]) – size of the paddings added to the input on both sides of its spatial dimensions. Default: 0.
dilation (Union[int, Tuple[int, int]]) – dilation of the 2D convolution operation. Default: 1.
groups (int) – number of groups into which the input and output channels are divided, so as to perform a grouped convolution. When groups is not 1, in_channels and out_channels must be divisible by groups, and the shape of weight should be (groups, out_channels // groups, in_channels // groups, height, width). Default: 1.
bias (bool) – whether to add a bias onto the result of convolution. Default: True.
conv_mode (str) – supports cross_correlation. Default: cross_correlation.
compute_mode (str) – when set to “default”, no special requirements are placed on the precision of intermediate results. When set to “float32”, float32 is used for the accumulator and intermediate results; this only takes effect when the input and output are of float16 dtype. Default: default.
padding_mode (str) – “zeros”, “reflect” or “replicate”. Default: “zeros”. Refer to Pad for more information.
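To give a rough feel for the three padding_mode values, here is a one-dimensional sketch in plain Python. It assumes the conventional meanings of these names (the same behavior as NumPy's `constant`, `reflect` and `edge` pad modes); the exact behavior of this layer is defined by Pad, which should be treated as authoritative. The helper `pad_1d` is hypothetical.

```python
def pad_1d(values, pad, mode="zeros"):
    """Pad a list on both sides by `pad` elements (illustrative only)."""
    n = len(values)
    if mode == "zeros":
        left, right = [0] * pad, [0] * pad
    elif mode == "replicate":
        # Repeat the border element.
        left, right = [values[0]] * pad, [values[-1]] * pad
    elif mode == "reflect":
        # Mirror around the border element without repeating it.
        if pad >= n:
            raise ValueError("reflect padding must be smaller than the input length")
        left = [values[i] for i in range(pad, 0, -1)]
        right = [values[n - 1 - i] for i in range(1, pad + 1)]
    else:
        raise ValueError(f"unknown padding mode: {mode}")
    return left + list(values) + right


row = [1, 2, 3, 4]
print(pad_1d(row, 2, "zeros"))      # [0, 0, 1, 2, 3, 4, 0, 0]
print(pad_1d(row, 2, "reflect"))    # [3, 2, 1, 2, 3, 4, 3, 2]
print(pad_1d(row, 2, "replicate"))  # [1, 1, 1, 2, 3, 4, 4, 4]
```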
- Shape:
input: \((N, C_{\text{in}}, H_{\text{in}}, W_{\text{in}})\).
output: \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\).
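The output shape can be checked by hand with a short helper that evaluates the \(H_{\text{out}}\) and \(W_{\text{out}}\) formulas given earlier. This is a sketch in plain Python; `conv2d_output_shape` and `_pair` are hypothetical names, not library functions.

```python
def _pair(value):
    """Expand an int to (value, value); pass tuples through."""
    return (value, value) if isinstance(value, int) else tuple(value)


def conv2d_output_shape(input_shape, out_channels, kernel_size,
                        stride=1, padding=0, dilation=1):
    """Return (N, C_out, H_out, W_out) for an input of shape (N, C_in, H_in, W_in)."""
    n, _, h_in, w_in = input_shape
    k, s, p, d = (_pair(v) for v in (kernel_size, stride, padding, dilation))
    # Integer floor division implements the floor in the formula.
    h_out = (h_in + 2 * p[0] - d[0] * (k[0] - 1) - 1) // s[0] + 1
    w_out = (w_in + 2 * p[1] - d[1] * (k[1] - 1) - 1) // s[1] + 1
    return (n, out_channels, h_out, w_out)


print(conv2d_output_shape((2, 3, 4, 4), out_channels=1, kernel_size=3))
# (2, 1, 2, 2)
print(conv2d_output_shape((1, 3, 224, 224), 64, kernel_size=7, stride=2, padding=3))
# (1, 64, 112, 112)
```

The first call uses the same sizes as the Examples section and gives the same shape.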
Note
weight usually has shape (out_channels, in_channels, height, width); if groups is not 1, the shape will be (groups, out_channels // groups, in_channels // groups, height, width).
bias usually has shape (1, out_channels, *1).
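The parameter shapes in the note can be worked out with a few lines of plain Python. The helper `conv2d_param_shapes` is hypothetical, and reading `(1, out_channels, *1)` as `(1, out_channels, 1, 1)` for the 2D case is an assumption made for this sketch.

```python
def conv2d_param_shapes(in_channels, out_channels, kernel_size, groups=1):
    """Return (weight_shape, bias_shape) following the note above."""
    if isinstance(kernel_size, int):
        height, width = kernel_size, kernel_size
    else:
        height, width = kernel_size
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    if groups == 1:
        weight = (out_channels, in_channels, height, width)
    else:
        weight = (groups, out_channels // groups, in_channels // groups, height, width)
    # Assumption: "*1" expands to one trailing 1 per spatial dimension.
    bias = (1, out_channels, 1, 1)
    return weight, bias


print(conv2d_param_shapes(3, 1, 3))
# ((1, 3, 3, 3), (1, 1, 1, 1))
print(conv2d_param_shapes(8, 16, 3, groups=4))
# ((4, 4, 2, 3, 3), (1, 16, 1, 1))
```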
- Returns
module. The instance of the Conv2d module.
- Return type
Examples
>>> import numpy as np
>>> import megengine as mge
>>> import megengine.module as M
>>> m = M.Conv2d(in_channels=3, out_channels=1, kernel_size=3)
>>> inp = mge.tensor(np.arange(0, 96).astype("float32").reshape(2, 3, 4, 4))
>>> oup = m(inp)
>>> oup.numpy().shape
(2, 1, 2, 2)