Export serialized model file (Dump)¶
Note
Considering inference deployment requirements, use dump to serialize the trained model to a file or object.
Taking the pre-trained ResNet50 model as an example, the reference code snippet is as follows:
import numpy as np

import megengine.functional as F
import megengine.hub
from megengine import jit, tensor

if __name__ == "__main__":
    # Load the pre-trained ResNet50 from the model hub and switch to inference mode
    net = megengine.hub.load("megengine/models", "resnet50", pretrained=True)
    net.eval()

    # Trace the forward pass; capture_as_const bakes the parameters into the graph so it can be dumped
    @jit.trace(symbolic=True, capture_as_const=True)
    def fun(data, *, net):
        pred = net(data)
        pred_normalized = F.softmax(pred)
        return pred_normalized

    # Run the traced function once on dummy input, then serialize the graph to a file
    data = tensor(np.random.random([1, 3, 224, 224]).astype(np.float32))
    fun(data, net=net)
    fun.dump("resnet50.mge", arg_names=["data"])
After executing the script, the model conversion is complete and we obtain the pre-trained model file resnet50.mge,
which can be loaded by the MegEngine C++ API.
Description of common dump parameters¶
When using dump, multiple parameters can be passed in; the two most commonly used are:
arg_names
Uniformly sets the names of the model's input Tensors during serialization. Because input Tensor names can differ greatly between models, this parameter can be used to rename the inputs to a uniform scheme such as arg_0, arg_1, …
optimize_for_inference
A trained model often cannot deliver its best performance during deployment, so optimize_for_inference is provided to ensure that the serialized model is specifically optimized for inference. For the detailed key-value parameters, see the Inference optimization options table below. A usage sketch of both parameters follows the warning below.
Warning
The optimize_for_inference parameter defaults to True, so even if you do not set any key-value optimization parameters, some basic optimizations are still performed. This may cause the serialized model to differ slightly from its original definition.
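A minimal sketch of the two parameters above, reusing the traced fun from the ResNet50 example (the output filename resnet50_raw.mge is only an illustration):
# Name the single input "data" and skip the extra inference optimizations
fun.dump(
    "resnet50_raw.mge",            # hypothetical output filename
    arg_names=["data"],
    optimize_for_inference=False,  # keep the graph as defined, no optimization passes
)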
Dump model file with test data¶
When using dump, the following parameter can also be set:
input_data
A list of strings; each string in the list represents a set of test data. Each string supports three formats (see the usage sketch after this list):
var0:file0;var1:file1...
Specifies the filename for each input variable. The file can be an image that can be loaded by OpenCV, or a pickle file of a numpy.ndarray. If there is only one input, the variable name can be omitted.
var0:#rand(min, max, shape);var1:#rand...
Specifies how the data of each input variable is randomly generated; the shape is shape and the value range is [min, max). For example rand(0, 255), rand(0, 255, 1, 3, 224, 224) or #rand(0, 255, 1, ...) (where ... means the remainder of the shape). If no shape is specified, the shape of the corresponding input tensor in the network is used. If there is only one input, the variable name can be omitted.
@filename
Specifies an input filename; each line in the file is a string that conforms to either of the above two formats.
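A minimal sketch of dumping with embedded test data, reusing the traced fun from the example above (the output filename is only an illustration):
# Embed one set of random test data in [0, 255); the shape is taken from the network input
fun.dump(
    "resnet50_with_data.mge",      # hypothetical output filename
    arg_names=["data"],
    input_data=["#rand(0, 255)"],
)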
For more related parameter settings, please refer to dump.
Inference optimization options table¶
--enable-io16xc32
Use float16 as the data transfer type between operators and float32 as the computation type.
--enable-ioc16
Use float16 as both the data transfer type and the computation type between operators.
--enable-fuse-conv-bias-nonlinearity
Whether to fuse conv + bias + nonlinearity into one operator.
--enable-hwcd4
Use the hwcd4 data layout.
--enable-nchw88
Use the nchw88 data layout.
--enable-nchw44
Use the nchw44 data layout.
--enable-nchw44-dot
Use the nchw44_dot data layout.
--enable-nchw32
Use the nchw32 data layout.
--enable-chwn4
Use the chwn4 data layout.
--enable-fuse-conv-bias-with-z
Only available on the GPU platform; fuses conv, bias (elemwise add), and z (elemwise add) into one operator.
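A minimal sketch passing one of these options through dump. This assumes the CLI-style flags above map to keyword arguments of the same name (e.g. --enable-fuse-conv-bias-nonlinearity becomes enable_fuse_conv_bias_nonlinearity); please verify against the dump signature of your MegEngine version:
# Fuse conv + bias + nonlinearity when serializing (assumed keyword form of the flag above)
fun.dump(
    "resnet50_fused.mge",          # hypothetical output filename
    arg_names=["data"],
    enable_fuse_conv_bias_nonlinearity=True,
)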