Model Deployment Overview and Process Recommendations
After completing model training with MegEngine, for the model to realize its value we need to "deploy" it, that is, run inference with the model under the constraints of specific hardware devices and system environments.
Depending on the target device, the deployment process differs:
| Computing hardware | Example | Applicable scenario |
| --- | --- | --- |
| Devices with a Python environment | GPU server | You want deployment to be as simple as possible and do not mind Python's performance limitations (see the Python sketch after this table) |
| Devices with a C/C++ environment | Any device, especially embedded chips, TEE environments, etc. | You want the highest possible performance and low resource usage, and can accept the complexity of compiling C++ libraries |
| NPU | Atlas / RockChip / Cambricon and other chips | You need the computing power of an NPU and can accept slightly more involved conversion steps |
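For the first row, the exported model can be run directly from Python through MegEngine Lite's Python binding (`megenginelite`). Below is a minimal sketch, not a definitive recipe: it assumes a model has already been dumped to `model.mge` (see the note below) with an input tensor named `data`, and the exact API may vary slightly between MegEngine Lite versions.

```python
import numpy as np
from megenginelite import LiteNetwork  # MegEngine Lite Python binding

# Assumptions: "model.mge" was dumped with an input tensor named "data"
# of shape (1, 3, 224, 224); adjust both to match your own model.
network = LiteNetwork()
network.load("model.mge")

# Copy host data into the network's input tensor.
input_tensor = network.get_io_tensor("data")
input_tensor.set_data_by_copy(
    np.random.random((1, 3, 224, 224)).astype(np.float32)
)

# Run inference and read back the first output as a NumPy array.
network.forward()
network.wait()
output_name = network.get_all_output_name()[0]
output_tensor = network.get_io_tensor(output_name)
print(output_tensor.to_numpy().shape)
```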
The following flow chart shows the basic steps of each deployment route:
Note
To choose a deployment route, keep the following points in mind:

- The most recommended route is training code -> `.tm` file -> `.mge` file -> Lite execution (a minimal sketch of this route follows this note);
- If your team divides work between researchers and engineers, it is recommended to use the `.tm` file as the interface: the researcher is responsible for delivering the `.tm` model (which can be archived permanently), and the engineer is responsible for the subsequent deployment steps;
- If you alone are responsible for the complete training-to-deployment process and do not need to archive the model long term, the `.mge` file can, for convenience, be generated directly from the training code (i.e. the dashed line in the flow chart above); the result is equivalent.
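The following is a minimal sketch of the recommended route, assuming `net` is a trained `megengine.module.Module` and using a random tensor as the sample input; exact argument names may differ slightly across MegEngine versions.

```python
import numpy as np
import megengine as mge
from megengine.jit import trace
from megengine.traced_module import trace_module

# Assumption: `net` is a trained megengine.module.Module; the input shape
# (1, 3, 224, 224) is only an example and should match your model.
data = mge.Tensor(np.random.random((1, 3, 224, 224)).astype(np.float32))

# Step 1: training code -> .tm
# trace_module records the forward graph as a TracedModule, which can be
# archived and handed over as the researcher/engineer interface.
traced_net = trace_module(net, data)
mge.save(traced_net, "model.tm")

# Step 2: .tm -> .mge
# A TracedModule behaves like a normal Module, so it can be wrapped with
# jit.trace and dumped to a .mge file that MegEngine Lite can load.
traced_net = mge.load("model.tm")
traced_net.eval()

@trace(symbolic=True, capture_as_const=True)
def infer(data, *, net):
    return net(data)

infer(data, net=traced_net)  # run once so the static graph is captured
infer.dump("model.mge", arg_names=["data"], optimize_for_inference=True)
```

Step 3 (Lite execution) then loads `model.mge` on the target device through the Lite C++ or Python interface. When you skip the `.tm` archive (the dashed line), the same `trace` + `dump` step can be applied directly to `net` in the training code, producing an equivalent `.mge` file.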
See also
mgeconvert : Various converters for MegEngine.