megengine.utils.network.Network.optimize_for_inference
- Network.optimize_for_inference(dest_vars, **kwargs)
Optimize this network to achieve better performance at inference time.
- Parameters
dest_vars – list of output vars in the operator graph
Keyword Arguments:
enable_io16xc32 – whether to use float16 for I/O between oprs and use float32 as internal computation precision. Note that the output vars will be changed to float16.
enable_ioc16 – whether to use float16 for both I/O and computation precision.
enable_hwcd4 – whether to use NHWCD4 data layout. This is faster on some OpenCL backends.
enable_nchw88 – whether to use NCHW88 data layout, currently used in the x86 AVX backend.
enable_nchw44 – whether to use NCHW44 data layout, currently used in the Arm backend.
enable_nchw44_dot – whether to use NCHW44_dot data layout, currently used in the armv8.2+dotprod backend.
enable_nchw4 – whether to use NCHW4 data layout, currently used in the NVIDIA backend (based on cuDNN).
enable_nchw32 – whether to use NCHW32 data layout, currently used in the NVIDIA backend with Tensor Cores (based on cuDNN).
enable_chwn4 – whether to use CHWN4 data layout, currently used in the NVIDIA backend with Tensor Cores.
enable_nchw64 – whether to use NCHW64 data layout, used for fast int4 support on NVIDIA GPUs.
enable_fuse_conv_bias_nonlinearity – whether to fuse conv+bias+nonlinearity into one opr.
enable_fuse_conv_bias_with_z – whether to fuse conv_bias with the z input for inference on the NVIDIA backend (this optimization pass will cause the output precision of training and inference to mismatch).
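Example – a minimal sketch of a typical load, optimize, dump workflow. The file names ("model.mge", "model_optimized.mge") and the particular passes enabled here are illustrative assumptions, not requirements of this API, and the dump step assumes the optimization is applied to the loaded graph before serialization:

    from megengine.utils.network import Network

    # Load a serialized MegEngine computing graph (path is illustrative).
    net = Network.load("model.mge")

    # Apply inference optimization passes to the graph's output vars:
    # float16 I/O with float32 computation, plus conv+bias+nonlinearity fusion.
    net.optimize_for_inference(
        net.output_vars,
        enable_io16xc32=True,
        enable_fuse_conv_bias_nonlinearity=True,
    )

    # Serialize the optimized graph for deployment (path is illustrative).
    net.dump("model_optimized.mge")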