megengine.utils.network.Network.optimize_for_inference

Network.optimize_for_inference(dest_vars, **kwargs)[source]

Optimize the network to achieve better performance during inference.

参数

dest_vars – list of output vars in the operator graph

Keyword Arguments:

  • enable_io16xc32 – whether to use float16 for I/O between oprs and float32 as the internal computation precision. Note that the output vars will be changed to float16.

  • enable_ioc16 – whether to use float16 for both I/O and computation precision.

  • enable_hwcd4 – whether to use the NHWCD4 data layout. This is faster on some OpenCL backends.

  • enable_nchw88 – whether to use the NCHW88 data layout, currently used in the x86 AVX backend.

  • enable_nchw44 – whether to use the NCHW44 data layout, currently used in the Arm backend.

  • enable_nchw44_dot – whether to use the NCHW44_dot data layout, currently used in the Armv8.2+DotProd backend.

  • enable_nchw4 – whether to use the NCHW4 data layout, currently used in the NVIDIA backend (based on cuDNN).

  • enable_nchw32 – whether to use the NCHW32 data layout, currently used in the NVIDIA backend with Tensor Cores (based on cuDNN).

  • enable_chwn4 – whether to use the CHWN4 data layout, currently used in the NVIDIA backend with Tensor Cores.

  • enable_nchw64 – whether to use the NCHW64 data layout, used for fast int4 support on NVIDIA GPUs.

  • enable_fuse_conv_bias_nonlinearity – whether to fuse conv + bias + nonlinearity into one opr.

  • enable_fuse_conv_bias_with_z – whether to fuse conv_bias with the z input for inference on the NVIDIA backend (this optimization pass will result in a mismatch between the output precision of training and inference).
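
A minimal usage sketch follows. It assumes a serialized graph file (the file names "model.mge" and "model_optimized.mge" are hypothetical) and uses Network.load, output_vars and dump from megengine.utils.network; the chosen keyword arguments are only an illustration, not a recommended configuration.

    from megengine.utils.network import Network as Net

    # Load a serialized computing graph (file name is hypothetical).
    net = Net.load("model.mge")

    # Optimize the graph for inference: float16 I/O with float32 computation,
    # plus conv + bias + nonlinearity fusion.
    net.optimize_for_inference(
        net.output_vars,
        enable_io16xc32=True,
        enable_fuse_conv_bias_nonlinearity=True,
    )

    # Dump the optimized graph for deployment.
    net.dump("model_optimized.mge")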