megengine.jit.trace.dump

trace.dump(file, *, arg_names=None, output_names=None, append=False, keep_var_name=1, keep_opr_name=False, keep_param_name=False, keep_opr_priority=False, strip_info_file=None, append_json=False, optimize_for_inference=True, user_info=None, enable_metadata=True, **kwargs)[source]

Serialize the traced model and save it to a file.
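A minimal usage sketch, assuming a simple convolutional module traced with the standard ``trace`` decorator; the module, tensor shapes, and file name are illustrative only, not part of this API:

    import numpy as np

    import megengine.functional as F
    import megengine.module as M
    from megengine import tensor
    from megengine.jit import trace


    class Net(M.Module):
        def __init__(self):
            super().__init__()
            self.conv = M.Conv2d(3, 8, kernel_size=3, padding=1)

        def forward(self, x):
            return F.relu(self.conv(x))


    net = Net()
    net.eval()

    # capture_as_const=True bakes parameters into the graph so it can be dumped
    @trace(symbolic=True, capture_as_const=True)
    def infer_func(data):
        return net(data)

    data = tensor(np.random.random((1, 3, 32, 32)).astype(np.float32))
    infer_func(data)  # run once so the computation graph is traced

    # serialize the traced graph to "model.mge"
    infer_func.dump("model.mge", arg_names=["data"], output_names=["out"])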

Parameters
  • file – output file, which can be a file object or a filename.

  • arg_names – names of the input tensors of the traced function.

  • output_names – names of the output tensors of the traced function; default names are used if not specified.

  • append – whether to append the output to file. Only available when file is a filename.

  • keep_var_name (int) – level for keeping variable names:
      0: none of the names are kept;
      1: (default) keep the names of output vars;
      2: keep the names of all (output and internal) vars.

  • keep_opr_name (bool) – whether to keep operator names.

  • keep_param_name (bool) – whether to keep parameter names, so that parameters can be manipulated easily after loading the model.

  • keep_opr_priority (bool) – whether to keep the priority settings of operators.

  • strip_info_file – a path or a file handle. If not None, the strip information of the dump will be written to ``strip_info_file``.

  • append_json – only checked when strip_info_file is not None. If True, the strip information will be appended to the end of strip_info_file; if False, strip_info_file will be overwritten.

  • optimize_for_inference – enable optimizations for inference; if False, all optimization options are turned off. Default: True.

  • user_info (Optional[Any]) – an object of any type, which will be pickled to bytes (see the example after this list).

  • enable_metadata (bool) – whether to save metadata into the output file.
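Continuing the sketch above, user_info can carry any picklable object and enable_metadata keeps metadata in the dumped file; the dictionary contents here are purely illustrative:

    # attach arbitrary, picklable user information and keep metadata in the dump
    infer_func.dump(
        "model.mge",
        arg_names=["data"],
        user_info={"description": "demo model", "version": "1.0"},
        enable_metadata=True,
    )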

Keyword Arguments:

  • enable_io16xc32 – whether to use float16 for I/O between oprs and use float32 as internal computation precision. Note the output var would be changed to float16.

  • enable_ioc16 – whether to use float16 for both I/O and computation precision.

  • enable_hwcd4 – whether to use NHWCD4 data layout. This is faster on some OpenCL backends.

  • enable_nchw88 – whether to use NCHW88 data layout, currently used in X86 AVX backend.

  • enable_nchw44 – whether to use NCHW44 data layout, currently used in arm backend.

  • enable_nchw44_dot – whether to use NCHW44_dot data layout, currently used in armv8.2+dotprod backend.

  • enable_nchw4 – whether to use NCHW4 data layout, currently used in nvidia backend (based on cudnn).

  • enable_nchw32 – whether to use NCHW32 data layout, currently used in nvidia backend with tensorcore (based on cudnn).

  • enable_chwn4 – whether to use CHWN4 data layout, currently used in nvidia backend with tensorcore.

  • enable_nchw64 – whether to use NCHW64 data layout, used for fast int4 support on Nvidia GPU.

  • enable_fuse_conv_bias_nonlinearity – whether to fuse conv+bias+nonlinearity into one opr.

  • enable_fuse_conv_bias_with_z – whether to fuse conv_bias with z input for inference on nvidia backend (this optimization pass will result in a mismatch between the output precision of training and inference).
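Continuing the same sketch, these inference-optimization options are passed as keyword arguments to dump; which layouts or fusions actually help depends on the target backend, so the flags chosen here are only an example:

    # dump with extra inference optimizations enabled
    infer_func.dump(
        "model_opt.mge",
        arg_names=["data"],
        optimize_for_inference=True,
        enable_io16xc32=True,                     # float16 I/O, float32 internal computation
        enable_fuse_conv_bias_nonlinearity=True,  # fuse conv + bias + activation into one opr
    )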