megengine.jit package


class megengine.jit.sublinear_memory_config.SublinearMemoryConfig(thresh_nr_try=10, genetic_nr_iter=0, genetic_pool_size=20, lb_memory=0, num_worker=2)[source]

Bases: object

Configuration for sublinear memory optimization.

  • thresh_nr_try (int) – number of samples both for searching in linear space and around the current thresh in sublinear memory optimization. Default: 10. It can also be set through the environment variable ‘MGB_SUBLINEAR_MEMORY_THRESH_NR_TRY’.

  • genetic_nr_iter (int) – number of iterations to find the best checkpoints in the genetic algorithm. Default: 0. It can also be set through the environment variable ‘MGB_SUBLINEAR_MEMORY_GENETIC_NR_ITER’.

  • genetic_pool_size (int) – number of samples for the crossover random selection during genetic optimization. Default: 20. It can also be set through the environment variable ‘MGB_SUBLINEAR_MEMORY_GENETIC_POOL_SIZE’.

  • lb_memory (int) – memory lower bound of the bottleneck size in MB for sublinear memory optimization. It can be used to perform a manual tradeoff between memory and speed. Default: 0. It can also be set through the environment variable ‘MGB_SUBLINEAR_MEMORY_LOWER_BOUND_MB’.

  • num_worker (int) – number of worker threads used to search for the optimal checkpoints in sublinear memory optimization. Default: half of the CPU count of the system. Note: the value must be greater than or equal to one. It can also be set through the environment variable ‘MGB_SUBLINEAR_MEMORY_WORKERS’.

Note that the environment variable MGB_COMP_GRAPH_OPT must be set to ‘enable_sublinear_memory_opt=1’ in order for the above environment variables to take effect.


class megengine.jit.tracing.CompiledTensorProxy(handle)[source]

Bases: object

Duck-typed RawTensor

property device
property dtype
property shape
class megengine.jit.tracing.TensorInfo[source]

Bases: object

__slots__ = ('external', 'data_read', 'shape_read', 'value_read', 'exported', 'device', 'dtype', 'shape', 'is_const', 'bound_data', 'varnode', 'data_setter', 'shape_reader', 'value_reader', 'data_reader')
exception megengine.jit.tracing.TraceMismatchError[source]

Bases: RuntimeError

megengine.jit.tracing.apply_compiled_mode(op, *args)[source]
megengine.jit.tracing.apply_const_compiled_mode(value, dtype, device, is_const, no_cache)[source]
megengine.jit.tracing.apply_const_symbolic_mode(value, dtype, device)[source]
megengine.jit.tracing.apply_const_with_tracing(value, dtype, device, is_const, no_cache)[source]
megengine.jit.tracing.apply_symbolic_mode(op, *args)[source]
megengine.jit.tracing.apply_with_tracing(op, *args)[source]
megengine.jit.tracing.assign_raw_tensor(lhs, rhs)[source]
class megengine.jit.tracing.trace(function, symbolic=False, capture_as_const=False, sublinear_memory_config=None, profiling=False, opt_level=None, symbolic_shape=True)[source]

Bases: object

Wraps a callable and provides:

  • tracing via trace() and dump()

  • accelerated evaluation via __call__()

  • function – the function to be traced.

  • symbolic – whether to apply symbolic execution for tracing. Default: False

  • capture_as_const – capture global vars or closures as const value. Default: False

  • sublinear_memory_config (Optional[SublinearMemoryConfig]) – configuration for sublinear memory optimization. If not None, it enables sublinear memory optimization with given setting.

  • profiling (bool) – whether to profile compiled trace. Default: False

  • opt_level (Optional[int]) – optimization level for compiling trace.

  • symbolic_shape (bool) – whether to use symbolic shape for tracing. Default: True

dump(file, *, arg_names=None, output_names=None, append=False, optimize_for_inference=True, **kwargs)[source]

Serializes trace to file system.

  • file – output file, could be file object or filename.

  • arg_names – names of the input tensors in the traced function.

  • output_names – names of the output tensors in the traced function, use the default name if not specified.

  • append – whether output is appended to file. Only works when file is str.

  • optimize_for_inference – enable optimizations; all optimize options will be skipped if this is False. Default: True

Keyword Arguments
  • enable_io16xc32 –

    whether to use float16 for I/O between oprs and use float32 as internal computation precision. Note the output var would be changed to float16.

  • enable_ioc16 –

    whether to use float16 for both I/O and computation precision.

  • enable_hwcd4 –

    whether to use NHWCD4 data layout. This is faster on some OpenCL backends.

  • enable_nchw88 –

    whether to use NCHW88 data layout, currently used in X86 AVX backend.

  • enable_nchw44 –

    whether to use NCHW44 data layout, currently used in arm backend.

  • enable_nchw44_dot –

    whether to use NCHW44_dot data layout, currently used in armv8.2+dotprod backend.

  • enable_nchw4 –

    whether to use NCHW4 data layout, currently used in nvidia backend (based on cudnn).

  • enable_nchw32 –

    whether to use NCHW32 data layout, currently used in nvidia backend with tensorcore (based on cudnn).

  • enable_chwn4 –

    whether to use CHWN4 data layout, currently used in nvidia backend with tensorcore.

  • enable_fuse_conv_bias_nonlinearity –

    whether to fuse conv+bias+nonlinearity into one opr.

  • enable_fuse_conv_bias_with_z –

    whether to fuse conv_bias with z input for inference on nvidia backend (this optimization pass will result in a mismatch between the output precision of training and inference).


get_profile()[source]

Get profiling result for compiled trace.

Returns
  a json compatible object.

trace(*args, **kwargs)[source]