Configuration for sublinear memory optimization.
thresh_nr_try (int) – number of samples, both for searching in linear space
and around the current threshold, in sublinear memory optimization. Default: 10.
It can also be set through the environment variable ‘MGB_SUBLINEAR_MEMORY_THRESH_NR_TRY’.
genetic_nr_iter (int) – number of iterations to find the best checkpoints in the genetic algorithm.
It can also be set through the environment variable ‘MGB_SUBLINEAR_MEMORY_GENETIC_NR_ITER’.
genetic_pool_size (int) – number of samples for the crossover random selection
during genetic optimization. Default: 20.
It can also be set through the environment variable ‘MGB_SUBLINEAR_MEMORY_GENETIC_POOL_SIZE’.
lb_memory (int) – memory lower bound of bottleneck size in MB for sublinear memory optimization.
It can be used to manually trade off memory against speed. Default: 0.
It can also be set through the environment variable ‘MGB_SUBLINEAR_MEMORY_LOWER_BOUND_MB’.
num_worker (int) – number of worker threads used to search for the optimal checkpoints
in sublinear memory optimization. Default: half the number of CPUs in the system.
Note: the value must be greater than or equal to one.
It can also be set through the environment variable ‘MGB_SUBLINEAR_MEMORY_WORKERS’.
Note that the environment variable MGB_COMP_GRAPH_OPT must be set to ‘enable_sublinear_memory_opt=1’
for the above environment variables to take effect.
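The environment variables listed above can be collected in one place before the framework is imported. The following is a minimal sketch under stated assumptions: the variable names come from the text above, while the helper `sublinear_env` and its signature are hypothetical and not part of the library. It also assumes that no other options already need to be kept in MGB_COMP_GRAPH_OPT.

```python
import os

# Variable names are taken from the documentation above.
_ENV_NAMES = {
    "thresh_nr_try": "MGB_SUBLINEAR_MEMORY_THRESH_NR_TRY",
    "genetic_nr_iter": "MGB_SUBLINEAR_MEMORY_GENETIC_NR_ITER",
    "genetic_pool_size": "MGB_SUBLINEAR_MEMORY_GENETIC_POOL_SIZE",
    "lb_memory": "MGB_SUBLINEAR_MEMORY_LOWER_BOUND_MB",
    "num_worker": "MGB_SUBLINEAR_MEMORY_WORKERS",
}


def sublinear_env(**settings):
    """Hypothetical helper: map settings to environment variables."""
    # Without this switch the other variables have no effect.
    # Assumption: MGB_COMP_GRAPH_OPT carries no other options.
    env = {"MGB_COMP_GRAPH_OPT": "enable_sublinear_memory_opt=1"}
    for name, value in settings.items():
        if name not in _ENV_NAMES:
            raise KeyError(f"unknown setting: {name}")
        if name == "num_worker" and int(value) < 1:
            raise ValueError("num_worker must be greater than or equal to one")
        env[_ENV_NAMES[name]] = str(int(value))
    return env


# Use the documented defaults for three of the settings.
env = sublinear_env(thresh_nr_try=10, genetic_pool_size=20, lb_memory=0)

# Apply the variables to the current process environment.
os.environ.update(env)
```

Settings that are left out are simply not exported, so the library's own defaults apply to them.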
Wraps a callable and provides:
tracing via trace() and dump()
accelerated evaluation via __call__()
function – the function to be traced.
symbolic – whether to apply symbolic execution for tracing. Default: False
capture_as_const – whether to capture global variables or closures as constant values. Default: False
sublinear_memory_config (Optional[SublinearMemoryConfig]) – configuration for sublinear memory optimization.
If not None, it enables sublinear memory optimization with the given settings.
profiling (bool) – whether to profile the compiled trace. Default: False
opt_level (Optional[int]) – optimization level for compiling trace.
symbolic_shape (bool) – whether to use symbolic shape for tracing. Default: True
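The general pattern of a class that wraps a callable, accepts keyword options, and forwards calls through `__call__()` can be sketched in plain Python. This is a toy stand-in and not the library's implementation: `ToyTrace` and its `calls` counter are hypothetical names, no tracing or acceleration is performed, and only the options whose defaults are stated above are modelled.

```python
import functools


class ToyTrace:
    """Toy wrapper illustrating the decorator pattern (hypothetical)."""

    def __new__(cls, function=None, **options):
        # Support both `@ToyTrace` and `@ToyTrace(symbolic=True)`.
        if function is None:
            return functools.partial(cls, **options)
        return super().__new__(cls)

    def __init__(self, function, symbolic=False, capture_as_const=False,
                 profiling=False, symbolic_shape=True):
        # Copy the name and docstring of the wrapped function.
        functools.update_wrapper(self, function)
        self._function = function
        self.symbolic = symbolic
        self.capture_as_const = capture_as_const
        self.profiling = profiling
        self.symbolic_shape = symbolic_shape
        self.calls = 0

    def __call__(self, *args, **kwargs):
        # A real tracer would record or replay operations here;
        # this sketch only counts calls and forwards them.
        self.calls += 1
        return self._function(*args, **kwargs)


@ToyTrace
def add(a, b):
    return a + b


@ToyTrace(symbolic=True)
def mul(a, b):
    return a * b


total = add(1, 2)
product = mul(2, 3)
```

Returning a `functools.partial` from `__new__` when no function is given is one way to let the same class be used with or without arguments in the decorator position.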
Serializes the trace to the file system.
file – output file; either a file object or a filename.
arg_names – names of the input tensors in the traced function.
output_names – names of the output tensors in the traced function;
the default names are used if not specified.
append – whether the output is appended to file.
Only takes effect when file is a str.
optimize_for_inference – whether to enable optimizations;
all optimization options are skipped if this is False. Default: True
whether to use float16 for I/O between oprs and
float32 as the internal computation precision. Note that the output var will be
changed to float16.
whether to use float16 for both I/O and computation
whether to use NHWCD4 data layout. This is faster on some
whether to use NCHW88 data layout, currently
used in the x86 AVX backend.
whether to use NCHW44 data layout, currently
used in the ARM backend.
whether to use NCHW44_dot data layout, currently
used in the ARMv8.2+dotprod backend.
whether to use NCHW4 data layout, currently
used in the NVIDIA backend (based on cuDNN).
whether to use NCHW32 data layout, currently
used in the NVIDIA backend with Tensor Cores (based on cuDNN).
whether to use CHWN4 data layout, currently
used in the NVIDIA backend with Tensor Cores.
into one opr.
input for inference on the NVIDIA backend (this optimization pass will
result in mismatch of the precision of output of training and
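
The documented behaviour of the `file` and `append` parameters of the serialization method (a filename or a file object is accepted, and `append` only matters for a filename) can be illustrated with a small standalone sketch. The function `write_serialized` is a hypothetical helper written for this illustration, not the library's code, and the byte strings are placeholders for serialized data.

```python
import io
import os
import tempfile


def write_serialized(file, data, append=False):
    """Write bytes to a filename or to an open binary file object."""
    if isinstance(file, str):
        # Only a filename lets us choose between appending and truncating.
        mode = "ab" if append else "wb"
        with open(file, mode) as f:
            f.write(data)
    else:
        # The caller already opened the object, so `append` has no effect.
        file.write(data)


# Filename: the second write is appended to the first.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.bin")
    write_serialized(path, b"first")
    write_serialized(path, b"second", append=True)
    with open(path, "rb") as f:
        on_disk = f.read()

# File object: `append` is ignored and the data is simply written.
buffer = io.BytesIO()
write_serialized(buffer, b"first", append=True)
in_memory = buffer.getvalue()
```

With a file object, the position and mode chosen by the caller decide where the data lands, which is why an `append` flag can only be honoured for a filename.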
Gets the profiling result for the compiled trace.
a JSON-compatible object.