Defined in File cg.h
This struct is a nested type of Class ComputingGraph.
get attribute for an operator
graph optimization level:
- 0: disable
- 1: level-1: inplace arithmetic transformations during graph construction
- 2: level-2: level-1, plus global optimization before graph compiling
- 3: also enable JIT
- <0: corresponding level, with result check for debug
max size of allreduce packs, in MB; set this option to zero to disable PackAllReducePass
do not pack the first n allreduces; PackAllReducePass is disabled if allreduce_pack_max_size is zero
set logging level; a larger number means more verbose output:
- 0: no log info
- 1: static memory allocation status, WorkspaceLimitGetter summary, optimizer summary
- 2: optimizer var replace details during graph compiling, duplicated operators
async exec: dispatch on separate threads for different comp_node
- 0: do not perform async dispatch
- 1: dispatch async if there is more than one comp node with limited queue
- mask 0b10: async if there are multiple comp nodes
- mask 0b100: always async
force dynamic memory alloc for all vars
whether to perform var sanity check on first run
whether to allocate static memory just after compiling graph
whether only to perform non-computing tasks (like memory allocation and queue initialization) for next exec. This would be reset to false when the graph is executed.
whether to enable sublinear memory optimization
do not re-profile to select best impl algo when input shape changes (use previous algo)
whether to perform defragmenting when memory allocation for a dynamic var fails
whether to reshape a grad var whose wrt var's shape is statically inferable but whose own shape is dynamic
whether to enable swap memory; since swap's performance is much worse than sublinear's, it is recommended to try sublinear first
whether to use CompNodeSeqRecorder to record the execution sequence and directly replay it for later executions.
Level 1 is mainly used to speed up execution (especially for opencl); level 2 is used for reducing memory usage.
Level 1 constraints:
All vars must be statically allocated
Host input/output buffer pointers can not be changed if shape is not changed (this is not checked in execution for efficiency considerations; this is potentially dangerous)
Synchronization can only occur at the end of execution
Not all comp node implementations support recording computing sequence
Only one comp node can be used in the graph
Level 2: besides recording the computing sequence, the dependencies are also moved into the compiled func (see GraphExecutable::ExecDependency). Additional constraints:
Shapes can not change
Both fake_next_exec and var_sanity_check_first_run must be disabled
Var shapes must be correctly setup before calling compile()
whether to evaluate var node values as they are inserted
Request that operators should not force update their inputs.
THIS FLAG IS RESERVED FOR INTERNAL USE
When this flag is set, operators like AddUpdate and BatchNorm will still attempt to inplace update their inputs, but failing to do so will not be considered as an error.
add extra dependencies to the computing sequence if a specific var is depended on
contains any user data associated with this graph
graph optimization options
whether to enable JIT; JIT is also enabled at optimization level 3. This value indicates the JIT level: 1 for basic elemwise oprs; 2 to also include reduce oprs
whether to enable fine-grained TensorRT opr replace
attribute for a specific operator
sequence compile optimization options
whether to enable memory forwarding to optimize mem plans
whether to enable static memory reuse (i.e. using optimized static memory allocation algorithm)
whether to enable comp node optimization (e.g. using copy stream for I/O operators)
Control parameter for sublinear memory optimization.