megengine.distributed
>>> import megengine.distributed as dist
backend | Get or set the backend of collective communication.
Group
Server | Distributed Server for distributed training.
Group | Include ranked nodes running collective communication (See init_process_group).
init_process_group | Initialize the distributed process group and specify the device used in the current process.
new_group | Build a subgroup containing certain ranks.
group_barrier | Block until all ranks in the group reach this barrier.
override_backend | Override the distributed backend.
is_distributed | Return True if the distributed process group has been initialized.
get_backend | Get the backend string.
get_client | Get the client of the Python XML RPC server.
get_mm_server_addr | Get the master_ip and port of the C++ mm_server.
get_py_server_addr | Get the master_ip and port of the Python XML RPC server.
get_rank | Get the rank of the current process.
get_world_size | Get the total number of processes participating in the job.
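The functions above cover the manual setup path: one process starts a Server, every worker then joins the group through init_process_group and can query its rank, the world size and the backend. Below is a minimal sketch of that flow, assuming the keyword names (master_ip, port, world_size, rank, device) match the signatures summarized above; the port number and device index are placeholders, and join_group is a hypothetical helper used only for illustration.

import megengine.distributed as dist

# Started once, on the machine whose address the workers use as master_ip.
server = dist.Server(port=23456)  # placeholder port

def join_group(rank: int, world_size: int):
    # Each worker process joins the same group and binds one device.
    dist.init_process_group(
        master_ip="localhost",   # address of the machine running the Server
        port=23456,
        world_size=world_size,
        rank=rank,
        device=rank,             # device index used by this process
    )
    assert dist.is_distributed()
    print(dist.get_rank(), "/", dist.get_world_size(), dist.get_backend())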
Launcher
launcher | Decorator for launching multiple processes in single-machine/multi-machine multi-gpu training.
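In practice the manual setup sketched above is usually replaced by this decorator: it spawns one process per device and initializes the process group for each of them before calling the decorated function. A hedged example follows, assuming launcher can be applied as a bare decorator; the all_reduce_sum call from megengine.functional.distributed is only there to show collective communication running inside the workers.

import megengine as mge
import megengine.distributed as dist
import megengine.functional.distributed as fdist

@dist.launcher
def worker():
    # Each spawned process sees its own rank; the world size is shared.
    rank = dist.get_rank()
    x = mge.tensor([float(rank)])
    total = fdist.all_reduce_sum(x)  # sum of all ranks, identical on every process
    print(f"rank {rank}: sum of ranks = {total.numpy()}")

if __name__ == "__main__":
    worker()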
Helper
bcast_list_ | Broadcast tensors within the given group.
synchronized | Decorator.
make_allreduce_cb | Allreduce callback with tensor fusion optimization.
helper.param_pack_split | Return a tensor split into a list of tensors according to the given offsets and shapes; only used for parampack.
helper.param_pack_concat | Return a concatenated tensor; only used for parampack.
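These helpers appear mostly in data-parallel training loops: bcast_list_ makes every process start from identical parameter values, and the allreduce callback is attached to a GradManager so gradients are reduced across the group during the backward pass. A rough sketch under those assumptions; the model is a placeholder and the "mean" reduce method passed to make_allreduce_cb is an assumption that may differ between versions.

import megengine.autodiff as ad
import megengine.distributed as dist
import megengine.module as M

@dist.launcher
def train():
    model = M.Linear(8, 2)  # placeholder model

    # Start every process from the same parameter values.
    dist.bcast_list_(list(model.parameters()))

    # Reduce gradients across the group after backward, with tensor fusion.
    gm = ad.GradManager().attach(
        model.parameters(),
        callbacks=[dist.make_allreduce_cb("mean")],  # assumed reduce method
    )
    # ... forward pass, gm.backward(loss), optimizer step, etc. ...

if __name__ == "__main__":
    train()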