launcher¶
- class launcher(*args, **kwargs)¶
Decorator for launching multiple processes in single-machine or multi-machine multi-GPU training; see the usage sketch after the parameter list.
- Parameters
  - func (Callable) – the function you want to launch in distributed mode.
  - n_gpus (Number) – how many devices each node has. If n_gpus is None, n_gpus will be the device count of the current node. Default: None
  - world_size (Number) – how many devices in total. If world_size is None, world_size will be n_gpus. Default: None
  - rank_start (Number) – the start rank number on the current node. For single-machine multi-GPU training, rank_start should be 0. For multi-machine training, rank_start of machine i should be i * n_gpus. Default: 0
  - master_ip (str) – IP address of the master node (where rank 0 is placed). Default: "localhost"
  - port (Number) – server port for the distributed server. Default: 0
  - backend (str) – default collective communication backend; should be "nccl" or "rccl". Default: "nccl"
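Below is a minimal usage sketch. It assumes the decorator is exposed via megengine.distributed (as in MegEngine) and that dist.get_rank / dist.get_world_size are available to query the current process; the train function and the n_gpus value are illustrative, not part of this page.

    import megengine.distributed as dist

    # Single-machine, multi-GPU: spawn one process per GPU on this node;
    # train() then runs in each process with a distinct rank.
    @dist.launcher(n_gpus=4)  # illustrative device count
    def train():
        rank = dist.get_rank()              # rank of the current process
        world_size = dist.get_world_size()  # total number of processes
        print(f"worker {rank}/{world_size} started")

    train()

For multi-machine training, machine i would additionally pass world_size, rank_start=i * n_gpus, and the master_ip/port of the node hosting rank 0.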
See also
Examples of distributed training using the launcher decorator can be found in _distributed-guide.