launcher

class launcher(*args, **kwargs)

Decorator for launching multiple processes for single-machine or multi-machine multi-GPU training.

Parameters
  • func (Callable) – the function you want to launch in distributed mode.

  • n_gpus (Number) – number of devices on each node. If n_gpus is None, it is set to the device count of the current node. Default: None.

  • world_size (Number) – total number of devices across all nodes. If world_size is None, it is set to n_gpus. Default: None.

  • rank_start (Number) – the starting rank on the current node. For single-machine multi-GPU training, rank_start should be 0. For multi-machine training, rank_start of machine i should be i * n_gpus (see the multi-machine sketch below). Default: 0.

  • master_ip (str) – IP address of the master node (where rank 0 is placed). Default: “localhost”.

  • port (Number) – port of the distributed server. Default: 0.

  • backend (str) – the default collective communication backend; should be “nccl” or “rccl”. Default: “nccl”.
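
A minimal single-machine sketch of how the decorator is typically applied (the 4-GPU count and the body of the decorated function are illustrative assumptions, not taken from this reference):

  import megengine.distributed as dist

  @dist.launcher(n_gpus=4)        # assumes 4 GPUs are available on this node
  def main():
      # each spawned process runs this body with its own global rank
      rank = dist.get_rank()
      world_size = dist.get_world_size()
      print(f"process {rank} of {world_size} started")

  main()                          # spawns 4 processes and waits for them to finish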

See also

  • Examples of distributed training using the launcher decorator can be found in the distributed training guide.
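
A hedged multi-machine sketch follows; the machine count, GPU count, IP address, and port are placeholder assumptions. The same script runs on every machine, with only the machine index (and the rank_start derived from it) differing:

  import megengine.distributed as dist

  MACHINE_INDEX = 1               # assumed: 0 on the master machine, 1 on the second machine
  N_GPUS_PER_NODE = 8             # assumed GPU count per machine

  @dist.launcher(
      n_gpus=N_GPUS_PER_NODE,
      world_size=2 * N_GPUS_PER_NODE,              # 2 machines x 8 GPUs
      rank_start=MACHINE_INDEX * N_GPUS_PER_NODE,  # 0 on machine 0, 8 on machine 1
      master_ip="192.168.1.10",                    # placeholder IP of the rank-0 machine
      port=23456,                                  # placeholder port of the distributed server
      backend="nccl",
  )
  def main():
      print("global rank:", dist.get_rank())

  main()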