megengine.functional.distributed.all_reduce_max
- all_reduce_max(inp, group=WORLD, device=None)[source]
Reduce tensors with max operation on each value across the specified group.
Note
The inp tensor must have identical shape in all processes across the group.
- Parameters
inp (Tensor) – tensor to be reduced.
- Keyword Arguments
group (Group or sequence of ints) – the process group to work on. Default: WORLD. The WORLD group selects all available processes; passing a list of process ranks creates a new group to work on.
device (Tensor.device) – the specific device to execute this operator. Default: None. None selects the device of inp for execution. For a GPU device, a different stream can be assigned by appending a colon and a stream number to the device name (e.g. "gpu0:1"); :0 denotes the default stream of the GPU. If no stream is given, the default stream is used.
- Return type
Tensor
- Returns
A tensor with the max operation applied on each value across the group. The shape of the output tensor is the same as inp, and the output tensor is bitwise identical in all processes across the group.
Examples
>>> # We execute all_reduce_max on rank 0 and rank 1
>>> input = F.arange(2) + 1 + 2 * rank
>>> input
Tensor([1. 2.], device=xpux:0) # Rank 0
Tensor([3. 4.], device=xpux:0) # Rank 1
>>> F.distributed.all_reduce_max(input, group=[0, 1])
Tensor([3. 4.], device=xpux:0) # Rank 0
Tensor([3. 4.], device=xpux:0) # Rank 1
>>> # We execute all_reduce_max on gpu0 with CUDA stream 1
>>> megengine.set_default_device("gpu0")
>>> input = F.arange(2) + 1 + 2 * rank
>>> input
Tensor([1. 2.], device=gpu0:0) # Rank 0
Tensor([3. 4.], device=gpu0:0) # Rank 1
>>> F.distributed.all_reduce_max(input, device="gpu0:1")
Tensor([3. 4.], device=gpu0:0) # Rank 0
Tensor([3. 4.], device=gpu0:0) # Rank 1
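The doctests above assume each rank is already running inside an initialized process group, so rank and the WORLD group come from the distributed runtime. A minimal sketch of driving the collective end to end, assuming two devices are available and using megengine.distributed.launcher to spawn one worker process per device (the launcher-based setup is not part of this function itself):

    import megengine.distributed as dist
    import megengine.functional as F

    @dist.launcher(n_gpus=2)  # assumption: two devices are available
    def worker():
        rank = dist.get_rank()
        # rank 0 holds [1, 2], rank 1 holds [3, 4]
        inp = F.arange(2) + 1 + 2 * rank
        # elementwise max across the WORLD group (the default)
        out = F.distributed.all_reduce_max(inp)
        print(f"rank {rank}: {out.numpy()}")  # both ranks print [3. 4.]

    worker()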