megengine.functional.distributed.all_reduce_min
all_reduce_min(inp, group=WORLD, device=None)
Reduce tensors with min operation on each value across the specified group.
Note
inp tensor must have identical shape in all processes across the group.
- Parameters
inp (Tensor) – tensor to be reduced.
- Keyword Arguments
group (Group or sequence of ints) – the process group to work on. Default: WORLD. The WORLD group selects all available processes; passing a list of process ranks creates a new group to work on.
device (Tensor.device) – the specific device to execute this operator. Default: None. None selects the device of inp. For a GPU device, a different CUDA stream can be assigned by appending a colon and a stream number to the device name (e.g. "gpu0:1"); ":0" denotes the default stream of the GPU, and omitting the suffix uses the default stream (see the short sketch below).
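A minimal sketch of how these two keyword arguments combine, assuming a tensor inp already lives on each process of the group (the call itself is hypothetical and mirrors the examples further down):
>>> # Reduce only among ranks 0 and 1, executing on gpu0 with CUDA stream 1
>>> out = F.distributed.all_reduce_min(inp, group=[0, 1], device="gpu0:1")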
- Return type
Tensor
- Returns
A tensor with min operation on each value across the group.
The shape of the output tensor is the same as inp, and the output tensor is bitwise identical in all processes across the group.
Examples
>>> # We execute all_reduce_min on rank 0 and rank 1
>>> input = F.arange(2) + 1 + 2 * rank
>>> input
Tensor([1. 2.], device=xpux:0) # Rank 0
Tensor([3. 4.], device=xpux:0) # Rank 1
>>> F.distributed.all_reduce_min(input, group=[0, 1])
Tensor([1. 2.], device=xpux:0) # Rank 0
Tensor([1. 2.], device=xpux:0) # Rank 1
>>> # We execute all_reduce_min on gpu0 with cuda stream 1
>>> megengine.set_default_device("gpu0")
>>> input = F.arange(2) + 1 + 2 * rank
>>> input
Tensor([1. 2.], device=gpu0:0) # Rank 0
Tensor([3. 4.], device=gpu0:0) # Rank 1
>>> F.distributed.all_reduce_min(input, device="gpu0:1")
Tensor([1. 2.], device=gpu0:0) # Rank 0
Tensor([1. 2.], device=gpu0:0) # Rank 1
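The doctest snippets above assume an already-running distributed context in which F is megengine.functional and rank is the current process rank. A minimal self-contained sketch of that setup, assuming two available devices and using megengine.distributed.launcher to spawn the worker processes:

import megengine.distributed as dist
import megengine.functional as F

@dist.launcher(n_gpus=2)  # spawn one worker process per device (2 devices assumed)
def worker():
    rank = dist.get_rank()
    # Rank 0 holds [1. 2.], rank 1 holds [3. 4.], matching the example above.
    inp = F.arange(2) + 1 + 2 * rank
    # Element-wise minimum across the default WORLD group.
    out = F.distributed.all_reduce_min(inp)
    print(f"rank {rank}: {out.numpy()}")  # every rank prints [1. 2.]

worker()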