megengine.functional.distributed.all_reduce_min
- all_reduce_min(inp, group=WORLD, device=None)
Reduce tensors across the specified group by applying the min operation on each value.
Note
The inp tensor must have identical shape in all processes across the group.
- Parameters
inp (Tensor) – tensor to be reduced.
- Keyword Arguments
group (Group or sequence of ints) – the process group to work on. Default: WORLD. The WORLD group selects all available processes; passing a list of process ranks creates a new group to work on.
device (Tensor.device) – the specific device to execute this operator on. Default: None. None selects the device of inp. On GPU devices, a different stream can be chosen by appending a colon and a stream number to the device name (":0" denotes the default stream); if no stream is given, the default stream is used.
- Return type
Tensor
- Returns
A tensor with the min operation applied on each value across the group.
The shape of the output tensor is the same as that of inp, and the output tensor is bitwise identical in all processes across the group.
Examples
>>> # We execute all_reduce_min on rank 0 and rank 1
>>> input = F.arange(2) + 1 + 2 * rank
>>> input
Tensor([1. 2.], device=xpux:0) # Rank 0
Tensor([3. 4.], device=xpux:0) # Rank 1
>>> F.distributed.all_reduce_min(input, group=[0, 1])
Tensor([1. 2.], device=xpux:0) # Rank 0
Tensor([1. 2.], device=xpux:0) # Rank 1
>>> # We execute all_reduce_min on gpu0 with cuda stream 1
>>> megengine.set_default_device("gpu0")
>>> input = F.arange(2) + 1 + 2 * rank
>>> input
Tensor([1. 2.], device=gpu0:0) # Rank 0
Tensor([3. 4.], device=gpu0:0) # Rank 1
>>> F.distributed.all_reduce_min(input, device="gpu0:1")
Tensor([1. 2.], device=gpu0:0) # Rank 0
Tensor([1. 2.], device=gpu0:0) # Rank 1
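The doctests above assume a rank variable and an already-initialized process group. As a minimal end-to-end sketch (assuming a single machine with two GPUs; the worker function name is illustrative), the first example can be reproduced with megengine.distributed.launcher:

import megengine.distributed as dist
import megengine.functional as F

@dist.launcher(n_gpus=2)  # spawns one process per GPU and sets up the WORLD group
def worker():
    rank = dist.get_rank()
    inp = F.arange(2) + 1 + 2 * rank           # Rank 0: [1. 2.], Rank 1: [3. 4.]
    out = F.distributed.all_reduce_min(inp)    # element-wise min across both ranks
    print(f"Rank {rank}: {out.numpy()}")       # both ranks print [1. 2.]

worker()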