megengine.functional.distributed.all_reduce_sum

all_reduce_sum(inp, group=WORLD, device=None)

Reduce tensors across the specified group by sum.

Parameters

inp (Tensor) – Input tensor.
group (Optional[Group]) – The process group to work on. The default group is WORLD, which includes all available processes. You can also pass a list of process ranks, e.g. [1, 3, 5], to create a new group to work on.
device (Optional[str]) – The device on which to execute this operator. The default, None, means the device of inp is used. Specify "gpu0:1" to execute this operator on a different CUDA stream: 1 is the stream ID, and the default stream ID is 0.

Return type

Tensor

Returns

Result tensor.

Examples

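import megengine.distributed as dist
from megengine import Tensor
from megengine.functional.distributed import all_reduce_sum

# Run inside a distributed launch with two processes (ranks 0 and 1).
rank = dist.get_rank()
inp = Tensor(rank)
# Rank 0 # inp: Tensor(0)
# Rank 1 # inp: Tensor(1)
output = all_reduce_sum(inp)
# Rank 0 # output: Tensor(1)
# Rank 1 # output: Tensor(1)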
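
The semantics shown above can also be illustrated without any distributed setup. The following is a minimal single-process sketch in plain Python, not MegEngine code: simulate_all_reduce_sum is a hypothetical helper introduced only for illustration, and representing each rank's tensor as a list in a dict is an assumption of this sketch. It shows that every participating rank ends up with the same element-wise sum, and that passing a list of ranks restricts the reduction to those ranks.

```python
from typing import Dict, List, Optional, Sequence


def simulate_all_reduce_sum(
    inputs: Dict[int, List[int]],
    group: Optional[Sequence[int]] = None,
) -> Dict[int, List[int]]:
    """Hypothetical helper (not part of MegEngine).

    `inputs` maps each rank to its local vector. `group` lists the
    participating ranks; None stands in for WORLD (all ranks).
    Returns the vector each participating rank would hold afterwards.
    """
    ranks = sorted(inputs) if group is None else list(group)
    length = len(inputs[ranks[0]])
    if any(len(inputs[r]) != length for r in ranks):
        raise ValueError("all participating ranks must hold same-sized inputs")

    # Element-wise sum over the participating ranks.
    total = [sum(inputs[r][i] for r in ranks) for i in range(length)]

    # Every participating rank receives its own copy of the result.
    return {r: list(total) for r in ranks}


# Two ranks holding 0 and 1, mirroring the example above.
print(simulate_all_reduce_sum({0: [0], 1: [1]}))
# {0: [1], 1: [1]}

# Four ranks, but only ranks 1 and 3 take part in the reduction.
print(simulate_all_reduce_sum({0: [10], 1: [1], 2: [20], 3: [3]}, group=[1, 3]))
# {1: [4], 3: [4]}
```

In a real run the reduction happens through inter-process communication and each process sees only its own result; the dict returned here simply gathers those per-rank results in one place for inspection.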
