megengine.functional.distributed.all_reduce_sum

all_reduce_sum(inp, group=WORLD, device=None)[source]

Create an all_reduce_sum operator for collective communication.

This operator sums the tensor data element-wise across the specified group and returns a tensor with the same shape as the input tensor.
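
For intuition, here is a minimal NumPy sketch of the reduction semantics. It simulates the collective locally; the real operator runs one process per device, and every participating process receives the same summed result:

import numpy as np

# Two simulated ranks, each holding a tensor of the same shape.
rank_data = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]

# After all_reduce_sum, every rank holds the element-wise sum.
reduced = np.sum(rank_data, axis=0)              # array([4., 6.])
per_rank_result = [reduced.copy() for _ in rank_data]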

Parameters
  • inp (Tensor) – The tensor data to apply this operator on.

  • group (Optional[Group]) – The communication group, an instance of Group, across which to apply this operator. The default group is WORLD, which means all available processes. Specify a list of process ranks to apply this operator on specific processes, e.g. [1, 3, 5].

  • device (Optional[str]) – The device on which to execute this operator. The default device is None, which means the device of inp will be used. Specify "cpu" or "gpu" to execute this operator on specific devices (see the sketch after this list).
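
As a sketch of the keyword arguments, the call below spells out the defaults explicitly. dist.WORLD is the global group referenced in the signature above, and device=None keeps the result on the device of inp. This must run inside a process started by dist.launcher, as in the full example below:

import megengine as mge
import megengine.distributed as dist
import megengine.functional.distributed as F_dist

inp = mge.tensor([1.0, 2.0])
# Equivalent to F_dist.all_reduce_sum(inp): reduce across all
# processes, keeping the result on the device of inp.
out = F_dist.all_reduce_sum(inp, group=dist.WORLD, device=None)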

Returns

The sum-reduced tensor of the input data across the specified group.

Return type

Tensor

Examples:

import megengine as mge
import megengine.distributed as dist
import numpy as np
from warnings import warn


def func(sum_value):
    # get the rank of this process; the ranks should be 0, 1, 2, 3 for a 4-GPU task
    rank = dist.get_rank()
    data = mge.tensor(rank)
    # the result should be 0 + 1 + ... + (n - 1) = n * (n - 1) / 2 for all processes
    result = mge.functional.distributed.all_reduce_sum(data).item()
    assert result == sum_value


def main():
    p_num = mge.device.get_device_count("gpu")
    if p_num < 2:
        warn('This operator only works on a group with more than one GPU')
        return
    method = dist.launcher(func)
    method(p_num * (p_num - 1) // 2)


if __name__ == '__main__':
    main()
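
Run this script on a machine with at least two GPUs: dist.launcher spawns one process per GPU, and in each process the assertion checks that the reduced value equals 0 + 1 + ... + (n - 1) = n * (n - 1) / 2.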