megengine.amp.GradScaler

class GradScaler(init_scale=2.0 ** 4, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000)

A helper class that performs grad scaling to prevent data overflow in autocast mode.
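The motivation is the narrow dynamic range of float16: values produced under autocast can silently overflow to inf or underflow to zero. A quick illustration (NumPy is used here only for demonstration and is not required by GradScaler):

import numpy as np

# float16 can represent magnitudes only up to ~65504; larger values overflow.
print(np.float16(70000.0))  # inf
# Values below the smallest float16 subnormal (~6e-8) underflow to zero.
print(np.float16(1e-8))     # 0.0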

Parameters
  • init_scale (float) – Initial scale factor.

  • growth_factor (float) – Factor that the scale is multiplied by at each scale update stage. If growth_factor is 0, scale_factor will not be updated.

  • backoff_factor (float) – Factor that the scale is multiplied by when an overflowed grad is encountered.

  • growth_interval (int) – The number of steps between two scale update stages (see the sketch after this list).
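Together these parameters describe a standard dynamic loss-scaling policy: the scale shrinks by backoff_factor whenever an overflow is seen, and grows by growth_factor after growth_interval consecutive overflow-free updates. Below is a minimal, framework-free sketch of that policy; the DynamicScale class and its attribute names are illustrative, not MegEngine internals.

# Illustrative sketch of dynamic loss scaling; not MegEngine's implementation.
class DynamicScale:
    def __init__(self, init_scale=2.0 ** 4, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale_factor = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._steps_since_growth = 0

    def update(self, found_overflow):
        if found_overflow:
            # Overflow: shrink the scale and restart the growth counter.
            self.scale_factor *= self.backoff_factor
            self._steps_since_growth = 0
        else:
            self._steps_since_growth += 1
            if (self.growth_factor != 0
                    and self._steps_since_growth >= self.growth_interval):
                # growth_interval overflow-free steps: grow the scale.
                self.scale_factor *= self.growth_factor
                self._steps_since_growth = 0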

Examples

import megengine
import megengine.functional as F
from megengine.amp import GradScaler, autocast
from megengine.autodiff import GradManager

gm = GradManager()
opt = ...
scaler = GradScaler()

gm.attach(model.parameters())

@autocast()
def train_step(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        # Scales the loss, backpropagates, then unscales the grads
        # and updates the scale factor in one call.
        scaler.backward(gm, loss)
    opt.step().clear_grad()
    return loss

If you need more flexible usage, you can split scaler.backward into three lines:

@autocast()
def train_step(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        # Backpropagate a grad scaled by the current scale factor.
        gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
    # Unscale the grads of all attached tensors in place.
    scaler.unscale(gm.attached_tensors())
    # Update the scale factor according to whether overflow occurred.
    scaler.update()
    opt.step().clear_grad()
    return loss

This is useful when grads need to be accumulated over multiple batches, as in the sketch below.
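As a sketch of that pattern (assuming batches is an iterable of (image, label) pairs; the name and function signature are illustrative), each iteration backpropagates a grad scaled by the same scale factor, and unscale/update run only once after all batches have been accumulated:

@autocast()
def train_accum_step(batches):
    for image, label in batches:
        with gm:
            logits = model(image)
            loss = F.nn.cross_entropy(logits, label)
            # Scaled grads accumulate on the attached tensors across
            # iterations, since they are only reset by clear_grad.
            gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
    # Unscale the accumulated grads once, then update the scale factor.
    scaler.unscale(gm.attached_tensors())
    scaler.update()
    opt.step().clear_grad()
    return loss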

Methods

backward(gm[, y, dy, unscale_grad, update_scale])

A wrapper of GradManager's backward that scales y's grad before backpropagation and unscales the parameters' grads afterwards.

load_state_dict(state)

Load the scaler state from state (see the checkpointing sketch below).

state_dict()

Return the scaler state as a dict.

unscale(grad_tensors)

Unscale the grads of all tensors in grad_tensors.

update([new_scale])

Update the scale factor according to whether an overflowed grad was encountered.
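state_dict and load_state_dict allow the scaler to be checkpointed alongside the model and optimizer, so dynamic scaling resumes from the same scale factor after a restart. A minimal sketch, assuming the model, opt and scaler objects from the examples above (the checkpoint.pkl filename is illustrative):

# Persist model, optimizer and scaler state together.
megengine.save(
    {
        "model": model.state_dict(),
        "opt": opt.state_dict(),
        "scaler": scaler.state_dict(),
    },
    "checkpoint.pkl",
)

# ... later, after rebuilding model, opt and scaler ...
checkpoint = megengine.load("checkpoint.pkl")
model.load_state_dict(checkpoint["model"])
opt.load_state_dict(checkpoint["opt"])
scaler.load_state_dict(checkpoint["scaler"])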