megengine.amp.GradScaler
- class GradScaler(init_scale=2.0 ** 4, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000)[source]
A helper class that performs grad scaling to prevent data overflow in autocast mode.

- Parameters
  - init_scale (float) – Initial scale factor.
  - growth_factor (float) – Factor that the scale is multiplied by in the actual update stage. If growth_factor is 0, the scale factor will never be updated.
  - backoff_factor (float) – Factor that the scale is multiplied by when an overflowed grad is encountered.
  - growth_interval (int) – The interval between two scale update stages.
Examples
    gm = GradManager()
    opt = ...
    scaler = GradScaler()

    gm.attach(model.parameters())

    @autocast()
    def train_step(image, label):
        with gm:
            logits = model(image)
            loss = F.nn.cross_entropy(logits, label)
            scaler.backward(gm, loss)
        opt.step().clear_grad()
        return loss
If you need more flexible usage, you can split scaler.backward into three lines:

    @autocast()
    def train_step(image, label):
        with gm:
            logits = model(image)
            loss = F.nn.cross_entropy(logits, label)
            gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
            scaler.unscale(gm.attached_tensors())
            scaler.update()
        opt.step().clear_grad()
        return loss
This is useful when you need to accumulate grads over multiple batches, as sketched below.
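A minimal sketch of such accumulation, assuming a hypothetical accum_steps hyperparameter and the same model, gm, opt, and scaler as in the examples above:

    # Sketch only: accum_steps and step are hypothetical names,
    # not part of the GradScaler API.
    accum_steps = 4

    @autocast()
    def train_step(image, label, step):
        with gm:
            logits = model(image)
            loss = F.nn.cross_entropy(logits, label) / accum_steps
            # Accumulate scaled grads; all micro-batches share one scale
            # factor, so unscale and update only on the last one.
            gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
            if (step + 1) % accum_steps == 0:
                scaler.unscale(gm.attached_tensors())
                scaler.update()
        if (step + 1) % accum_steps == 0:
            opt.step().clear_grad()
        return loss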
Methods

- backward(gm[, y, dy, unscale_grad, update_scale])
  A wrapper of GradManager's backward, used to scale y's grad and unscale parameters' grads.
- load_state_dict(state)
- unscale(grad_tensors)
  Unscale all grad_tensors' grads.
- update([new_scale])
  Update the scale factor according to whether an overflowed grad was encountered.
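As a brief illustration of these signatures, reusing gm, loss, and scaler from the examples above (the keyword semantics are inferred from the names, not confirmed by this page):

    # Hedged sketch: flag semantics inferred from the signature names.
    scaler.backward(gm, loss, unscale_grad=False, update_scale=False)
    scaler.unscale(gm.attached_tensors())  # unscale later, by hand
    scaler.update()                        # grow or back off the scale
    scaler.update(new_scale=2.0 ** 10)     # or force an explicit value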