GradScaler
- class GradScaler(init_scale=2.0 ** 4, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000)

  A helper class that performs gradient scaling to prevent data overflow when running in autocast mode.

  - Parameters:
    - init_scale (float) – Initial scale factor.
    - growth_factor (float) – Factor by which the scale is multiplied at each update stage. If growth_factor is 0, the scale factor will never be updated.
    - backoff_factor (float) – Factor by which the scale is multiplied when an overflowed gradient is encountered.
    - growth_interval (int) – The number of steps between two consecutive scale update stages.
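How these parameters interact can be illustrated with a framework-agnostic sketch of a dynamic loss-scale update rule. The `update_scale` helper below is hypothetical, not part of the MegEngine API; it only mirrors the parameter semantics described above.

```python
def update_scale(scale, found_overflow, growth_count,
                 growth_factor=2.0, backoff_factor=0.5, growth_interval=2000):
    """Toy dynamic loss-scale update mirroring GradScaler's parameters.

    Returns the new scale and the new count of consecutive clean steps.
    """
    if found_overflow:
        # An overflowed gradient: back off the scale and reset the counter.
        return scale * backoff_factor, 0
    growth_count += 1
    if growth_factor != 0 and growth_count >= growth_interval:
        # growth_interval clean steps in a row: grow the scale.
        return scale * growth_factor, 0
    return scale, growth_count
```

Starting from init_scale = 16, an overflow halves the scale to 8, while 2000 consecutive clean steps double it to 32; a growth_factor of 0 disables growth entirely.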
Examples
gm = GradManager()
opt = ...
scaler = GradScaler()

gm.attach(model.parameters())

@autocast()
def train_step(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        scaler.backward(gm, loss)
    opt.step().clear_grad()
    return loss
For more flexible usage, scaler.backward can be split into three lines:

@autocast()
def train_step(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
        scaler.unscale(gm.attached_tensors())
        scaler.update()
    opt.step().clear_grad()
    return loss
This form can be used to accumulate gradients over multiple batches.
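The accumulation pattern can be shown with a plain-Python sketch (no MegEngine involved): each per-batch gradient comes from a loss multiplied by the scale, the scaled gradients are summed, and a single unscale at the end recovers the true accumulated gradient.

```python
def accumulate_scaled_grads(per_batch_grads, scale):
    """Sum gradients each computed from a loss multiplied by `scale`,
    then divide once by `scale` to recover the true accumulated gradient."""
    scaled_total = 0.0
    for g in per_batch_grads:
        scaled_total += g * scale  # what backward() of (scale * loss) yields
    return scaled_total / scale    # unscale a single time, after accumulating
```

Because scaling is linear, unscaling once after the sum gives the same result as unscaling every batch individually.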
- backward(gm, y=None, dy=None, *, unscale_grad=True, update_scale='if_unscale_grad')

  A wrapper of backward that scales the gradient of y and unscales the gradients of the parameters.
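The scale-then-unscale identity behind this wrapper can be checked numerically. For f(w) = w**2, the gradient of scale * f(w) is scale * 2w, and dividing by the same scale recovers the true gradient 2w. This is a pure-Python sketch of the arithmetic, not MegEngine code:

```python
def grad_of_scaled_loss(w, scale):
    # d(scale * w**2)/dw = scale * 2 * w  -- the "scaled" gradient
    return scale * 2.0 * w

def unscale(grad, scale):
    # Dividing by the same scale recovers the true gradient d(w**2)/dw = 2w
    return grad / scale
```

Scaling the loss keeps small gradients representable in low-precision formats during the backward pass; unscaling before the optimizer step restores their true magnitude.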