GradScaler¶
- class GradScaler(init_scale=2.0 ** 4, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000)[source]¶
A helper class that performs grad scaling to prevent data overflow in autocast mode.
- Parameters
  - init_scale (float) – initial scale factor.
  - growth_factor (float) – factor by which the scale is multiplied in the update stage. If growth_factor is 0, scale_factor will not be updated.
  - backoff_factor (float) – factor by which the scale is multiplied when overflowed grads are encountered.
  - growth_interval (int) – the interval between two scale update stages (an illustrative sketch of how these parameters interact follows the Returns entry below).
- Returns
  a GradScaler object.
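The interplay of growth_factor, backoff_factor and growth_interval can be summarized with a short, purely illustrative sketch; this is an assumed policy written for explanation, not MegEngine's actual implementation:

# Illustrative sketch only (assumed policy, not MegEngine's code): the scale
# shrinks by `backoff_factor` when overflow is found, and grows by
# `growth_factor` after `growth_interval` consecutive overflow-free updates.
def next_scale(scale, found_overflow, good_steps,
               growth_factor=2.0, backoff_factor=0.5, growth_interval=2000):
    if found_overflow:
        return scale * backoff_factor, 0   # back off and restart the counter
    good_steps += 1
    if growth_factor != 0 and good_steps >= growth_interval:
        return scale * growth_factor, 0    # grow after a stable stretch
    return scale, good_steps

With the default values, this corresponds to halving the scale whenever overflow is detected and doubling it after 2000 consecutive overflow-free updates.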
Example
gm = GradManager()
opt = ...
scaler = GradScaler()

gm.attach(model.parameters())

@autocast()
def train_step(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        scaler.backward(gm, loss)
    opt.step().clear_grad()
    return loss
If more flexible usage is needed, scaler.backward can be split into three lines:

@autocast()
def train_step(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
    scaler.unscale(gm.attached_tensors())
    scaler.update()
    opt.step().clear_grad()
    return loss
This is useful when grads need to be accumulated over multiple batches, as in the sketch below.
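For example, a minimal, hedged sketch of such accumulation, reusing the gm, scaler, model, F and opt objects from the examples above (the split into two helper functions is an assumption made for illustration):

@autocast()
def accumulate_step(image, label):
    # Backward with the scaled loss only; grads stay scaled and keep
    # accumulating in the attached parameters.
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        gm.backward(loss, dy=megengine.tensor(scaler.scale_factor))
    return loss

def apply_accumulated_grads():
    # After several accumulate_step calls: unscale once, update the scale
    # factor once, then take a single optimizer step.
    scaler.unscale(gm.attached_tensors())
    scaler.update()
    opt.step().clear_grad()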
- backward(gm, y=None, dy=None, *, unscale_grad=True, update_scale='if_unscale_grad')[source]¶
A wrapper of GradManager’s backward, used to scale y’s grad and unscale parameters’ grads.
- Parameters
  - gm (GradManager) – the GradManager to be wrapped.
  - y (Union[Tensor, List[Tensor], None]) – same as GradManager backward’s y.
  - dy (Union[Tensor, List[Tensor], None]) – same as GradManager backward’s dy. Will be multiplied by scale_factor.
  - unscale_grad (bool) – whether to do unscale at the same time. Could be False if grads need to be accumulated (see the sketch below).
  - update_scale (bool) – same as unscale’s update. Will be ignored if unscale_grad is False.
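As a hedged sketch of the unscale_grad=False path (the micro_batches iterable and the loop structure are assumptions for illustration; gm, scaler, model, F and opt are the objects from the earlier examples):

@autocast()
def scaled_backward(image, label):
    with gm:
        logits = model(image)
        loss = F.nn.cross_entropy(logits, label)
        # Keep grads scaled so they can accumulate across calls;
        # update_scale is ignored while unscale_grad is False.
        scaler.backward(gm, loss, unscale_grad=False)
    return loss

# hypothetical loop over several micro-batches before one optimizer step
for image, label in micro_batches:
    scaled_backward(image, label)

scaler.unscale(gm.attached_tensors())   # unscale the accumulated grads once
scaler.update()                         # one scale update per effective step
opt.step().clear_grad()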