Saving memory by recomputing (Recomputation)

Generally speaking, a larger model and a larger batch size can yield better training results, but they also bring a larger memory footprint.

Recomputation is essentially a strategy of trading time for space, and can be compared to a tensor cache. When GPU memory is insufficient, the results of some forward computations can be discarded; when those results are needed again, they are recomputed from the previously saved checkpoints. In the following diagram, blue marks the GPU memory in use (image source):

Vanilla backprop

Checkpointed backprop
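
The difference between the two schemes in the diagram can be sketched in plain Python. This is only an illustration under simplifying assumptions, not MegEngine's implementation: the "network" is a chain of scalar `tanh` layers, and the names `forward`, `backward`, `vanilla_grad`, `checkpointed_grad` and the `segment` parameter are hypothetical helpers introduced for this example. Vanilla backprop keeps every layer input until the backward pass, while the checkpointed version keeps only one input per segment and recomputes the rest when the backward pass reaches that segment.

```python
import math

def forward(x, w):
    # One layer of the toy chain: y = tanh(w * x).
    return math.tanh(w * x)

def backward(x, w, grad_y):
    # Gradient with respect to the layer input x, given the upstream gradient.
    y = math.tanh(w * x)
    return grad_y * (1.0 - y * y) * w

def vanilla_grad(x0, weights):
    # Vanilla backprop: every layer input stays alive until it is used.
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = forward(x, w)
    grad = 1.0
    for x_in, w in zip(reversed(inputs), reversed(weights)):
        grad = backward(x_in, w, grad)
    return grad, len(inputs)

def checkpointed_grad(x0, weights, segment):
    # Forward pass: keep only the input of each segment (the checkpoints).
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = forward(x, w)
    grad = 1.0
    peak = 0
    # Backward pass: handle segments from last to first.
    for start in sorted(checkpoints, reverse=True):
        seg_weights = weights[start:start + segment]
        # Recompute the inputs of this segment from its checkpoint.
        inputs = []
        x = checkpoints[start]
        for w in seg_weights:
            inputs.append(x)
            x = forward(x, w)
        # Upper bound on the values held at once: checkpoints + one segment.
        peak = max(peak, len(checkpoints) + len(inputs))
        for x_in, w in zip(reversed(inputs), reversed(seg_weights)):
            grad = backward(x_in, w, grad)
    return grad, peak

if __name__ == "__main__":
    weights = [0.5 + 0.1 * i for i in range(16)]
    print(vanilla_grad(0.3, weights))
    print(checkpointed_grad(0.3, weights, segment=4))
```

With 16 layers and `segment=4`, both functions return the same gradient, but the vanilla version holds 16 stored inputs while the checkpointed version holds at most 8 (4 checkpoints plus 4 recomputed inputs), at the cost of running most of the forward pass a second time.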

MegEngine implements the classic recomputation strategies. For details, refer to the following pages:

Memory optimization using DTR
Memory optimization using Sublinear
