Saving memory by recomputing (Recomputation)

Generally speaking, larger models and larger batch sizes tend to yield better training results, but they also come with a larger memory footprint.

Recomputation is essentially a strategy that trades time for space; it can be thought of as a caching strategy for tensors. When GPU memory is insufficient, the results of some forward computations can be released to free space; when those results are needed again, they are recomputed from previously saved checkpoints. Refer to the following diagrams, where blue indicates occupied memory (image source):

Vanilla backprop

../../_images/vanilla-backprop.gif

Checkpointed backprop

../../_images/checkpointed-backprop.gif
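The checkpointed strategy illustrated above can be sketched in a few lines of plain Python. This is a minimal, hypothetical example (not MegEngine's actual implementation): the forward pass through a chain of functions keeps only every k-th activation as a checkpoint, and the backward pass recomputes each dropped activation from the nearest earlier checkpoint before applying the chain rule.

```python
def forward_with_checkpoints(fs, x, k):
    """Run x through the chain fs, storing only every k-th activation."""
    checkpoints = {0: x}
    for i, f in enumerate(fs):
        x = f(x)
        if (i + 1) % k == 0:
            checkpoints[i + 1] = x
    return x, checkpoints

def activation_at(fs, checkpoints, i):
    """Recompute the input of layer i from the nearest earlier checkpoint."""
    j = max(c for c in checkpoints if c <= i)
    x = checkpoints[j]
    for f in fs[j:i]:
        x = f(x)
    return x

def backward(fs, dfs, checkpoints, grad_out):
    """Backprop through the chain, recomputing dropped activations on demand."""
    for i in reversed(range(len(fs))):
        x = activation_at(fs, checkpoints, i)  # input to layer i
        grad_out = dfs[i](x) * grad_out        # chain rule
    return grad_out

# Toy chain of squaring layers: y = ((x^2)^2)^2, so dy/dx = 8 * x^7
fs  = [lambda x: x * x] * 3
dfs = [lambda x: 2 * x] * 3
y, ckpts = forward_with_checkpoints(fs, 2.0, k=2)
g = backward(fs, dfs, ckpts, 1.0)
print(y, g)  # 256.0 1024.0
```

With k=2, only the initial input and the second activation are kept, so peak storage drops, while the backward pass pays for it by re-running part of the forward chain for each layer whose activation was discarded.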

MegEngine implements this classic recomputation strategy in practice. For details, please refer to the following page: