Saving memory by recomputing (Recomputation)#
Generally speaking, a larger model and a larger batch size can achieve better training results, but they also bring a larger memory footprint.
Recomputation is essentially a strategy that trades time for space; it can be thought of as a Tensor caching strategy. When GPU memory is insufficient, the results of some forward computations can be discarded; when those results are needed again, they are recomputed from previously saved checkpoints. Refer to the following diagrams, where blue indicates occupied GPU memory (image source):
Vanilla backprop
Checkpointed backprop
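The idea above can be sketched in plain Python, independent of any framework. This is a minimal illustration, not MegEngine's actual implementation: the forward pass caches only every `every`-th activation as a checkpoint, and any discarded intermediate activation is later recovered by replaying the chain from the nearest earlier checkpoint. The function names (`forward_with_checkpoints`, `recompute_activation`) and the use of scalar-valued layers are assumptions for brevity; a real framework would operate on tensors and the autograd graph.

```python
def forward_with_checkpoints(layers, x, every=2):
    """Run a chain of layers, caching only every `every`-th activation.

    Returns the final output and the dict of cached checkpoints,
    keyed by position in the chain (0 = the input itself).
    """
    checkpoints = {0: x}
    for i, f in enumerate(layers):
        x = f(x)
        if (i + 1) % every == 0:
            checkpoints[i + 1] = x  # keep this one; others are discarded
    return x, checkpoints

def recompute_activation(layers, checkpoints, i):
    """Recover the activation after layer i (0-based) during backward.

    Instead of having cached it, replay the forward pass from the
    nearest checkpoint at or before position i + 1.
    """
    start = max(k for k in checkpoints if k <= i + 1)
    x = checkpoints[start]
    for j in range(start, i + 1):
        x = layers[j](x)
    return x

# Hypothetical 4-layer chain: only positions 0, 2, 4 are cached,
# so the activations after layers 0 and 2 must be recomputed on demand.
layers = [lambda v: v * 2, lambda v: v + 3, lambda v: v * v, lambda v: v - 1]
out, ckpts = forward_with_checkpoints(layers, 1, every=2)
```

With `every=2`, memory for intermediate activations is roughly halved, at the cost of one extra partial forward pass per discarded activation during backward; this is exactly the time-for-space trade-off the diagrams show.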
MegEngine applies this classic recomputation strategy in its implementation. For details, please refer to the following page: