Memory optimization using DTR¶
MegEngine reduces GPU memory usage in dynamic graphs by introducing DTR [1] technology, and also supports enabling it in static graphs.
[1] Marisa Kirisame, Steven Lyubomirsky, Altan Haan, Jennifer Brennan, Mike He, Jared Roesch, Tianqi Chen, and Zachary Tatlock. Dynamic tensor rematerialization. In International Conference on Learning Representations, 2021. URL: https://openreview.net/forum?id=Vfs_2RnOD0H.
How to use and configure DTR¶
Add one line of code before the training code to enable DTR memory optimization in dynamic graphs:
>>> megengine.dtr.enable()
New in version 1.5: Users can now enable DTR optimization directly, without setting the memory threshold eviction_threshold as a trigger condition. By default, MegEngine tries to optimize whenever the currently free GPU memory cannot satisfy an allocation request: it selects the optimal tensors according to the DTR strategy and releases their memory until the allocation succeeds.
In version 1.4, the GPU memory threshold had to be set in advance to enable DTR memory optimization:
>>> megengine.dtr.eviction_threshold = "5GB"
>>> megengine.dtr.enable()
Tips for setting the memory threshold
In general, the smaller the memory threshold, the lower the peak memory usage and the longer the training time; the larger the threshold, the higher the peak memory usage and the shorter the training time. Note, however, that when the threshold is close to the graphics card's capacity, fragmentation problems arise easily. Because DTR performs release operations based on the size of active tensors in memory, the physical addresses of the released tensors on the card are likely to be non-contiguous. For example, releasing two 100MB tensors whose physical locations are not adjacent still cannot satisfy a 200MB allocation request. In that case, defragmentation is triggered automatically, which has a huge impact on performance.
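As a rough sketch (the values below are illustrative, not recommendations), leaving some headroom below the card's total capacity reduces the risk of the fragmentation problem described above:

import megengine

# Illustrative only: on an 8GB card, a threshold such as "5GB" leaves headroom
# and lowers the peak; a smaller value trades more training time for an even
# lower peak, while a value close to the card's capacity risks fragmentation.
megengine.dtr.eviction_threshold = "5GB"
megengine.dtr.enable()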
Using DTR with distributed training
In distributed scenarios, we usually use launcher to wrap a function so that it runs in multiple processes. In this case, if you want to enable DTR memory optimization, you need to configure DTR inside the wrapped function:
import megengine
import megengine.distributed as dist

@dist.launcher
def main():
    megengine.dtr.enable()
If you are unfamiliar with the related concepts, refer to the Distributed Training page for details.
See also
There are other interfaces, such as evictee_minimum_size and enable_sqrt_sampling, that allow you to customize the DTR strategy. For more configuration instructions, please refer to the API documentation page of the dtr module.
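For illustration, a minimal sketch of configuring these knobs, assuming they are module-level settings used the same way as eviction_threshold (the exact value types and defaults are described in the dtr module API documentation):

import megengine

# Assumed usage; the values are illustrative only — see the dtr module API.
megengine.dtr.evictee_minimum_size = "1MB"  # tensors smaller than this are not considered for eviction
megengine.dtr.enable_sqrt_sampling = True   # sample eviction candidates to reduce the cost of choosing one
megengine.dtr.enable()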
Enable DTR in static graphs¶
Users can use DTRConfig to set the corresponding parameter of trace when compiling static graphs, which turns on DTR optimization:
from megengine.jit import trace, DTRConfig

# set the eviction threshold to 8 GB (in bytes)
config = DTRConfig(eviction_threshold=8 * 1024 ** 3)

@trace(symbolic=True, dtr_config=config)
def train_func(data, label, *, net, optimizer, gm):
    ...
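For reference, the sketch below expands the snippet above with an illustrative body and call site; the network, loss function, optimizer, and GradManager here are placeholders chosen for the example and are not part of the DTR configuration itself:

import numpy as np
import megengine
import megengine.functional as F
import megengine.optimizer as optim
from megengine.autodiff import GradManager
from megengine.jit import trace, DTRConfig
from megengine.module import Linear

config = DTRConfig(eviction_threshold=8 * 1024 ** 3)

@trace(symbolic=True, dtr_config=config)
def train_func(data, label, *, net, optimizer, gm):
    with gm:
        logits = net(data)
        loss = F.nn.cross_entropy(logits, label)
        gm.backward(loss)
    optimizer.step()
    optimizer.clear_grad()
    return loss

# Illustrative setup: DTR itself requires no changes to this part.
net = Linear(32, 10)
gm = GradManager()
gm.attach(net.parameters())
optimizer = optim.SGD(net.parameters(), lr=0.01)

data = megengine.tensor(np.random.rand(8, 32).astype("float32"))
label = megengine.tensor(np.random.randint(10, size=(8,)).astype("int32"))
loss = train_func(data, label, net=net, optimizer=optimizer, gm=gm)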