megengine.quantization¶
Note
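import megengine.quantization as Q

model = ...  # the pre-trained float model to be quantized
Q.quantize_qat(model, qconfig=...)  # convert the float model to a QAT model
for _ in range(...):
    train(model)  # fine-tune with observers and fake quantization in place
Q.quantize(model)  # convert the QAT model to a quantized model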
For detailed usage, see the Quantization page of the user guide.
Quantization configuration: QConfig¶
A config class indicating how to do quantize toward
Available preset configurations¶
min_max_fakequant_qconfig: preset that uses MinMaxObserver and FakeQuantize.
ema_fakequant_qconfig
sync_ema_fakequant_qconfig
ema_lowbit_fakequant_qconfig: preset that uses ExponentialMovingAverageObserver and FakeQuantize, with the qint4 data type.
calibration_qconfig: preset that applies HistogramObserver to activations for post-training quantization (without FakeQuantize).
tqt_qconfig: preset that uses TQT for fake quantization.
passive_qconfig: preset that uses PassiveObserver and FakeQuantize.
easyquant_qconfig: QConfig for the EasyQuant algorithm; equivalent to passive_qconfig.
Observer¶
A base class for Observer Module.
An Observer Module that records the input tensor's running min and max values to calculate the scale.
A distributed version of
A
A distributed version of
A
An Observer that supports setting
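The idea of recording running min and max values to derive a scale can be sketched in plain Python. This is a minimal illustration, not MegEngine's implementation: the class names `MinMaxSketch` and `EmaMinMaxSketch`, the symmetric int8 range, the `eps` floor, and the momentum update rule are all assumptions chosen for the example, and the library's exact formula may differ.

```python
class MinMaxSketch:
    """Hypothetical observer: tracks running min/max, derives a symmetric scale."""

    def __init__(self, qmax=127, eps=1e-5):
        self.min_val = float("inf")
        self.max_val = float("-inf")
        self.qmax = qmax  # assumed int8 upper bound
        self.eps = eps    # assumed floor to avoid a zero scale

    def observe(self, values):
        # Widen the recorded range with the new batch of values.
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def scale(self):
        # Symmetric quantization: cover the largest magnitude seen so far.
        bound = max(-self.min_val, self.max_val)
        return max(bound, self.eps) / self.qmax


class EmaMinMaxSketch(MinMaxSketch):
    """Hypothetical variant: smooths min/max with an exponential moving average."""

    def __init__(self, momentum=0.9, qmax=127, eps=1e-5):
        super().__init__(qmax=qmax, eps=eps)
        self.momentum = momentum
        self.initialized = False

    def observe(self, values):
        lo, hi = min(values), max(values)
        if not self.initialized:
            # The first batch sets the range directly.
            self.min_val, self.max_val = lo, hi
            self.initialized = True
        else:
            m = self.momentum
            self.min_val = m * self.min_val + (1 - m) * lo
            self.max_val = m * self.max_val + (1 - m) * hi


obs = MinMaxSketch()
obs.observe([-0.5, 0.25, 1.0])
obs.observe([-2.54, 0.1])
print(obs.scale())  # scale derived from the largest magnitude, 2.54
```

The moving-average variant reacts more slowly to outlier batches, which is the usual reason to prefer it during training.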
FakeQuantize¶
A module to do quant and dequant according to observer's scale and zero_point.
TQT: https://arxiv.org/abs/1903.08066 Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks.
LSQ: https://arxiv.org/pdf/1902.08153.pdf Estimating and scaling the task loss gradient at each weight and activation layer's quantizer step size
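As a rough illustration of "quant and dequant according to scale and zero_point", the sketch below rounds a float onto an integer grid, clamps it, and maps it back to float. The function name `fake_quantize`, the default int8 bounds, and the use of Python's built-in `round` are assumptions for this example; it does not reproduce the library module or the learned-threshold behaviour of TQT and LSQ.

```python
def fake_quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Hypothetical helper: quantize x to the integer grid, then dequantize."""
    q = round(x / scale) + zero_point   # quantize
    q = max(qmin, min(qmax, q))         # clamp to the representable range
    return (q - zero_point) * scale     # dequantize back to float


# Values inside the range snap to the nearest grid point.
print(fake_quantize(0.337, scale=0.02))
# Values outside the range saturate at the largest representable value.
print(fake_quantize(10.0, scale=0.02))
```

The output stays in floating point, so the rest of the network trains as usual while seeing the rounding and saturation error that true integer inference would introduce.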
Quantization operations¶
Recursively convert float
Recursively convert
Implementation of
Recursively enable
Recursively disable
Recursively enable
Recursively disable
Recursively set
Reset
Utils¶
To standardize FakeQuant, Observer and Tensor's qparams format.
Quantization mode enumeration class.
Apply fake quantization to the bias, using a special scale derived from the input tensor and the weight tensor; the quantized type is also set to qint32.
Apply fake quantization to the inp tensor.
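The bias case can be sketched as follows. The description above only says the scale comes from the input tensor and the weight tensor; this example assumes it is the product of their two scales, which is the common convention because an integer accumulator of input-times-weight products carries exactly that scale. The function name `fake_quant_bias_sketch` and its scalar arguments are hypothetical and differ from the library function's signature.

```python
def fake_quant_bias_sketch(bias, inp_scale, weight_scale):
    """Hypothetical helper: fake-quantize a bias value onto a 32-bit integer grid."""
    scale = inp_scale * weight_scale      # assumed: product of the two scales
    qmin, qmax = -(2 ** 31), 2 ** 31 - 1  # qint32 range
    q = max(qmin, min(qmax, round(bias / scale)))
    return q * scale


out = fake_quant_bias_sketch(0.123, inp_scale=0.02, weight_scale=0.01)
print(out)  # within half a quantization step of 0.123
```

Because the 32-bit range is so wide relative to the step size, clamping rarely triggers and the error is dominated by rounding.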