QConfig
- class QConfig(weight_observer, act_observer, weight_fake_quant, act_fake_quant)
A config class indicating how to quantize the activation and weight of a QATModule. See set_qconfig for detailed usage.
- Parameters
    weight_observer – interface to instantiate an Observer indicating how to collect the scale and zero_point of the weight.
    act_observer – similar to weight_observer, but for the activation.
    weight_fake_quant – interface to instantiate a FakeQuantize indicating how to do the fake_quant calculation.
    act_fake_quant – similar to weight_fake_quant, but for the activation.
Examples
    from functools import partial

    from megengine.quantization import (
        ExponentialMovingAverageObserver,
        FakeQuantize,
        MinMaxObserver,
        QConfig,
    )

    # Default EMA QConfig for QAT.
    ema_fakequant_qconfig = QConfig(
        weight_observer=partial(MinMaxObserver, dtype="qint8_narrow"),
        act_observer=partial(ExponentialMovingAverageObserver, dtype="qint8"),
        weight_fake_quant=partial(FakeQuantize, dtype="qint8_narrow"),
        act_fake_quant=partial(FakeQuantize, dtype="qint8"),
    )
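The partial(...) arguments in the example above are factories, not constructed objects. The following is a minimal sketch of that pattern using only the standard library; ToyObserver is a hypothetical stand-in class for illustration and is not part of MegEngine.

```python
from functools import partial


class ToyObserver:
    """Hypothetical stand-in for an observer class; not a MegEngine API."""

    def __init__(self, dtype):
        self.dtype = dtype


# partial() binds dtype now but defers construction until the factory is called.
make_weight_observer = partial(ToyObserver, dtype="qint8_narrow")

# Each call builds a fresh, independent instance with the bound arguments.
obs_a = make_weight_observer()
obs_b = make_weight_observer()
```

Because construction is deferred, whoever receives the factory can presumably create one instance per module without needing to know the initialization arguments.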
Each parameter is a class rather than an instance. We recommend using functools.partial to bind the initialization parameters of the class, so that they do not need to be provided in set_qconfig.

Usually we choose the narrow version of a dtype (like qint8_narrow) for weight-related parameters and the normal version for activation-related ones. Consider a multiply-add result such as a * b + c * d: if all four variables are -128 of dtype qint8, the result is 2^15, which exceeds 2^15 - 1, the largest value a signed 16-bit integer can hold, and causes overflow. Weights are commonly computed this way, so their qmin is narrowed to -127.
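The overflow argument can be checked with plain integer arithmetic. This is a sketch that assumes the multiply-add result is held in a signed 16-bit value (the text above only states that 2^15 overflows); mul_add and the constant names are illustrative, not MegEngine identifiers.

```python
INT16_MAX = 2**15 - 1      # assumed accumulator limit: signed 16-bit
QINT8_MIN = -128           # qmin of qint8
QINT8_NARROW_MIN = -127    # qmin of qint8_narrow


def mul_add(a, b, c, d):
    """Return a * b + c * d using unbounded Python integers."""
    return a * b + c * d


# All four operands at the full-range minimum: reaches exactly 2**15.
worst_full = mul_add(QINT8_MIN, QINT8_MIN, QINT8_MIN, QINT8_MIN)

# Narrow weights (a, c) multiplied by full-range activations (b, d).
worst_mixed = mul_add(QINT8_NARROW_MIN, QINT8_MIN, QINT8_NARROW_MIN, QINT8_MIN)

overflows_full = worst_full > INT16_MAX
overflows_mixed = worst_mixed > INT16_MAX
```

Under that assumption, narrowing only the weight side is enough to keep the worst case within range, which is consistent with keeping the normal dtype for activations.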