QConfig
class QConfig(weight_observer, act_observer, weight_fake_quant, act_fake_quant)
A config class indicating how to quantize the activation and weight of a QATModule. See set_qconfig for detailed usage.
Parameters
weight_observer – interface to instantiate an Observer indicating how to collect the scales and zero_point of the weight.
act_observer – similar to weight_observer but for the activation.
weight_fake_quant – interface to instantiate a FakeQuantize indicating how to do the fake_quant calculation.
act_fake_quant – similar to weight_fake_quant but for the activation.
Examples
# Default EMA QConfig for QAT.
ema_fakequant_qconfig = QConfig(
    weight_observer=partial(MinMaxObserver, dtype="qint8_narrow"),
    act_observer=partial(ExponentialMovingAverageObserver, dtype="qint8"),
    weight_fake_quant=partial(FakeQuantize, dtype="qint8_narrow"),
    act_fake_quant=partial(FakeQuantize, dtype="qint8"),
)
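The example passes functools.partial objects rather than instances. As a minimal standalone sketch of that pattern, the snippet below uses DummyObserver and build_observer, which are hypothetical stand-ins and not part of the library; it only shows how partial pre-binds constructor arguments so that a consumer can later instantiate the class without supplying any parameters.

```python
from functools import partial


class DummyObserver:
    """Hypothetical stand-in for an observer class (not a library API)."""

    def __init__(self, dtype):
        self.dtype = dtype


def build_observer(factory):
    """Hypothetical consumer: calls the factory with no arguments."""
    return factory()


# partial pre-binds dtype; the result is still a callable factory, not an instance.
weight_observer_factory = partial(DummyObserver, dtype="qint8_narrow")

# Each call produces a fresh, independent instance.
obs_a = build_observer(weight_observer_factory)
obs_b = build_observer(weight_observer_factory)
print(obs_a.dtype, obs_a is obs_b)
```

Because the factory is called once per use, every module that receives it gets its own observer state, which is presumably why a class (or partial) is expected instead of a shared instance.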
Each parameter is a class rather than an instance. We recommend using functools.partial to bind the class's initialization parameters, so that they do not need to be provided in set_qconfig.
Usually we choose a narrow dtype (like qint8_narrow) for weight-related parameters and the normal version for activation-related ones. For a multiply-and-add result such as a * b + c * d, if all four variables are -128 of dtype qint8, the result is 2^15, which overflows a signed 16-bit value (maximum 2^15 - 1). Weights are commonly computed this way, so qmin needs to be narrowed to -127.
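The overflow argument can be checked with plain integer arithmetic. This is a small sketch, assuming the a * b + c * d result is stored in a signed 16-bit accumulator (the text only states that 2^15 overflows); worst_case_sum and INT16_MAX are illustrative names introduced here.

```python
# Assumption: the accumulator is a signed 16-bit integer.
INT16_MAX = 2**15 - 1


def worst_case_sum(qmin):
    """Return a * b + c * d when all four operands equal qmin."""
    return qmin * qmin + qmin * qmin


full = worst_case_sum(-128)    # qint8 range starts at -128
narrow = worst_case_sum(-127)  # narrow range starts at -127

print(full, full > INT16_MAX)      # exceeds the 16-bit maximum
print(narrow, narrow > INT16_MAX)  # fits within the 16-bit maximum
```

With qmin at -128 the sum reaches exactly 2^15, one past the largest representable value, while -127 leaves it safely in range.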