megengine.quantization package

megengine.quantization.fake_quant

class megengine.quantization.fake_quant.FakeQuantize(dtype, narrow_range=False, enable=True, **kwargs)[source]

Bases: megengine.quantization.fake_quant._FakeQuantize

A module to quantize and dequantize input according to the observer's scale and zero_point.

fake_quant_forward(inp, q_dict=None)[source]
class megengine.quantization.fake_quant.TQT(q_dict, dtype, narrow_range=False, enable=True, **kwargs)[source]

Bases: megengine.quantization.fake_quant._FakeQuantize

TQT: Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks. See https://arxiv.org/abs/1903.08066.

fake_quant_forward(inp, q_dict=None)[source]
get_dtype()[source]
get_qparams()[source]

megengine.quantization.internal_fake_quant

megengine.quantization.observer

class megengine.quantization.observer.ExponentialMovingAverageObserver(momentum=0.9, mode=<QuantMode.SYMMERTIC: 1>, eps=1e-05, dtype='qint8', narrow_range=False, **kwargs)[source]

Bases: megengine.quantization.observer.MinMaxObserver

forward(x_orig)[source]
set_momentum(momentum)[source]
class megengine.quantization.observer.HistogramObserver(bins=2048, upsample_rate=128, mode=<QuantMode.SYMMERTIC: 1>, eps=1e-05, dtype='qint8', narrow_range=False, **kwargs)[source]

Bases: megengine.quantization.observer.MinMaxObserver

forward(x_orig)[source]
get_qparams()[source]
sideeffect_forward(x_orig)[source]
class megengine.quantization.observer.MinMaxObserver(mode=<QuantMode.SYMMERTIC: 1>, eps=1e-05, dtype='qint8', narrow_range=False, **kwargs)[source]

Bases: megengine.quantization.observer.Observer

forward(x_orig)[source]
get_qparams()[source]
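
A minimal usage sketch of MinMaxObserver (the exact contents of the dict returned by get_qparams() are version-dependent and assumed here):

import megengine as mge
from megengine.quantization.observer import MinMaxObserver

# Feed a tensor through the observer; forward() records the running
# min/max, from which scale (and zero_point in asymmetric mode) are derived.
obs = MinMaxObserver(dtype="qint8", narrow_range=False)
x = mge.tensor([[-1.5, 0.3], [2.0, -0.7]])
obs(x)
print(obs.get_dtype())    # "qint8"
print(obs.get_qparams())  # quantization parameters collected so far
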
class megengine.quantization.observer.Observer(dtype, narrow_range=False, **kwargs)[source]

Bases: megengine.module.module.Module

A base class for Observer modules.

Parameters
  • dtype (str) – a string indicating which dtype to collect scale and zero_point for.

  • narrow_range (bool) – whether the absolute value of qmin is the same as qmax instead of 1 greater. Usually True for weight and False for activation.

disable()[source]
enable()[source]
abstract forward(x)[source]
get_dtype()[source]
abstract get_qparams(**kwargs)[source]
train(mode=True, recursive=True)[source]

Sets training mode of all the modules within this module (including itself) to mode. This effectively sets the training attribute of those modules to mode, but only has an effect on certain modules (e.g. BatchNorm2d, Dropout, Observer).

Parameters
  • mode (bool) – the training mode to be set on modules.

  • recursive (bool) – whether to recursively call submodules’ train().

Return type

None

class megengine.quantization.observer.PassiveObserver(q_dict, dtype, narrow_range=False, **kwargs)[source]

Bases: megengine.quantization.observer.Observer

An Observer whose scale can be set directly.

forward(x)[source]

Just return the input, because q_dict is set by apply_easy_quant().

get_qparams()[source]
property scale
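
A hedged sketch of setting a scale by hand; building the q_dict with get_qparam_dict() and assigning a plain float to the scale property are assumptions about this version's API:

from megengine.quantization.observer import PassiveObserver
from megengine.quantization.utils import QuantMode, get_qparam_dict

# PassiveObserver never measures data: forward() is a no-op and the scale
# is assigned directly, either by apply_easy_quant() or by hand as below.
q_dict = get_qparam_dict(QuantMode.SYMMERTIC)
obs = PassiveObserver(q_dict, dtype="qint8")
obs.scale = 0.02          # set the scale through the property
print(obs.get_qparams())
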
class megengine.quantization.observer.SyncExponentialMovingAverageObserver(momentum=0.9, mode=<QuantMode.SYMMERTIC: 1>, eps=1e-05, dtype='qint8', narrow_range=False, **kwargs)[source]

Bases: megengine.quantization.observer.ExponentialMovingAverageObserver

forward(x_orig)[source]
class megengine.quantization.observer.SyncMinMaxObserver(mode=<QuantMode.SYMMERTIC: 1>, eps=1e-05, dtype='qint8', narrow_range=False, **kwargs)[source]

Bases: megengine.quantization.observer.MinMaxObserver

forward(x_orig)[source]

megengine.quantization.qconfig

class megengine.quantization.qconfig.QConfig(weight_observer, act_observer, weight_fake_quant, act_fake_quant)[source]

Bases: object

A config class indicating how to quantize a QATModule's activation and weight. See set_qconfig() for detailed usage.

Parameters
  • weight_observer – interface to instantiate an Observer indicating how to collect the scale and zero_point of weight.

  • act_observer – similar to weight_observer but toward activation.

  • weight_fake_quant – interface to instantiate a FakeQuantize indicating how to do fake_quant calculation.

  • act_fake_quant – similar to weight_fake_quant but toward activation.

Examples:

# Default EMA QConfig for QAT.
ema_fakequant_qconfig = QConfig(
    weight_observer=partial(MinMaxObserver, dtype="qint8", narrow_range=True),
    act_observer=partial(ExponentialMovingAverageObserver, dtype="qint8", narrow_range=False),
    weight_fake_quant=partial(FakeQuantize, dtype="qint8", narrow_range=True),
    act_fake_quant=partial(FakeQuantize, dtype="qint8", narrow_range=False),
)

Each parameter is a class rather than an instance. We recommend using functools.partial to bind the class's initialization parameters, so that they do not need to be provided in set_qconfig().

Usually we set narrow_range of weight-related parameters to True and of activation-related parameters to False. For a multiply-accumulate result such as a * b + c * d, if all four variables are -128 in dtype qint8, the result is 2^15 and overflows. Weights are commonly involved in such computations, so their range needs to be narrowed, as worked out below.
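
The arithmetic behind this rule, written out as a quick sanity check in plain Python (not MegEngine API):

a = b = c = d = -128                  # extreme qint8 values
acc = a * b + c * d                   # 32768 == 2 ** 15, one past the int16 maximum 32767
assert acc == 2 ** 15
# With weights narrowed to [-127, 127], the worst case becomes
# 2 * 127 * 128 == 32512, which stays inside the 16-bit range.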

megengine.quantization.quantize

megengine.quantization.quantize.apply_easy_quant(module, data, start=0.8, stop=1.2, num=40)[source]

Implementation of EasyQuant (https://arxiv.org/pdf/2006.16669), which searches for optimal scales.

Parameters
  • module – root module.

  • data – input tensor used to search optimal scale.

  • start – lower bound of the search interval.

  • stop – upper bound of the search interval.

  • num – number of samples to search.
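
A hedged usage sketch; net and calib_batch are placeholders for a float model and a representative input batch, and the model is assumed to have been converted with quantize_qat() first:

from megengine.quantization.quantize import quantize_qat, apply_easy_quant

qat_net = quantize_qat(net)                      # float Module -> QATModule
apply_easy_quant(qat_net, calib_batch, start=0.8, stop=1.2, num=40)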

megengine.quantization.quantize.disable_fake_quant(module)[source]

Recursively disable fake quantization for all QATModule submodules through apply().

Parameters

module (Module) – root module on which to recursively disable fake quantization.

megengine.quantization.quantize.disable_observer(module)[source]

Recursively disable observers for all QATModule submodules through apply().

Parameters

module (Module) – root module on which to recursively disable observers.

megengine.quantization.quantize.enable_fake_quant(module)[source]

Recursively enable fake quantization for all QATModule submodules through apply().

Parameters

module (Module) – root module on which to recursively enable fake quantization.

megengine.quantization.quantize.enable_observer(module)[source]

Recursively enable observers for all QATModule submodules through apply().

Parameters

module (Module) – root module on which to recursively enable observers.
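
A hedged sketch of a typical calibration pass built from these four helpers; net and loader are placeholders for a float model and a data source:

from megengine.quantization.quantize import (
    quantize_qat,
    enable_observer, disable_observer,
    enable_fake_quant, disable_fake_quant,
)

qat_net = quantize_qat(net)

# Phase 1: collect statistics only -- observers on, fake quantization off.
enable_observer(qat_net)
disable_fake_quant(qat_net)
for data in loader:
    qat_net(data)

# Phase 2: freeze the collected scales and run with fake quantization on.
disable_observer(qat_net)
enable_fake_quant(qat_net)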

megengine.quantization.quantize.hook_qat_module(module, func)[source]

Add hooks for all QATModule submodules.

megengine.quantization.quantize.is_qat(mod)[source]
megengine.quantization.quantize.propagate_qconfig(module, qconfig)[source]

Recursively set module’s qconfig through apply().

Parameters
  • module (QATModule) – root module to traverse recursively.

  • qconfig (QConfig) – an instance of QConfig to be set as submodules’ qconfig.

megengine.quantization.quantize.quantize(module, inplace=True, mapping=None)[source]

Recursively convert QATModule to QuantizedModule through apply().

Parameters
  • module (Module) – root module to do convert recursively.

  • inplace (bool) – whether to convert submodules in-place.

  • mapping (Optional[dict]) – a dict indicating how to convert custom modules from QATModule to QuantizedModule. Will be combined with internal default convert mapping dict.

megengine.quantization.quantize.quantize_qat(module, inplace=True, qconfig=<megengine.quantization.qconfig.QConfig object>, mapping=None)[source]

Recursively convert float Modules to QATModules through apply() and set their qconfig accordingly.

Parameters
  • module (Module) – root module to do convert recursively.

  • inplace (bool) – whether to convert submodules in-place.

  • qconfig (QConfig) – an instance of QConfig to be set as submodules’ qconfig. Default is ema_fakequant_qconfig.

  • mapping (Optional[dict]) – a dict indicating how to convert custom modules from Module to QATModule. Will be combined with internal default convert mapping dict.
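
A hedged end-to-end sketch of the intended conversion flow; net is a placeholder float model and the training loop is elided. Passing ema_fakequant_qconfig explicitly is redundant here since it is already the default:

from megengine.quantization.qconfig import ema_fakequant_qconfig
from megengine.quantization.quantize import quantize_qat, quantize

qat_net = quantize_qat(net, qconfig=ema_fakequant_qconfig)  # Module -> QATModule
# ... run quantization-aware training on qat_net ...
q_net = quantize(qat_net)                                   # QATModule -> QuantizedModule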

megengine.quantization.quantize.reset_qconfig(module, qconfig, inplace=True)[source]

Reset _FakeQuantize and Observer submodules according to qconfig.

Parameters
  • module (Module) – root module to reset recursively.

  • qconfig (QConfig) – an instance of QConfig to be set as submodules’ qconfig.

  • inplace (bool) – whether to reset submodules in-place.

megengine.quantization.utils

class megengine.quantization.utils.QuantMode(value)[source]

Bases: enum.Enum

Quantization mode enumeration class.

ASYMMERTIC = 2
SYMMERTIC = 1
class megengine.quantization.utils.Round[source]

Bases: megengine.core.autodiff.grad.Function

The functional round has no gradient and cannot be used for quantization-aware training. We use Function and STE (Straight-Through Estimator) to implement backward propagation.

backward(output_grads)[source]
forward(x)[source]
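
A conceptual sketch of the straight-through estimator that Round implements, written with the generic detach trick rather than MegEngine's actual Function subclass:

import megengine.functional as F

def ste_round(x):
    # Forward value equals F.round(x); detaching the correction term keeps
    # the backward path equal to the identity, i.e. gradients flow through
    # the rounding unchanged.
    return x + (F.round(x) - x).detach()
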
megengine.quantization.utils.apply()
megengine.quantization.utils.fake_quant_bias(bias, inp, w_qat)[source]

Apply fake quantization to the bias, using the scale derived from the input tensor and the weight tensor; the quantized dtype is also set to qint32.

Parameters
  • bias (Tensor) – the bias tensor to be fake-quantized.

  • inp (Tensor) – the input tensor which contains the quantization parameters.

  • w_qat (Tensor) – the weight tensor which contains the quantization parameters.

Warning

Only works for the symmetric quantization method now.

Return type

Tensor

megengine.quantization.utils.fake_quant_tensor(inp, qmin, qmax, q_dict)[source]

Apply fake quantization to the inp tensor.

Parameters
  • inp (Tensor) – the input tensor to be fake-quantized.

  • qmin (int) – the lower bound of the integer range.

  • qmax (int) – the upper bound of the integer range.

  • q_dict (Dict) – the quantization parameter dict.

Return type

Tensor
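
A hedged sketch; the q_dict keys filled in below (obtained from get_qparam_dict()) and the tensor type of the scale are assumptions about this version's layout:

import megengine as mge
from megengine.quantization.utils import QuantMode, get_qparam_dict, fake_quant_tensor

q_dict = get_qparam_dict(QuantMode.SYMMERTIC)
q_dict["scale"] = mge.tensor(0.01)               # pretend an observer produced this scale
x = mge.tensor([0.05, -0.12, 0.3])
y = fake_quant_tensor(x, -128, 127, q_dict)      # qint8 range with narrow_range=False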

megengine.quantization.utils.get_qparam_dict(mode)[source]

Return the quantization parameters dictionary according to the mode.

megengine.quantization.utils.register_method_to_class(cls)[source]
megengine.quantization.utils.tqt_forward(qmin, qmax, inp, scale)[source]