Describe the specifics of your model inputs here. This information is used to build the internal graph representation on which compression relies, and for exporting the compressed model to an executable format.
If this field is unspecified, NNCF will try to deduce the input shapes and tensor types for the graph building purposes based on dataloader objects that are passed to compression algorithms by the user.
Shape of the tensor expected as input to the model.
No additional items. Example:
[1, 3, 224, 224]
Data type of the model input tensor.
Determines what the tensor will be filled with when passed to the model during tracing and exporting.
Keyword to be used when passing the tensor to the model's 'forward' method - leave unspecified to pass the corresponding argument as a positional arg.
Shape of the tensor expected as input to the model.
No additional items. Example:
[1, 3, 224, 224]
Data type of the model input tensor.
Determines what the tensor will be filled with when passed to the model during tracing and exporting.
Keyword to be used when passing the tensor to the model's 'forward' method - leave unspecified to pass the corresponding argument as a positional arg.
The target device; its specifics will be taken into account while compressing in order to obtain the best performance for this type of device. The default 'ANY' means quantization compatible with any HW. Set this value to 'TRIAL' if you are going to use a custom quantization schema.
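For illustration, a minimal top-level config covering the fields above might look as follows - a sketch assuming the 'input_info', 'sample_size', 'type', 'keyword' and 'target_device' key names (comments are illustrative only):
{
    "input_info": {
        "sample_size": [1, 3, 224, 224],  // NCHW shape of the model input
        "type": "float",                  // data type of the input tensor
        "keyword": "input_tensor"         // omit to pass the input positionally
    },
    "target_device": "ANY"
}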
Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.
"quantization"
Specifies the kind of pre-training initialization used for the quantization algorithm.
Some kind of initialization is usually required so that the trainable quantization parameters have a better chance to get fine-tuned to values that result in good accuracy.
This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.
No additional properties.
Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
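A minimal sketch of this initializer's configuration, assuming the 'batchnorm_adaptation' and 'num_bn_adaptation_samples' key names (comments are illustrative only):
"initializer": {
    "batchnorm_adaptation": {
        "num_bn_adaptation_samples": 2000  // 0 disables BN adaptation
    }
}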
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or collected and applied separately for each per-channel value subset.
Minimum quantizer range initialized using the minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using the maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value: "mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value: "min_max"
Minimum quantizer range initialized using averages (across every initialization sample) of the per-sample minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the per-sample maxima. Offline.
Specific value: "mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value: "threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value: "percentile"
Quantizer minimum and maximum ranges set to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value: "mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
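As a sketch, a percentile-based range initializer could then be configured as follows, assuming the 'range', 'num_init_samples', 'type', 'params', 'min_percentile' and 'max_percentile' key names (comments are illustrative only):
"initializer": {
    "range": {
        "num_init_samples": 256,
        "type": "percentile",
        "params": {
            "min_percentile": 0.1,   // percentile for the quantizer input minimum
            "max_percentile": 99.9   // percentile for the quantizer input maximum
        }
    }
}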
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or collected and applied separately for each per-channel value subset.
Minimum quantizer range initialized using the minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using the maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value: "mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value: "min_max"
Minimum quantizer range initialized using averages (across every initialization sample) of the per-sample minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the per-sample maxima. Offline.
Specific value: "mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value: "threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value: "percentile"
Quantizer minimum and maximum ranges set to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value: "mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
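The range initializer can also be given as a list of such specifications, each scoped to a quantizer group. A sketch, assuming the 'target_quantizer_group' key name:
"initializer": {
    "range": [
        {
            "type": "min_max",
            "num_init_samples": 64,
            "target_quantizer_group": "weights"
        },
        {
            "type": "mean_min_max",
            "num_init_samples": 256,
            "target_quantizer_group": "activations"
        }
    ]
}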
This initializer performs an advanced selection of bitwidth for each quantizer location, trying to achieve the best tradeoff between performance and quality of the resulting model.
No additional properties.
Type of precision initialization.
Applies the HAWQ algorithm to determine the best bitwidths for each quantizer using a Hessian calculation approach. For more details see Quantization.md.
Specific value: "hawq"
Applies the AutoQ algorithm to determine the best bitwidths for each quantizer using reinforcement learning. For more details see Quantization.md.
Specific value: "autoq"
Allows manually specifying, via the following config options, the exact bitwidth for each quantizer location.
Specific value: "manual"
A list of bitwidths to choose from when performing precision initialization. Overrides the bitwidth constraints specified in the 'weights' and 'activations' sections. Example:
[4, 8]
Number of data points used to iteratively estimate the Hessian trace.
Maximum number of iterations of the Hutchinson algorithm used to estimate the Hessian trace.
Minimum relative tolerance for stopping the Hutchinson algorithm, calculated between the mean average trace of the previous iteration and that of the current one.
For the 'hawq' mode: the desired ratio between the bit complexity of a fully INT8 model and that of a mixed-precision lower-bit one. At the precision initialization stage, the HAWQ algorithm chooses the most accurate mixed-precision configuration with a ratio no less than the specified value. The bit complexity of the model is the sum of the bit complexities of each quantized layer, defined as the product of the layer's FLOPS and the number of bits for its quantization.
For the 'autoq' mode: the target model size after quantization, relative to the total parameter size in FP32. E.g. a uniform INT8-quantized model would have a 'compression_ratio' equal to 0.25, and a uniform INT4-quantized model would have a 'compression_ratio' equal to 0.125.
The desired fraction of the dataloader to be iterated during each search iteration of AutoQ precision initialization. Specifically, this ratio applies to the 'autoq_eval_loader' registered via 'register_default_init_args'.
The number of random policy steps at the beginning of AutoQ precision initialization used to populate the replay buffer with experiences. This key is meant for internal testing use; users need not configure it.
Manual settings for the quantizer bitwidths. Scopes are used to identify the quantizers.
No additional items.
A tuple of a bitwidth and a scope of the quantizer to assign the bitwidth to.
No additional items. Example:
[
    [2, "ResNet/NNCFConv2d[conv1]/conv2d_0|WEIGHT"],
    [8, "ResNet/ReLU[relu]/relu__0|OUTPUT"]
]
Path to a serialized PyTorch tensor with average Hessian traces per quantized module. It can be used to accelerate mixed-precision initialization by reusing average Hessian traces from a previous run of the HAWQ algorithm.
Whether to dump data related to the precision initialization algorithm. The HAWQ dump includes the bitwidth graph, average traces and various plots. The AutoQ dump includes the DDPG agent's learning trajectory in TensorBoard and mixed-precision environment metadata.
The mode for assigning bitwidths to activation quantizers. In the 'strict' mode, a group of quantizers that feed their outputs to one and the same set of modules as input (weight quantizers count as well) will have the same bitwidth; the 'liberal' mode allows different precisions within such a group.
Bitwidth is assigned based on hardware constraints. If multiple variants are possible, the minimal compatible bitwidth is chosen.
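A hedged sketch of a HAWQ precision initializer, using the 'bits' and 'compression_ratio' parameters described above plus assumed key names for the Hessian-estimation knobs:
"initializer": {
    "precision": {
        "type": "hawq",
        "bits": [4, 8],            // candidate bitwidths
        "num_data_points": 100,    // data points for Hessian trace estimation (assumed key name)
        "iter_number": 200,        // Hutchinson iteration cap (assumed key name)
        "tolerance": 1e-4,         // stopping tolerance (assumed key name)
        "compression_ratio": 1.5   // desired INT8-to-mixed bit-complexity ratio
    }
}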
The preset defines the quantization scheme for weights and activations. The 'performance' mode sets up symmetric weight and activation quantizers. The 'mixed' mode utilizes symmetric weight quantization and asymmetric activation quantization.
Whether the model inputs should be immediately quantized prior to any other model operations.
Whether the model outputs should be additionally quantized.
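Putting the top-level quantization options together, a sketch assuming the 'preset', 'quantize_inputs' and 'quantize_outputs' key names (comments are illustrative only):
"compression": {
    "algorithm": "quantization",
    "preset": "mixed",          // symmetric weights, asymmetric activations
    "quantize_inputs": true,    // quantize model inputs immediately
    "quantize_outputs": false   // do not additionally quantize outputs
}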
Constraints to be applied to model weights quantization only.
No additional properties.
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the 'bits' parameter from the 'precision' initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. 'true' will force the quantization to support signed values, 'false' will force the quantization to only support input values of one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: if set to 'false', but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.
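A sketch of a weight-quantization constraints block, assuming the 'mode', 'bits', 'signed', 'per_channel' and 'logarithm_scale' key names:
"weights": {
    "mode": "symmetric",
    "bits": 8,
    "signed": true,        // force signed quantization
    "per_channel": true,   // per 0-th dimension for weights
    "logarithm_scale": false
}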
Constraints to be applied to model activations quantization only.
No additional properties.
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the 'bits' parameter from the 'precision' initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. 'true' will force the quantization to support signed values, 'false' will force the quantization to only support input values of one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: if set to 'false', but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.
Specifies operations in the model which will share the same quantizer module for activations. This is helpful in case one and the same quantizer scale is required for each input of this operation. Each sub-array will define a group of model operation inputs that have to share a single actual quantization module, each entry in this subarray should correspond to exactly one node in the NNCF graph and the groups should not overlap. The final quantizer for each sub-array will be associated with the first element of this sub-array.
No additional items.
This option is used to specify overriding quantization constraints for specific scopes, e.g. in case you need to quantize a single operation differently than the rest of the model. Any other automatic or group-wise settings will be overridden.
No additional properties. Example:
{
    "weights": {
        "QuantizeOutputsTestModel/NNCFConv2d[conv5]/conv2d_0": {
            "mode": "asymmetric"
        }
    },
    "activations": {
        "{re}.*conv_first.*": {
            "mode": "asymmetric"
        },
        "{re}.*conv_second.*": {
            "mode": "symmetric"
        }
    }
}
All properties whose name matches the following regular expression must respect the following conditions.
Property name regular expression: .*
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the 'bits' parameter from the 'precision' initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. 'true' will force the quantization to support signed values, 'false' will force the quantization to only support input values of one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: if set to 'false', but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or collected and applied separately for each per-channel value subset.
Minimum quantizer range initialized using the minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using the maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value: "mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value: "min_max"
Minimum quantizer range initialized using averages (across every initialization sample) of the per-sample minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the per-sample maxima. Offline.
Specific value: "mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value: "threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value: "percentile"
Quantizer minimum and maximum ranges set to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value: "mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or collected and applied separately for each per-channel value subset.
Minimum quantizer range initialized using the minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using the maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value: "mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value: "min_max"
Minimum quantizer range initialized using averages (across every initialization sample) of the per-sample minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the per-sample maxima. Offline.
Specific value: "mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value: "threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value: "percentile"
Quantizer minimum and maximum ranges set to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value: "mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
All properties whose name matches the following regular expression must respect the following conditions.
Property name regular expression: .*
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the 'bits' parameter from the 'precision' initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. 'true' will force the quantization to support signed values, 'false' will force the quantization to only support input values of one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach.
Note: if set to 'false', but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or collected and applied separately for each per-channel value subset.
Minimum quantizer range initialized using the minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using the maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value: "mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value: "min_max"
Minimum quantizer range initialized using averages (across every initialization sample) of the per-sample minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the per-sample maxima. Offline.
Specific value: "mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value: "threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value: "percentile"
Quantizer minimum and maximum ranges set to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value: "mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or collected and applied separately for each per-channel value subset.
Minimum quantizer range initialized using the minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using the maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value: "mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value: "min_max"
Minimum quantizer range initialized using averages (across every initialization sample) of the per-sample minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the per-sample maxima. Offline.
Specific value: "mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value: "threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value: "percentile"
Quantizer minimum and maximum ranges set to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value: "mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
[Deprecated] Determines how the additional quantization operations should be exported into the ONNX format. Set this to true to export to ONNX standard QuantizeLinear-DequantizeLinear node pairs (8-bit quantization only) or to false to export to OpenVINO-supported FakeQuantize ONNX (all quantization settings supported).
This option controls whether to apply the overflow issue fix for the appropriate NNCF config. If set to 'disable', the fix will not be applied. If set to 'enable' or 'first_layer_only', and appropriate target_devices are chosen, the fix will be applied to all layers or to the first convolutional layer, respectively.
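These two export-related options might be set together as in this sketch (comments are illustrative only):
"export_to_onnx_standard_ops": false,  // deprecated - export OpenVINO FakeQuantize nodes
"overflow_fix": "enable"               // apply the overflow fix to all layers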
Configures the staged quantization compression scheduler for the quantization algorithm. The quantizers will not be applied until a given epoch count is reached.
No additional properties.
Gradients will be accumulated for this number of batches before performing a 'backward' call. Increasing this may improve training quality, since binarized networks exhibit noisy gradients and their training requires larger batch sizes than could be accommodated by GPUs.
A zero-based index of the epoch, upon reaching which the activations will start to be quantized.
Epoch index upon which the weights will start to be quantized.
Epoch index upon which the learning rate will start to be dropped. If unspecified, learning rate will not be dropped.
Duration, in epochs, of the learning rate dropping process.
Epoch to disable weight decay in the optimizer. If unspecified, weight decay will not be disabled.
Initial value of learning rate.
Initial value of weight decay.
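A sketch of the staged scheduler parameters; all key names here are assumed from the descriptions above rather than confirmed by this schema excerpt:
"params": {
    "batch_multiplier": 1,                // accumulate gradients over this many batches
    "activations_quant_start_epoch": 1,   // epoch at which activation quantization starts
    "weights_quant_start_epoch": 2,       // epoch at which weight quantization starts
    "lr_poly_drop_start_epoch": 20,       // epoch at which the LR drop begins
    "lr_poly_drop_duration_epochs": 10,   // duration of the LR drop
    "disable_wd_start_epoch": 20,         // epoch at which weight decay is disabled
    "base_lr": 1e-4,
    "base_wd": 1e-5
}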
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.
Applies filter pruning during training of the model to effectively remove entire sub-dimensions of tensors in the original model from computation and therefore increase performance.
See Pruning.md and the rest of this schema for more details and parameters.
"filter_pruning"
This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.
No additional properties.
Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
Initial value of the pruning level applied to the prunable operations.
The type of filter importance metric.
Target value of the pruning level for the operations that can be pruned. The operations are determined by analysis of the model architecture during the pruning algorithm initialization stage.
Number of epochs during which the pruning level is increased from 'pruning_init' to 'pruning_target'.
Target value of the pruning level for model FLOPs.
The type of scheduling to use for adjusting the target pruning level.
Number of epochs for model pretraining before starting filter pruning.
The type of filter ranking across the layers.
Whether to prune layers independently (choose filters with the smallest importance in each layer separately) or not.
Whether to prune first convolutional layers or not. A 'first' convolutional layer is such a layer that the path from model input to this layer has no other convolution operations on it.
Whether to prune downsampling convolutional layers (with stride > 1) or not.
Whether to prune parameters of the Batch Norm layer that corresponds to pruned filters of the convolutional layer which feeds its output to this Batch Norm.
Describes parameters specific to the LeGR pruning algorithm. See Pruning.md for more details.
No additional properties.
Number of generations for the evolution algorithm.
Number of training steps to estimate pruned model accuracy.
Pruning level for the model to train the LeGR algorithm on. If the learned ranking will be used for multiple pruning levels, the highest one should be used as 'max_pruning'. If the model will be pruned with a single pruning level, that target should be used.
Random seed for LeGR coefficients generation.
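A sketch of a filter pruning configuration, using 'pruning_init' and 'pruning_target' from this schema and otherwise assumed key names and values:
"compression": {
    "algorithm": "filter_pruning",
    "pruning_init": 0.05,              // initial pruning level
    "params": {
        "schedule": "exponential",     // scheduling type (assumed key name)
        "pruning_target": 0.4,         // final pruning level
        "pruning_steps": 15,           // epochs to ramp pruning_init -> pruning_target (assumed)
        "filter_importance": "L2",     // filter importance metric (assumed value)
        "prune_first_conv": false,
        "prune_downsample_convs": false
    }
}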
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Applies sparsity on top of the current model. Each weight tensor value will be either kept as-is, or set to 0 based on its magnitude. For large sparsity levels, this will improve performance on hardware that can profit from it. See Sparsity.md and the rest of this schema for more details and parameters.
No Additional Properties"magnitude_sparsity"
Initial value of the sparsity level applied to the model.
This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.
No additional properties.
Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
The mode of sparsity level setting: 'global' - the sparsity level is calculated across all weight values in the network, across layers; 'local' - the sparsity level can be set per-layer, and within each layer it is computed with respect only to the weight values within that layer.
The type of scheduling to use for adjusting the target sparsity level. Default: exponential for 'rb_sparsity', polynomial otherwise.
Target sparsity level for the model, to be reached at the end of the compression schedule.
Index of the epoch upon which the sparsity level of the model is scheduled to become equal to 'sparsity_target'.
Index of the epoch upon which the sparsity mask will be frozen and no longer trained.
Whether the function-based sparsity level schedulers should update the sparsity level after each optimizer step instead of each epoch step.
Number of optimizer steps in one epoch. Required to start proper scheduling in the first training epoch if 'update_per_optimizer_step' is true.
A list of scheduler steps at which to transition to the next scheduled sparsity level (multistep scheduler only).
No additional items.
Multistep scheduler only - levels of sparsity to use at each step of the scheduler, as specified in the 'multistep_steps' attribute. The first sparsity level will be applied immediately, so the length of this list should be larger than the length of 'multistep_steps' by one. The last sparsity level will function as the ultimate sparsity target, overriding the 'sparsity_target' setting if it is present.
A conventional patience parameter for the scheduler, as for any other standard scheduler. Specified in units of scheduler steps.
For polynomial scheduler - determines the corresponding power value.
For polynomial scheduler - if 'true', the target sparsity level will be approached in a concave manner, and in a convex manner otherwise.
Determines the way in which the weight values will be sorted after being aggregated in order to determine the sparsity threshold corresponding to a specific sparsity level.
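A sketch of a multistep magnitude sparsity configuration, using the 'multistep_steps' name from this schema and an assumed 'multistep_sparsity_levels' key for the level list described above:
"compression": {
    "algorithm": "magnitude_sparsity",
    "sparsity_init": 0.05,
    "params": {
        "schedule": "multistep",
        "multistep_steps": [10, 20],
        "multistep_sparsity_levels": [0.2, 0.4, 0.6]  // one entry more than multistep_steps
    }
}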
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Applies sparsity on top of the current model. Each weight tensor value will be either kept as-is, or set to 0 based on its importance as determined by the regularization-based sparsity algorithm. For large sparsity levels, this will improve performance on hardware that can profit from it. See Sparsity.md and the rest of this schema for more details and parameters.
No Additional Properties"rb_sparsity"
Initial value of the sparsity level applied to the model
The mode of sparsity level setting: 'global' - the sparsity level is calculated across all weight values in the network, across layers; 'local' - the sparsity level can be set per-layer, and within each layer it is computed with respect only to the weight values within that layer.
The type of scheduling to use for adjusting the target sparsity level. Default: exponential for 'rb_sparsity', polynomial otherwise.
Target sparsity level for the model, to be reached at the end of the compression schedule.
Index of the epoch upon which the sparsity level of the model is scheduled to become equal to 'sparsity_target'.
Index of the epoch upon which the sparsity mask will be frozen and no longer trained.
Whether the function-based sparsity level schedulers should update the sparsity level after each optimizer step instead of each epoch step.
Number of optimizer steps in one epoch. Required to start proper scheduling in the first training epoch if 'update_per_optimizer_step' is true.
A list of scheduler steps at which to transition to the next scheduled sparsity level (multistep scheduler only).
No additional items.
Multistep scheduler only - levels of sparsity to use at each step of the scheduler, as specified in the 'multistep_steps' attribute. The first sparsity level will be applied immediately, so the length of this list should be larger than the length of 'multistep_steps' by one. The last sparsity level will function as the ultimate sparsity target, overriding the 'sparsity_target' setting if it is present.
A conventional patience parameter for the scheduler, as for any other standard scheduler. Specified in units of scheduler steps.
For polynomial scheduler - determines the corresponding power value.
For polynomial scheduler - if 'true', the target sparsity level will be approached in a concave manner, and in a convex manner otherwise.
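A sketch of an RB sparsity configuration with an exponential schedule (key names assumed from the descriptions above):
"compression": {
    "algorithm": "rb_sparsity",
    "sparsity_init": 0.02,
    "params": {
        "schedule": "exponential",
        "sparsity_target": 0.6,
        "sparsity_target_epoch": 50,   // epoch at which sparsity_target is reached
        "sparsity_freeze_epoch": 60    // epoch at which the mask is frozen
    }
}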
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.
This algorithm is only useful in combination with other compression algorithms and improves the end accuracy of the corresponding algorithm by calculating a knowledge distillation loss between the compressed model currently in training and its original, uncompressed counterpart. See KnowledgeDistillation.md and the rest of this schema for more details and parameters.
No additional properties.
"knowledge_distillation"
Type of Knowledge Distillation Loss.
Knowledge distillation loss value multiplier.
'softmax' type only - temperature for logits softening.
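A sketch of a knowledge distillation entry, assuming 'type', 'scale' and 'temperature' as the key names for the three parameters above:
"compression": {
    "algorithm": "knowledge_distillation",
    "type": "softmax",
    "scale": 1.0,        // loss multiplier
    "temperature": 5.0   // 'softmax' type only
}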
This algorithm takes no additional parameters and is used when you want to load a checkpoint trained with another sparsity algorithm and do other compression without changing the sparsity mask.
No Additional Properties"const_sparsity"
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Binarization is a particular case of the more general quantization algorithm.
See Binarization.md and the rest of this schema for more details and parameters.
"binarization"
Selects the mode of binarization - either 'xnor' for XNOR binarization, or 'dorefa' for DoReFa binarization.
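A minimal binarization entry might look like this sketch:
"compression": {
    "algorithm": "binarization",
    "mode": "xnor"
}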
Specifies the kind of pre-training initialization used for the quantization algorithm.
Some kind of initialization is usually required so that the trainable quantization parameters have a better chance to get fine-tuned to values that result in good accuracy.
This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.
No additional properties.
Number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or collected and applied separately for each per-channel value subset.
Minimum quantizer range initialized using the minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using the maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value: "mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value: "min_max"
Minimum quantizer range initialized using averages (across every initialization sample) of the per-sample minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the per-sample maxima. Offline.
Specific value: "mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value: "threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value: "percentile"
Quantizer minimum and maximum ranges set to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value: "mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' ones do. Increasing the number of initialization samples for 'offline' initialization types will increase the RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will either be collected over the entire set of tensor values, or collected and applied separately for each per-channel value subset.
Minimum quantizer range initialized using the minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using the maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value: "mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value: "min_max"
Minimum quantizer range initialized using averages (across every initialization sample) of the per-sample minima of values in the tensor to be quantized; maximum quantizer range initialized analogously using the per-sample maxima. Offline.
Specific value: "mean_min_max"
Quantizer minimum and maximum ranges set to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value: "threesigma"
Quantizer minimum and maximum ranges set to the specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value: "percentile"
Quantizer minimum and maximum ranges set to the averaged (across every initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value: "mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No additional items. Examples:
"{re}conv.*"
[
    "LeNet/relu_0",
    "LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No additional items. Examples:
[
    "UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
    "UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
This initializer performs an advanced selection of bitwidth for each quantizer location, trying to achieve the best tradeoff between performance and quality of the resulting model.
No additional properties.
Type of precision initialization.
Applies the HAWQ algorithm to determine the best bitwidths for each quantizer using a Hessian calculation approach. For more details see Quantization.md.
Specific value: "hawq"
Applies the AutoQ algorithm to determine the best bitwidths for each quantizer using reinforcement learning. For more details see Quantization.md.
Specific value: "autoq"
Allows manually specifying, via the following config options, the exact bitwidth for each quantizer location.
Specific value: "manual"
A list of bitwidths to choose from when performing precision initialization. Overrides the bitwidth constraints specified in the 'weights' and 'activations' sections. Example:
[4, 8]
Number of data points used to iteratively estimate the Hessian trace.
Maximum number of iterations of the Hutchinson algorithm used to estimate the Hessian trace.
Minimum relative tolerance for stopping the Hutchinson algorithm, calculated between the mean average trace of the previous iteration and that of the current one.
For the 'hawq' mode: the desired ratio between the bit complexity of a fully INT8 model and that of a mixed-precision lower-bit one. At the precision initialization stage, the HAWQ algorithm chooses the most accurate mixed-precision configuration with a ratio no less than the specified value. The bit complexity of the model is the sum of the bit complexities of each quantized layer, defined as the product of the layer's FLOPS and the number of bits for its quantization.
For the 'autoq' mode: the target model size after quantization, relative to the total parameter size in FP32. E.g. a uniform INT8-quantized model would have a 'compression_ratio' equal to 0.25, and a uniform INT4-quantized model would have a 'compression_ratio' equal to 0.125.
The desired fraction of the dataloader to be iterated during each search iteration of AutoQ precision initialization. Specifically, this ratio applies to the 'autoq_eval_loader' registered via 'register_default_init_args'.
The number of random policy steps at the beginning of AutoQ precision initialization used to populate the replay buffer with experiences. This key is meant for internal testing use; users need not configure it.
Manual settings for the quantizer bitwidths. Scopes are used to identify the quantizers.
No additional items.
A tuple of a bitwidth and a scope of the quantizer to assign the bitwidth to.
No additional items. Example:
[
    [2, "ResNet/NNCFConv2d[conv1]/conv2d_0|WEIGHT"],
    [8, "ResNet/ReLU[relu]/relu__0|OUTPUT"]
]
Path to a serialized PyTorch tensor with average Hessian traces per quantized module. It can be used to accelerate mixed-precision initialization by reusing average Hessian traces from a previous run of the HAWQ algorithm.
Whether to dump data related to Precision Initialization algorithm. HAWQ dump includes bitwidth graph, average traces and different plots. AutoQ dump includes DDPG agent learning trajectory in tensorboard and mixed-precision environment metadata.
The mode for assignment bitwidth to activation quantizers. In the 'strict' mode,a group of quantizers that feed their output to one and more same modules as input (weight quantizers count as well) will have the same bitwidth in the 'liberal' mode allows different precisions within the group.
Bitwidth is assigned based on hardware constraints. If multiple variants are possible, the minimal compatible bitwidth is chosen.
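For illustration, a minimal sketch of a precision initializer section combining the options above; key names such as type, bits, compression_ratio and dump_init_precision_data are taken from the descriptions in this schema, but the exact spelling should be verified against your NNCF version:
{
    "type": "hawq",
    "bits": [4, 8],
    "compression_ratio": 1.5,
    "dump_init_precision_data": true
}
With "type": "manual", the per-scope bitwidth list shown above would be supplied instead of compression_ratio.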
Configures the staged quantization compression scheduler for the quantization algorithm. The quantizers will not be applied until a given epoch count is reached.
No Additional PropertiesGradients will be accumulated for this number of batches before doing a 'backward' call. Increasing this may improve training quality, since binarized networks exhibit noisy gradients and their training requires larger batch sizes than could be accommodated by GPUs.
A zero-based index of the epoch, upon reaching which the activations will start to be quantized.
Epoch index upon which the weights will start to be quantized.
Epoch index upon which the learning rate will start to be dropped. If unspecified, learning rate will not be dropped.
Duration, in epochs, of the learning rate dropping process.
Epoch to disable weight decay in the optimizer. If unspecified, weight decay will not be disabled.
Initial value of learning rate.
Initial value of weight decay.
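As a sketch, a staged scheduler configuration using these fields might look like the following; the field names are assumptions patterned on typical NNCF staged-quantization configs and should be checked against the schema:
{
    "params": {
        "batch_multiplier": 1,
        "activations_quant_start_epoch": 1,
        "weights_quant_start_epoch": 2,
        "lr_poly_drop_start_epoch": 20,
        "lr_poly_drop_duration_epochs": 10,
        "disable_wd_start_epoch": 20,
        "base_lr": 3.1e-4,
        "base_wd": 1e-5
    }
}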
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
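Taken together, a hypothetical snippet using both scope lists with strict validation could look as follows; ignored_scopes, target_scopes and validate_scopes are the key names commonly used for these properties in NNCF configs, stated here as assumptions:
{
    "target_scopes": ["{re}UNet/ModuleList\\[down_path\\].*"],
    "ignored_scopes": ["UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"],
    "validate_scopes": true
}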
PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.
Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.
"experimental_quantization"
Specifies the kind of pre-training initialization used for the quantization algorithm.
Some kind of initialization is usually required so that the trainable quantization parameters have a better chance to get fine-tuned to values that result in good accuracy.
This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.
No Additional PropertiesNumber of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
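For example, a minimal initializer section enabling only BN adaptation might look like this sketch; num_bn_adaptation_samples is an assumed spelling of the sample-count key described above:
{
    "initializer": {
        "batchnorm_adaptation": {
            "num_bn_adaptation_samples": 2000
        }
    }
}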
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
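As an illustrative sketch, range initialization can also be specified per quantizer group as a list of entries; the key names below (type, num_init_samples, params, target_quantizer_group) mirror the properties described above but should be treated as assumptions:
{
    "range": [
        {
            "type": "min_max",
            "num_init_samples": 256,
            "target_quantizer_group": "weights"
        },
        {
            "type": "percentile",
            "num_init_samples": 1024,
            "params": { "min_percentile": 0.1, "max_percentile": 99.9 },
            "target_quantizer_group": "activations"
        }
    ]
}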
This initializer performs advanced selection of bitwidth per each quantizer location, trying to achieve the best tradeoff between performance and quality of the resulting model.
No Additional PropertiesType of precision initialization.
Applies the HAWQ algorithm to determine the best bitwidths for each quantizer using a Hessian calculation approach. For more details see Quantization.md
Specific value:"hawq"
Applies the AutoQ algorithm to determine the best bitwidths for each quantizer using reinforcement learning. For more details see Quantization.md
Specific value:"autoq"
Allows the exact bitwidth for each quantizer location to be specified manually via the following config options.
Specific value:"manual"
A list of bitwidths to choose from when performing precision initialization. Overrides the bit constraints specified in the weight and activation sections.
[
4,
8
]
Number of data points to iteratively estimate Hessian trace.
Maximum number of iterations of the Hutchinson algorithm used to estimate the Hessian trace.
Minimum relative tolerance for stopping the Hutchinson algorithm. It is calculated between the mean average trace from the previous iteration and that from the current one.
For the hawq mode:
The desired ratio between the bit complexity of a fully INT8 model and a mixed-precision lower-bit one. At the precision initialization stage, the HAWQ algorithm chooses the most accurate mixed-precision configuration with a ratio no less than the specified one. The bit complexity of the model is the sum of the bit complexities of each quantized layer, each of which is the product of the layer's FLOPs and the number of bits used for its quantization.
For the autoq mode:
The target model size after quantization, relative to the total parameter size in FP32. E.g. a uniform INT8-quantized model would have a compression_ratio equal to 0.25, and a uniform INT4-quantized model would have a compression_ratio equal to 0.125.
The desired fraction of the dataloader to be iterated during each search iteration of AutoQ precision initialization. Specifically, this ratio applies to the autoq_eval_loader registered via register_default_init_args.
The number of random policy iterations at the beginning of AutoQ precision initialization, used to populate the replay buffer with experiences. This key is meant for internal testing use; users need not configure it.
Manual settings for the quantizer bitwidths. Scopes are used to identify the quantizers.
No Additional ItemsA tuple of a bitwidth and a scope of the quantizer to assign the bitwidth to.
No Additional Items[
[
2,
"ResNet/NNCFConv2d[conv1]/conv2d_0|WEIGHT"
],
[
8,
"ResNet/ReLU[relu]/relu__0|OUTPUT"
]
]
Path to a serialized PyTorch Tensor with average Hessian traces per quantized module. It can be used to accelerate mixed-precision initialization by reusing average Hessian traces from a previous run of the HAWQ algorithm.
Whether to dump data related to the precision initialization algorithm. The HAWQ dump includes the bitwidth graph, average traces and different plots. The AutoQ dump includes the DDPG agent learning trajectory in TensorBoard and mixed-precision environment metadata.
The mode for assigning bitwidths to activation quantizers. In the 'strict' mode, a group of quantizers that feed their outputs as inputs to one and the same set of modules (weight quantizers count as well) must share the same bitwidth; the 'liberal' mode allows different precisions within the group.
Bitwidth is assigned based on hardware constraints. If multiple variants are possible, the minimal compatible bitwidth is chosen.
The preset defines the quantization schema for weights and activations. The 'performance' mode sets up symmetric weight and activation quantizers. The 'mixed' mode utilizes symmetric weight quantization and asymmetric activation quantization.
Whether the model inputs should be immediately quantized prior to any other model operations.
Whether the model outputs should be additionally quantized.
Constraints to be applied to model weights quantization only.
No Additional PropertiesMode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach. Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.
Constraints to be applied to model activations quantization only.
No Additional PropertiesMode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach. Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
Whether to use log of scale as the optimization parameter instead of the scale itself. This serves as an optional regularization opportunity for training quantizer scales.
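Combining the two constraint sections, a hypothetical quantization configuration might look like the sketch below; the key names follow the properties described above (mode, bits, per_channel), but treat them as assumptions to verify against the schema:
{
    "weights": {
        "mode": "symmetric",
        "bits": 8,
        "per_channel": true
    },
    "activations": {
        "mode": "asymmetric",
        "bits": 8,
        "per_channel": false
    }
}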
Specifies operations in the model which will share the same quantizer module for activations. This is helpful in case one and the same quantizer scale is required for each input of this operation. Each sub-array will define a group of model operation inputs that have to share a single actual quantization module, each entry in this subarray should correspond to exactly one node in the NNCF graph and the groups should not overlap. The final quantizer for each sub-array will be associated with the first element of this sub-array.
No Additional ItemsThis option is used to specify overriding quantization constraints for specific scope,e.g. in case you need to quantize a single operation differently than the rest of the model. Any other automatic or group-wise settings will be overridden.
No Additional Properties{
"weights": {
"QuantizeOutputsTestModel/NNCFConv2d[conv5]/conv2d_0": {
"mode": "asymmetric"
},
"activations": {
"{re}.*conv_first.*": {
"mode": "asymmetric"
},
"{re}.*conv_second.*": {
"mode": "symmetric"
}
}
}
}
All properties whose name matches the following regular expression must respect the following conditions
Property name regular expression:.*
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach. Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
All properties whose name matches the following regular expression must respect the following conditions
Property name regular expression:.*
Mode of quantization. See Quantization.md for more details.
Bitwidth to quantize to. It is intended for manual bitwidth setting. Can be overridden by the bits parameter from the precision initializer section. An error occurs if it doesn't match the corresponding bitwidth constraints from the hardware configuration.
Whether to use signed or unsigned input/output values for quantization. true will force the quantization to support signed values, false will force the quantization to only support input values with one and the same sign, and leaving this value unspecified (default) means relying on the initialization statistics to determine the best approach. Note: If set to false, but the input values have differing signs during initialization, signed quantization will be performed instead.
Whether to quantize inputs of this quantizer per each channel of input tensor (per 0-th dimension for weight quantization, and per 1-st dimension for activation quantization).
This initializer performs forward runs of the model to be quantized using samples from a user-supplied data loader to gather activation and weight tensor statistics within the network and use these to set up initial range parameters for quantizers.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
Number of samples from the training dataset to consume as sample model inputs for purposes of setting initial minimum and maximum quantization ranges.
Type of the initializer - determines which statistics gathered during initialization will be used to initialize the quantization ranges.
'Online' initializers do not have to store intermediate statistics in memory, while 'offline' do. Increasing the number of initialization samples for 'offline' initialization types will increase RAM overhead of applying NNCF to the model.
Depending on whether the quantizer is configured to be per-tensor or per-channel, the statistics will be collected either on the basis of the set of the entire tensor values, or these will be collected and applied separately for each channel value subset.
Minimum quantizer range initialized using minima of per-channel minima of the tensor to be quantized, maximum quantizer range initialized using maxima of per-channel maxima of the tensor to be quantized. Offline.
Specific value:"mixed_min_max"
Minimum quantizer range initialized using the global minimum of values in the tensor to be quantized, maximum quantizer range initialized using the global maximum of the same values. Online.
Specific value:"min_max"
Minimum quantizer range initialized using averages (across every single initialization sample) of the minima of values in the tensor to be quantized, maximum quantizer range initialized using averages of the maxima, respectively. Offline.
Specific value:"mean_min_max"
Quantizer minimum and maximum ranges set to be equal to ±3 median absolute deviations from the median of the observed values in the tensor to be quantized. Offline.
Specific value:"threesigma"
Quantizer minimum and maximum ranges set to be equal to specified percentiles of the observed values (across the entire initialization sample set) in the tensor to be quantized. Offline.
Specific value:"percentile"
Quantizer minimum and maximum ranges set to be equal to averaged (across every single initialization sample) specified percentiles of the observed values in the tensor to be quantized. Offline.
Specific value:"mean_percentile"
Type-specific parameters of the initializer.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input minimum.
For 'percentile' and 'mean_percentile' types - specify the percentile of input value histograms to be set as the initial value for the quantizer input maximum.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
The target group of quantizers for which the specified type of range initialization will be applied. If unspecified, then the range initialization of the given type will be applied to all quantizers.
[Deprecated] Determines how the additional quantization operations should be exported into the ONNX format. Set this to true to export to ONNX-standard QuantizeLinear-DequantizeLinear node pairs (8-bit quantization only), or to false to export to OpenVINO-supported FakeQuantize ONNX nodes (all quantization settings supported).
This option controls whether to apply the overflow issue fix for the appropriate NNCF config. If set to disable, the fix will not be applied. If set to enable or first_layer_only, and an appropriate target_device is chosen, the fix will be applied to all layers or to the first convolutional layer, respectively.
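For instance, a config targeting OpenVINO-style export with the overflow fix limited to the first convolutional layer might contain the following; export_to_onnx_standard_ops and overflow_fix are the usual key names for these two properties, stated here as assumptions:
{
    "export_to_onnx_standard_ops": false,
    "overflow_fix": "first_layer_only"
}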
Configures the staged quantization compression scheduler for the quantization algorithm. The quantizers will not be applied until a given epoch count is reached.
No Additional PropertiesGradients will be accumulated for this number of batches before doing a 'backward' call. Increasing this may improve training quality, since binarized networks exhibit noisy gradients and their training requires larger batch sizes than could be accommodated by GPUs.
A zero-based index of the epoch, upon reaching which the activations will start to be quantized.
Epoch index upon which the weights will start to be quantized.
Epoch index upon which the learning rate will start to be dropped. If unspecified, learning rate will not be dropped.
Duration, in epochs, of the learning rate dropping process.
Epoch to disable weight decay in the optimizer. If unspecified, weight decay will not be disabled.
Initial value of learning rate.
Initial value of weight decay.
A list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.
Defines the training strategy for tuning the supernet. By default, progressive shrinking.
Defines the order of adding a new elasticity dimension from stage to stage
No Additional Items[
"width",
"depth",
"kernel"
]
This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.
No Additional PropertiesNumber of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
List of parameters per each supernet training stage
No Additional ItemsDefines a supernet training stage: how many epochs it takes, which elasticities with which settings are enabled, and whether certain operations should happen at the beginning of the stage
No Additional PropertiesElasticity dimensions that are enabled for subnet sampling; the rest of the elastic dimensions are disabled
No Additional ItemsDuration of the training stage in epochs
Restricts the maximum number of blocks in each independent group that can be skipped. For example, ResNet-50 has four independent groups; each group consists of a specific number of Bottleneck layers [3, 4, 6, 3] that can potentially be skipped. If the depth indicator equals 1, only the last Bottleneck can be skipped in each group; if it equals 2, the last two, and so on. This allows implementing the progressive shrinking logic from the Once-for-All paper. The default value is 1.
Restricts the maximum number of width values in each elastic layer. For example, some conv2d with elastic width can vary its number of output channels over the following list: [8, 16, 32]. If the width indicator equals 1, it can only activate the maximum number of channels - 32. If it equals 2, then either of the last two values can be selected - 16 or 32.
If True, triggers reorganization of weights so that filters are sorted by importance (e.g. by L2 norm) at the beginning of the stage
If True, triggers batchnorm adaptation at the beginning of the stage
Initial learning rate for a stage. If specified in the stage descriptor, it will trigger a reset of the learning rate at the beginning of the stage.
Number of epochs to compute the adjustment of the learning rate.
Number of iterations to activate the random subnet. Default value is 1.
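A sketch of a progressive-shrinking stage list built from these fields follows; the field names (train_dims, epochs, depth_indicator, width_indicator, reorg_weights, bn_adapt) are assumptions patterned on the descriptions above and should be verified against the schema:
[
    { "train_dims": ["kernel"], "epochs": 10 },
    { "train_dims": ["kernel", "depth"], "epochs": 10, "depth_indicator": 1 },
    { "train_dims": ["kernel", "depth", "width"], "epochs": 15, "depth_indicator": 2, "width_indicator": 2, "reorg_weights": true, "bn_adapt": true }
]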
List of building blocks to be skipped. Each block is defined by the names of its start and end nodes. The end node is skipped, while the start node is executed; the start node produces a tensor that bypasses the skipped nodes and is passed to the node right after the end node.
No Additional Items[
[
"start_op_1",
"end_op_1"
],
[
"start_op_2",
"end_op_2"
]
]
Defines the minimal number of operations in a skipped block. This option is available for the auto mode only. The default value is 5
Defines the maximal number of operations in a block. This option is available for the auto mode only. The default value is 50
If True, the automatic block search will not place operations that are fused at inference time into different blocks for skipping. True by default
Minimal number of output channels that can be activated for each layer with elastic width. The default value is 32.
Restricts the total number of different elastic width values for each layer. The default value of -1 means that there are no restrictions.
Defines a step size for the generation of the elastic width search space - the list of all possible width values for each layer. The generation starts from the number of output channels in the original model and stops when it reaches either the min_width value or a number of generated width values equal to max_num_widths
Defines the elastic width search space via a list of multipliers. All possible width values are obtained by multiplying the original width value with the values in the given list.
No Additional ItemsThe type of filter importance metric. Can be one of L1, L2, geometric_median, or external. L2 by default.
Path to the custom external weight importance (a PyTorch tensor) per node that needs weight reordering. Valid only when filter_importance is external. The file should be loadable via the torch interface torch.load() and represent a dictionary that maps an NNCF node name to an importance tensor with the same shape as the weights in the node's module. For example, if node Model/NNCFLinear[fc1]/linear_0 has a 3x1 linear module with weights [0.2, 0.3, 0.9], then the entry {'Model/NNCFLinear[fc1]/linear_0': tensor([0.4, 0.01, 0.2])} in the dictionary represents the corresponding weight importance.
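An illustrative elastic width section using these options might read as follows; min_width, max_num_widths and filter_importance are named in the descriptions above, while width_step is an assumed spelling of the step-size key (the multiplier list is an alternative way to define the same search space):
{
    "min_width": 32,
    "max_num_widths": 3,
    "width_step": 32,
    "filter_importance": "L2"
}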
Restricts the total number of different elastic kernel values for each layer. The default value of -1 means that there are no restrictions.
Defines the available elasticity dimension for sampling subnets. By default, all elastic dimensions are available - [width, depth, kernel]
No Additional ItemsA list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional ItemsA list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional ItemsDefines a global learning rate scheduler. If these parameters are not set, a stage learning rate scheduler will be used.
Defines the number of samples used for each training epoch.
Defines the search algorithm. The default algorithm is NSGA-II.
This initializer is applied by default to utilize batch norm statistics adaptation to the current compression scenario. See documentation for more details.
No Additional PropertiesNumber of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model. The actual number of samples will be the closest multiple of the batch size. Set this to 0 to disable BN adaptation.
Defines the number of evaluations that will be used by the search algorithm.
Number of constraints in search problem.
Defines the population size when using an evolutionary search algorithm.
Crossover probability used by a genetic algorithm.
Crossover eta.
Mutation eta for genetic algorithm.
Mutation probability for genetic algorithm.
Defines the absolute difference in accuracy that is tolerated when looking for a subnetwork.
Defines the reference accuracy from the pre-trained model used to generate the super-network.
Information to indicate the preferred parts of the Pareto front
No Additional ItemsEpsilon distance of surviving solutions for RNSGA-II.
Weights used by RNSGA-II.
Find extreme points and use them as aspiration points.
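Putting these search options together, a hypothetical search section might look like the sketch below; algorithm, num_evals, population, ref_acc and acc_delta are assumed key names patterned on the descriptions above:
{
    "algorithm": "NSGA2",
    "num_evals": 1000,
    "population": 40,
    "ref_acc": 93.65,
    "acc_delta": 1.0
}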
This algorithm is only useful in combination with other compression algorithms and improves the end accuracy result of the corresponding algorithm by calculating knowledge distillation loss between the compressed model currently in training and its original, uncompressed counterpart. See KnowledgeDistillation.md and the rest of this schema for more details and parameters.
Same definition as compression_oneOf_i0_oneOf_i4This algorithm is only useful in combination with other compression algorithms and improves the end accuracy result of the corresponding algorithm by calculating knowledge distillation loss between the compressed model currently in training and its original, uncompressed counterpart. See KnowledgeDistillation.md and the rest of this schema for more details and parameters.
Same definition as compression_oneOf_i0_oneOf_i4"movement_sparsity"
Index of the starting epoch (inclusive) for the warmup stage.
Index of the end epoch (exclusive) for the warmup stage.
The regularization factor on weight importance scores. With a larger positive value, more model weights will be regarded as less important and thus be sparsified.
Whether to do structured mask resolution after the warmup stage. Currently only supports structured masking on multi-head self-attention blocks and feed-forward networks.
The power value of polynomial decay for threshold and regularization factor updates during the warmup stage.
The initial value of the importance threshold during the warmup stage. If not specified, this will be automatically decided during training so that the model has about 0.1% linear layer sparsity on the involved layers at the beginning of the warmup stage.
The final value of the importance threshold during the warmup stage.
Number of training steps in one epoch, used for proper threshold and regularization factor updates. Optional if warmup_start_epoch >= 1, since the step count can be determined during the first epoch; otherwise users have to specify it.
Describes how each supported layer will be sparsified.
No Additional ItemsDefines in which mode a supported layer will be sparsified.
The block shape for weights to sparsify. Required when mode="block".
The dimension for weights to sparsify. Required when mode="per_dim".
Model control flow graph node scopes to be considered in this mode.
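A movement sparsity section assembled from the parameters above might look like this sketch; warmup_start_epoch and warmup_end_epoch follow the descriptions above, while key names such as sparse_structure_by_scopes, sparse_factors and axis are assumptions to verify against the schema:
{
    "algorithm": "movement_sparsity",
    "params": {
        "warmup_start_epoch": 1,
        "warmup_end_epoch": 4,
        "importance_regularization_factor": 0.01,
        "enable_structured_masking": true
    },
    "sparse_structure_by_scopes": [
        { "mode": "block", "sparse_factors": [32, 32], "target_scopes": "{re}.*attention.*" },
        { "mode": "per_dim", "axis": 0, "target_scopes": "{re}.*intermediate.*" }
    ]
}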
No Additional ItemsA list of model control flow graph node scopes to be ignored for this operation - functions as a 'denylist'. Optional.
No Additional Items"{re}conv.*"
[
"LeNet/relu_0",
"LeNet/relu_1"
]
A list of model control flow graph node scopes to be considered for this operation - functions as an 'allowlist'. Optional.
No Additional Items[
"UNet/ModuleList[down_path]/UNetConvBlock[1]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[2]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[3]/Sequential[block]/Conv2d[0]",
"UNet/ModuleList[down_path]/UNetConvBlock[4]/Sequential[block]/Conv2d[0]"
]
"UNet/ModuleList\\[up_path\\].*"
If set to True, then a RuntimeError will be raised if the names of the ignored/target scopes do not match the names of the scopes in the model graph.
PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.
Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.
Applies filter pruning during training of the model to effectively remove entire sub-dimensions of tensors in the original model from computation and therefore increase performance.
See Pruning.md and the rest of this schema for more details and parameters.
Applies sparsity on top of the current model. Each weight tensor value will be either kept as-is, or set to 0 based on its magnitude. For large sparsity levels, this will improve performance on hardware that can profit from it. See Sparsity.md and the rest of this schema for more details and parameters.
Same definition as compression_oneOf_i0_oneOf_i2Applies sparsity on top of the current model. Each weight tensor value will be either kept as-is, or set to 0 based on its importance as determined by the regularization-based sparsity algorithm. For large sparsity levels, this will improve performance on hardware that can profit from it. See Sparsity.md and the rest of this schema for more details and parameters.
Same definition as compression_oneOf_i0_oneOf_i3This algorithm is only useful in combination with other compression algorithms and improves the end accuracy result of the corresponding algorithm by calculating knowledge distillation loss between the compressed model currently in training and its original, uncompressed counterpart. See KnowledgeDistillation.md and the rest of this schema for more details and parameters.
Same definition as compression_oneOf_i0_oneOf_i4This algorithm takes no additional parameters and is used when you want to load a checkpoint trained with another sparsity algorithm and do other compression without changing the sparsity mask.
Same definition as compression_oneOf_i0_oneOf_i5Binarization is a particular case of the more general quantization algorithm.
See Binarization.md and the rest of this schema for more details and parameters.
Applies quantization on top of the input model, simulating future low-precision execution specifics, and selects the quantization layout and parameters to strive for the best possible quantized model accuracy and performance.
See Quantization.md and the rest of this schema for more details and parameters.
Options for the execution of the NNCF-powered 'Accuracy Aware' training pipeline. The 'mode' property determines the mode of the accuracy-aware training execution and further available parameters.
Early exit mode schema. See EarlyExitTraining.md for more general info on this mode.
No Additional Properties"early_exit"
Maximum allowed accuracy degradation of the model, in percent, relative to the original model accuracy.
Maximum allowed accuracy degradation of the model, in units of the absolute metric of the original model.
The maximal total fine-tuning epoch count. If the accuracy criteria are not met during fine-tuning, the most accurate model will be returned.
Adaptive compression level training mode schema. See AdaptiveCompressionLevelTraining.md for more general info on this mode.
No Additional Properties"adaptive_compression_level"
Maximum allowed accuracy degradation of the model, in percent, relative to the original model accuracy.
Maximum allowed accuracy degradation of the model, in units of the absolute metric of the original model.
Number of epochs to fine-tune during the initial training phase of the adaptive compression training loop.
Initial value for the compression rate increase/decrease training phase of the compression training loop.
Factor used to reduce the compression rate change step in the adaptive compression training loop.
Factor used to reduce the learning rate after the compression rate step is reduced.
The minimal compression rate change step value after which the training loop is terminated.
The number of epochs to fine-tune the model for a given compression rate after the initial training phase of the training loop.
The maximal total fine-tuning epoch count. If the epoch counter reaches this number, the fine-tuning process will stop and the model with the largest compression rate will be returned.
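For example, a minimal early-exit accuracy-aware section could look like the following sketch; maximal_relative_accuracy_degradation and maximal_total_epochs are assumed spellings of the properties described above:
{
    "accuracy_aware_training": {
        "mode": "early_exit",
        "params": {
            "maximal_relative_accuracy_degradation": 1.0,
            "maximal_total_epochs": 100
        }
    }
}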
PyTorch only - Used to increase/decrease gradients for compression algorithms' parameters. The gradients will be multiplied by the specified value. If unspecified, the gradients will not be adjusted.
[Deprecated] Whether to enable strict input tensor shape matching when building the internal graph representation of the model. Set this to false if your model inputs have any variable dimension other than the 0-th (batch) dimension, or if any non-batch dimension of the intermediate tensors in your model execution flow depends on the input dimension, otherwise the compression will most likely fail.
Log directory for NNCF-specific logging outputs.