nncf#
Neural Network Compression Framework (NNCF) for enhanced OpenVINO™ inference.
Subpackages#
nncf.api
nncf.common
nncf.config
nncf.quantization
nncf.tensorflow
nncf.torch
Classes#
NNCFConfig – Contains the configuration parameters required for NNCF to apply the selected algorithms.
Dataset – Wrapper for passing custom user datasets into NNCF algorithms.
CompressWeightsMode – Defines a mode for weight compression.
DropType – Describes the accuracy drop type, which determines how the accuracy drop between the original model and the compressed model is calculated.
ModelType – Describes the model type the specificity of which will be taken into account during compression.
QuantizationMode – Defines special modes.
TargetDevice – Target device architecture for compression.
QuantizationPreset – An enum with values corresponding to the available quantization presets.
IgnoredScope – Provides an option to specify portions of model to be excluded from compression.
Functions#
strip – Returns the model object with as much custom NNCF additions as possible removed while still preserving the functioning of the model object as a compressed model.
compress_weights – Compress model weights.
quantize – Applies post-training quantization to the provided model.
quantize_with_accuracy_control – Applies post-training quantization algorithm with accuracy control to provided model.
- nncf.strip(model, do_copy=True)[source]#
Returns the model object with as much custom NNCF additions as possible removed while still preserving the functioning of the model object as a compressed model.
- Parameters:
model (TModel) – The compressed model.
do_copy (bool) – If True (default), will return a copy of the currently associated model object. If False, will return the currently associated model object “stripped” in-place.
- Returns:
The stripped model.
- Return type:
TModel
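A minimal usage sketch (hedged: compressed_model below is a placeholder for any model object previously compressed by NNCF, e.g. the result of nncf.quantize):

import nncf

# compressed_model is assumed to already carry NNCF compression additions.
stripped_copy = nncf.strip(compressed_model)                     # default: returns a stripped copy
stripped_in_place = nncf.strip(compressed_model, do_copy=False)  # strips the original object in place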
- class nncf.NNCFConfig(*args, **kwargs)[source]#
Bases:
dict
Contains the configuration parameters required for NNCF to apply the selected algorithms.
This is a regular dictionary object extended with some utility functions, such as the ability to attach well-defined structures to pass non-serializable objects as parameters. It is primarily built from a .json file, or from a Python JSON-like dictionary - both data types will be checked against a JSONSchema. See the definition of the schema at https://openvinotoolkit.github.io/nncf/schema/, or by calling NNCFConfig.schema().
- classmethod from_dict(nncf_dict)[source]#
Load NNCF config from a Python dictionary. The dict must contain only JSON-supported primitives.
- Parameters:
nncf_dict (Dict) – A Python dict with the JSON-style configuration for NNCF.
- Return type:
NNCFConfig
- classmethod from_json(path)[source]#
Load NNCF config from a JSON file at path.
- Parameters:
path (str) – Path to the .json file containing the NNCF configuration.
- Return type:
NNCFConfig
- register_extra_structs(struct_list)[source]#
Attach the supplied list of extra configuration structures to this configuration object.
- Parameters:
struct_list (List[nncf.config.structures.NNCFExtraConfigStruct]) – List of extra configuration structures.
- get_redefinable_global_param_value_for_algo(param_name, algo_name)[source]#
Some parameters can be specified both on the global NNCF config .json level (so that they apply to all algos), and at the same time overridden in the algorithm-specific section of the .json. This function returns the value that should apply for a given algorithm name, considering the exact format of this config.
- Parameters:
param_name (str) – The name of a parameter in the .json specification of the NNCFConfig, that may be present either at the top-most level of the .json, or at the top level of the algorithm-specific subdict.
algo_name (str) – The name of the algorithm (among the allowed algorithm names in the .json) for which the resolution of the redefinable parameter should occur.
- Returns:
The value of the parameter that should be applied for the algo specified by algo_name.
- Return type:
Optional[str]
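A hedged sketch of building an NNCFConfig and resolving a redefinable parameter; the dictionary below is illustrative only, and the exact keys are governed by the schema at https://openvinotoolkit.github.io/nncf/schema/:

from nncf import NNCFConfig

config_dict = {
    "input_info": {"sample_size": [1, 3, 224, 224]},
    # A global value ...
    "ignored_scopes": ["{re}.*stem.*"],
    "compression": {
        "algorithm": "quantization",
        # ... overridden in the algorithm-specific section.
        "ignored_scopes": ["{re}.*head.*"],
    },
}

nncf_config = NNCFConfig.from_dict(config_dict)
# Equivalent for a file on disk:
# nncf_config = NNCFConfig.from_json("nncf_config.json")

# Returns the value that applies to the "quantization" algorithm
# (the algorithm-specific value takes precedence over the global one).
scopes = nncf_config.get_redefinable_global_param_value_for_algo("ignored_scopes", "quantization")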
- class nncf.Dataset(data_source, transform_func=None)[source]#
Bases:
Generic[DataItem, ModelInput]
Wrapper for passing custom user datasets into NNCF algorithms.
This class defines the interface by which compression algorithms retrieve data items from the passed data source object. These data items are used for different purposes, for example, model inference and model validation, based on the choice of the exact compression algorithm.
If the data item returned from the data source per iteration cannot be used directly as input for model inference, the transformation function is used to extract the model’s input from this data item. For example, in supervised learning, the data item usually contains both examples and labels, so the transformation function should extract the examples from the data item.
- Parameters:
data_source (Iterable[DataItem]) – The iterable object serving as the source of data items.
transform_func (Optional[Callable[[DataItem], ModelInput]]) – The function that is used to extract the model’s input from the data item. The data item here is the data item that is returned from the data source per iteration. This function should be passed when the data item cannot be directly used as model’s input. If this is not specified, then the data item will be passed into the model as-is.
- get_data(indices=None)[source]#
Returns the iterable object that contains selected data items from the data source as-is.
- Parameters:
indices (Optional[List[int]]) – The zero-based indices of data items that should be selected from the data source. The indices should be sorted in ascending order. If indices are not passed all data items are selected from the data source.
- Returns:
The iterable object that contains selected data items from the data source as-is.
- Return type:
Iterable[DataItem]
- get_inference_data(indices=None)[source]#
Returns the iterable object that contains selected data items from the data source, for which the transformation function was applied. The item, which was returned per iteration from this iterable, can be used as the model’s input for model inference.
- Parameters:
indices (Optional[List[int]]) – The zero-based indices of data items that should be selected from the data source. The indices should be sorted in ascending order. If indices are not passed all data items are selected from the data source.
- Returns:
The iterable object that contains selected data items from the data source, for which the transformation function was applied.
- Return type:
Iterable[ModelInput]
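A small hedged sketch of wrapping a typical (input, label) iterable; the sample data format is illustrative only:

import nncf

calibration_samples = [
    ([0.0, 1.0, 2.0], 0),   # (model input, label) pairs
    ([3.0, 4.0, 5.0], 1),
]

# The transform function drops the label and keeps only the model input.
calibration_dataset = nncf.Dataset(calibration_samples, lambda item: item[0])

for raw_item in calibration_dataset.get_data(indices=[0]):
    print(raw_item)        # the original (input, label) pair, as-is
for model_input in calibration_dataset.get_inference_data(indices=[0]):
    print(model_input)     # only the model input, ready for inference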
- class nncf.CompressWeightsMode[source]#
Bases:
enum.Enum
Defines a mode for weight compression.
- Parameters:
INT8_SYM – Stands for 8-bit integer symmetric quantization of all weights. Weights are quantized symmetrically with a fixed zero point equal to 128. https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md#symmetric-quantization
INT8_ASYM – The same as INT8_SYM mode, but weights are quantized to a primary precision asymmetrically with a typical non-fixed zero point. https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md#asymmetric-quantization
INT4_SYM – Stands for mixed-precision weight quantization with 4-bit integer as the primary precision. Weights are quantized to the primary precision symmetrically with a fixed zero point equal to 8. All embeddings and the last layer are always compressed to a backup precision, which is INT8_ASYM by default. All other weights are quantized either to 4-bit integer or to the backup precision, depending on criteria and the given ratio. https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md#symmetric-quantization
INT4_ASYM – The same as INT4_SYM mode, but weights are quantized to a primary precision asymmetrically with a typical non-fixed zero point. https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md#asymmetric-quantization
NF4 – The same as INT4_SYM mode, but the primary precision is the NF4 data type without a zero point.
INT8 – Mode is deprecated and will be removed in future releases. Please use INT8_ASYM instead.
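A hedged sketch of selecting a mixed-precision mode through nncf.compress_weights (documented below); model is a placeholder for a supported model object:

import nncf

compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,  # 4-bit primary precision, INT8_ASYM backup
    ratio=0.8,        # roughly 80% of weights in the primary precision
    group_size=128,   # group-wise sharing of quantization scales
)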
- class nncf.DropType[source]#
Bases:
enum.Enum
Describes the accuracy drop type, which determines how the accuracy drop between the original model and the compressed model is calculated.
- Parameters:
ABSOLUTE – The accuracy drop is calculated as the absolute drop with respect to the results of the original model.
RELATIVE – The accuracy drop is calculated relative to the results of the original model.
- class nncf.ModelType[source]#
Bases:
enum.Enum
Describes the model type the specificity of which will be taken into account during compression.
- Parameters:
TRANSFORMER – Transformer-based models (https://arxiv.org/pdf/1706.03762.pdf)
- class nncf.QuantizationMode[source]#
Bases:
enum.Enum
Defines special modes. Currently contains only FP8-related modes (https://arxiv.org/pdf/2209.05433.pdf).
- Parameters:
FP8_E4M3 – Mode with 4-bit exponent and 3-bit mantissa.
FP8_E5M2 – Mode with 5-bit exponent and 2-bit mantissa.
- class nncf.TargetDevice[source]#
Bases:
enum.Enum
Target device architecture for compression.
Compression will take into account the value of this parameter in order to obtain the best performance for this type of device.
- class nncf.QuantizationPreset[source]#
Bases:
enum.Enum
An enum with values corresponding to the available quantization presets.
- nncf.compress_weights(model, mode=CompressWeightsMode.INT8_ASYM, ratio=None, group_size=None, ignored_scope=None, all_layers=None)[source]#
Compress model weights.
- Parameters:
model (nncf.api.compression.TModel) – A model to be compressed.
mode –
Defines a mode for weight compression.
- INT8_SYM: 8-bit integer symmetric quantization of all weights.
- INT8_ASYM: the same as INT8_SYM mode, but weights are quantized to a primary precision asymmetrically with a typical non-fixed zero point.
- INT4_SYM: mixed-precision weight quantization with 4-bit integer as the primary precision. Weights are quantized to the primary precision symmetrically with a fixed zero point equal to 8. All embeddings and the last layer are always compressed to a backup precision, which is INT8_ASYM by default. All other weights are quantized either to 4-bit integer or to the backup precision, depending on criteria and the given ratio.
- INT4_ASYM: the same as INT4_SYM mode, but weights are quantized to a primary precision asymmetrically with a typical non-fixed zero point.
- NF4: the same as INT4_SYM mode, but the primary precision is the NF4 data type without a zero point.
ratio (Optional[float]) – The ratio between the primary and backup precisions (e.g. 0.9 means 90% of layers are quantized to NF4 and the rest to INT8_ASYM).
group_size (Optional[int]) – Number of weights (e.g. 128) in the channel dimension that share quantization parameters (scale). The value -1 means no grouping.
ignored_scope (Optional[nncf.scopes.IgnoredScope]) – An ignored scope that defines the list of model control flow graph nodes to be ignored during quantization.
all_layers (Optional[bool]) – Indicates whether embeddings and last layers should be compressed to the primary precision. By default, the backup precision is assigned to the embeddings and last layers.
- Returns:
The non-trainable model with compressed weights.
- Return type:
nncf.api.compression.TModel
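A hedged usage sketch, assuming an OpenVINO IR model at the hypothetical path "model.xml":

import nncf
import openvino as ov

model = ov.Core().read_model("model.xml")

# Default call: 8-bit asymmetric weight compression (INT8_ASYM).
compressed_model = nncf.compress_weights(model)

# Optionally exclude nodes and compress embeddings/last layers as well.
compressed_model = nncf.compress_weights(
    model,
    ignored_scope=nncf.IgnoredScope(names=["node_1"]),  # illustrative node name
    all_layers=True,
)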
- nncf.quantize(model, calibration_dataset, mode=None, preset=None, target_device=TargetDevice.ANY, subset_size=300, fast_bias_correction=True, model_type=None, ignored_scope=None, advanced_parameters=None)[source]#
Applies post-training quantization to the provided model.
- Parameters:
model (TModel) – A model to be quantized.
calibration_dataset (nncf.Dataset) – A representative dataset for the calibration process.
mode (Optional[nncf.QuantizationMode]) – Special quantization mode that specifies a different way of performing the optimization.
preset (nncf.QuantizationPreset) – A preset that controls the quantization mode (symmetric and asymmetric). It can take the following values: performance – symmetric quantization of weights and activations; mixed – symmetric quantization of weights and asymmetric quantization of activations. The default value is None; in this case, the mixed preset is used for the transformer model type, otherwise performance.
target_device (nncf.TargetDevice) – A target device the specificity of which will be taken into account while compressing in order to obtain the best performance for this type of device.
subset_size (int) – Size of a subset to calculate activations statistics used for quantization. Must be positive.
fast_bias_correction (bool) – Setting this option to False enables a different bias correction method which is more accurate, in general, and takes more time but requires less memory.
model_type (Optional[nncf.ModelType]) – Model type used to specify additional patterns in the model. Currently, only transformer is supported.
ignored_scope (Optional[nncf.IgnoredScope]) – An ignored scope that defines the list of model control flow graph nodes to be ignored during quantization.
advanced_parameters (Optional[nncf.quantization.advanced_parameters.AdvancedQuantizationParameters]) – Advanced quantization parameters for fine-tuning the quantization algorithm.
- Returns:
The quantized model.
- Return type:
TModel
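A hedged sketch of post-training quantization for an OpenVINO model; "model.xml" and data_items are hypothetical placeholders:

import nncf
import openvino as ov

model = ov.Core().read_model("model.xml")
calibration_dataset = nncf.Dataset(data_items, lambda item: item[0])

quantized_model = nncf.quantize(
    model,
    calibration_dataset,
    preset=nncf.QuantizationPreset.MIXED,   # symmetric weights, asymmetric activations
    target_device=nncf.TargetDevice.CPU,
    subset_size=300,
    model_type=nncf.ModelType.TRANSFORMER,  # only needed for transformer architectures
)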
- nncf.quantize_with_accuracy_control(model, calibration_dataset, validation_dataset, validation_fn, max_drop=0.01, drop_type=DropType.ABSOLUTE, preset=None, target_device=TargetDevice.ANY, subset_size=300, fast_bias_correction=True, model_type=None, ignored_scope=None, advanced_quantization_parameters=None, advanced_accuracy_restorer_parameters=None)[source]#
Applies post-training quantization algorithm with accuracy control to provided model.
- Parameters:
model (TModel) – A model to be quantized.
calibration_dataset (nncf.Dataset) – A representative dataset for the calibration process.
validation_dataset (nncf.Dataset) – A dataset for the validation process.
validation_fn (Callable[[Any, Iterable[Any]], float]) – A validation function used to validate the model. It should take two arguments: model – the model to be validated; validation_dataset – a dataset that provides data items to validate the provided model. The function should return the value of the metric, where a higher value corresponds to better performance of the model.
max_drop (float) – The maximum accuracy drop that should be achieved after the quantization.
drop_type (nncf.parameters.DropType) – The accuracy drop type, which determines how the maximum accuracy drop between the original model and the compressed model is calculated.
preset (nncf.QuantizationPreset) – A preset that controls the quantization mode (symmetric and asymmetric). It can take the following values: performance – symmetric quantization of weights and activations; mixed – symmetric quantization of weights and asymmetric quantization of activations. The default value is None; in this case, the mixed preset is used for the transformer model type, otherwise performance.
target_device (nncf.TargetDevice) – A target device the specificity of which will be taken into account while compressing in order to obtain the best performance for this type of device.
subset_size (int) – Size of a subset to calculate activations statistics used for quantization.
fast_bias_correction (bool) – Setting this option to False enables a different bias correction method which is more accurate, in general, and takes more time but requires less memory.
model_type (nncf.ModelType) – Model type used to specify additional patterns in the model. Currently, only transformer is supported.
ignored_scope (nncf.IgnoredScope) – An ignored scope that defines the list of model control flow graph nodes to be ignored during quantization.
advanced_quantization_parameters (Optional[nncf.quantization.advanced_parameters.AdvancedQuantizationParameters]) – Advanced quantization parameters for fine-tuning the quantization algorithm.
advanced_accuracy_restorer_parameters (Optional[AdvancedAccuracyRestorerParameters]) – Advanced parameters for fine-tuning the accuracy restorer algorithm.
- Returns:
The quantized model.
- Return type:
TModel
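A hedged sketch; model, calibration_items, validation_items and the metric computation are placeholders:

import nncf

def validate(model, validation_dataset):
    # Run the model over the validation items and return a single scalar
    # metric where a higher value means better model performance.
    ...

quantized_model = nncf.quantize_with_accuracy_control(
    model,
    calibration_dataset=nncf.Dataset(calibration_items),
    validation_dataset=nncf.Dataset(validation_items),
    validation_fn=validate,
    max_drop=0.01,                     # tolerate at most a 0.01 metric drop ...
    drop_type=nncf.DropType.ABSOLUTE,  # ... measured as an absolute difference
)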
- class nncf.IgnoredScope[source]#
Provides an option to specify portions of model to be excluded from compression.
The ignored scope defines model sub-graphs that should be excluded from compression processes such as quantization, pruning, etc.
Example:
import nncf

# Exclude by node name:
node_names = ['node_1', 'node_2', 'node_3']
ignored_scope = nncf.IgnoredScope(names=node_names)

# Exclude using regular expressions:
patterns = ['node_\d']
ignored_scope = nncf.IgnoredScope(patterns=patterns)

# Exclude by operation type:

# OpenVINO opset https://docs.openvino.ai/latest/openvino_docs_ops_opset.html
operation_types = ['Multiply', 'GroupConvolution', 'Interpolate']
ignored_scope = nncf.IgnoredScope(types=operation_types)

# ONNX opset https://github.com/onnx/onnx/blob/main/docs/Operators.md
operation_types = ['Mul', 'Conv', 'Resize']
ignored_scope = nncf.IgnoredScope(types=operation_types)
Note: Operation types must be specified according to the model framework.
- Parameters:
names (List[str]) – List of ignored node names.
patterns (List[str]) – List of regular expressions that define patterns for names of ignored nodes.
types (List[str]) – List of ignored operation types.
validate (bool) – If set to True, then a RuntimeError will be raised if any ignored scope does not match in the model graph.
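A hedged follow-up showing how the ignored scope is typically passed to a compression call (model and calibration_dataset are placeholders):

import nncf

ignored_scope = nncf.IgnoredScope(
    names=["node_1"],
    types=["Multiply"],
    validate=True,   # raise if an entry matches nothing in the model graph
)
quantized_model = nncf.quantize(model, calibration_dataset, ignored_scope=ignored_scope)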