Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)
- URL: http://arxiv.org/abs/2201.08442v1
- Date: Thu, 20 Jan 2022 20:35:37 GMT
- Title: Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)
- Authors: Sangeetha Siddegowda, Marios Fournarakis, Markus Nagel, Tijmen
Blankevoort, Chirag Patel, Abhijit Khobare
- Abstract summary: We present an overview of neural network quantization using the AI Model Efficiency Toolkit (AIMET).
AIMET is a library of state-of-the-art quantization and compression algorithms designed to ease the effort required for model optimization.
We provide a practical guide to quantization via AIMET, covering PTQ and QAT workflows, code examples, and practical tips.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While neural networks have advanced the frontiers in many machine learning
applications, they often come at a high computational cost. Reducing the power
and latency of neural network inference is vital to integrating modern networks
into edge devices with strict power and compute requirements. Neural network
quantization is one of the most effective ways of achieving these savings, but
the additional noise it induces can lead to accuracy degradation. In this white
paper, we present an overview of neural network quantization using AI Model
Efficiency Toolkit (AIMET). AIMET is a library of state-of-the-art quantization
and compression algorithms designed to ease the effort required for model
optimization and thus drive the broader AI ecosystem towards low latency and
energy-efficient inference. AIMET provides users with the ability to simulate
as well as optimize PyTorch and TensorFlow models. Specifically for
quantization, AIMET includes various post-training quantization (PTQ, cf.
chapter 4) and quantization-aware training (QAT, cf. chapter 5) techniques that
guarantee near floating-point accuracy for 8-bit fixed-point inference. We
provide a practical guide to quantization via AIMET by covering PTQ and QAT
workflows, code examples and practical tips that enable users to efficiently
and effectively quantize models using AIMET and reap the benefits of low-bit
integer inference.
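To make the core idea concrete, below is a minimal sketch of the quantize-dequantize ("fake quantization") operation that quantization simulators such as AIMET insert into a network to mimic fixed-point inference in a floating-point graph. This is plain PyTorch for illustration, not AIMET's API; the function name, the per-tensor min/max range estimation, and the asymmetric 8-bit scheme are assumptions chosen for brevity.

```python
# Illustrative sketch of asymmetric uniform quantization simulation.
# Not AIMET code: fake_quantize and its range estimation are hypothetical.
import torch


def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate num_bits-bit fixed-point quantization of a float tensor.

    The tensor is mapped onto the integer grid [0, 2^b - 1] using a scale
    and zero-point derived from its min/max range, then mapped back to
    floating point so the rest of the network still runs in float, but now
    carries the quantization noise the abstract refers to.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    # Scale maps the float range onto the integer grid; the zero-point
    # is the grid index that represents the real value 0.0.
    scale = (x_max - x_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x_min / scale).clamp(qmin, qmax)
    # Quantize: round to the nearest grid point and clamp to the range.
    x_int = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    # Dequantize: back to float, with rounding/clipping error baked in.
    return (x_int - zero_point) * scale


if __name__ == "__main__":
    w = torch.randn(4, 4)
    w_q = fake_quantize(w, num_bits=8)
    # The residual is the quantization noise that PTQ and QAT mitigate.
    print("max abs error:", (w - w_q).abs().max().item())
```

In QAT, this quantize-dequantize step stays in the training graph, with the non-differentiable rounding typically bypassed by a straight-through gradient estimator so the weights can adapt to the quantization noise during fine-tuning.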