Google recently released the Quantization Aware Training (QAT) API, which enables developers to train and deploy models with the performance benefits of quantization (the process of mapping input values from a large set to output values in a smaller set) while retaining close to their original accuracy.
The goal is to support the development of smaller, faster, and more efficient machine learning models well-suited to run on off-the-shelf machines, such as those in medium- and small-business environments where computation resources are at a premium.
Here’s why that’s important:
Going from higher to lower precision tends to be noisy. That’s because quantization squeezes a small range of floating-point values into a fixed number of information buckets, leading to information loss similar to the rounding errors that occur when fractional values are represented as integers. (For example, all values in the range [2.0, 2.3] might be represented in a single bucket.) The problem compounds when the lossy numbers are used in several computations: the losses accumulate, and the intermediate results need to be rescaled for the next computation, which introduces further error.
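To make the bucket idea concrete, here is a minimal Python sketch of uniform quantization. It is an illustration only, not the QAT API or Google's implementation: the function names, the 4-bit width, and the [0.0, 7.5] range are assumptions chosen so that each bucket is exactly 0.5 wide.

```python
def quantize(x, lo, hi, num_bits):
    """Map a float in [lo, hi] to an integer bucket index (hypothetical helper)."""
    levels = 2 ** num_bits - 1          # highest bucket index
    scale = (hi - lo) / levels          # width of one bucket
    clamped = min(max(x, lo), hi)       # values outside the range saturate
    return round((clamped - lo) / scale), scale


def dequantize(q, lo, scale):
    """Recover the float value that a bucket index stands for."""
    return lo + q * scale


# Assumed setup: 4 bits over [0.0, 7.5] gives 16 buckets spaced 0.5 apart.
LO, HI, BITS = 0.0, 7.5, 4
values = [2.0, 2.1, 2.2]

buckets = [quantize(v, LO, HI, BITS)[0] for v in values]
scale = quantize(values[0], LO, HI, BITS)[1]
restored = [dequantize(q, LO, scale) for q in buckets]

# Error on each value, and error after the values are summed.
single_errors = [abs(v - r) for v, r in zip(values, restored)]
accumulated_error = abs(sum(values) - sum(restored))

print("buckets:", buckets)
print("restored:", restored)
print("largest single error:", max(single_errors))
print("error after summing:", accumulated_error)
```

All three inputs land in the same bucket, so the difference between them is lost, and the error on their sum is larger than the error on any one of them. This is the accumulation effect described above, in miniature.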