Related papers: FairQuant: Fairness-Aware Mixed-Precision Quantization for Medical Image Classification

URL: http://arxiv.org/abs/2602.23192v1
Date: Thu, 26 Feb 2026 16:44:47 GMT
Title: FairQuant: Fairness-Aware Mixed-Precision Quantization for Medical Image Classification
Authors: Thomas Woergaard, Raghavendra Selvan
Abstract summary: We study fairness-aware mixed-precision quantization schemes for medical image classification under explicit bit budgets. We introduce FairQuant, a framework that combines group-aware importance analysis, budgeted mixed-precision allocation, and a learnable Bit-Aware Quantization (BAQ) mode.
Score: 6.445605125467573
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Compressing neural networks by quantizing model parameters offers a useful trade-off between performance and efficiency. Methods like quantization-aware training and post-training quantization strive to maintain the downstream performance of compressed models compared to the full-precision models. However, these techniques do not explicitly consider the impact on algorithmic fairness. In this work, we study fairness-aware mixed-precision quantization schemes for medical image classification under explicit bit budgets. We introduce FairQuant, a framework that combines group-aware importance analysis, budgeted mixed-precision allocation, and a learnable Bit-Aware Quantization (BAQ) mode that jointly optimizes weights and per-unit bit allocations under bitrate and fairness regularization. We evaluate the method on Fitzpatrick17k and ISIC2019 across ResNet18/50, DeiT-Tiny, and TinyViT. Results show that FairQuant configurations with average precision near 4-6 bits recover much of the Uniform 8-bit accuracy while improving worst-group performance relative to Uniform 4- and 8-bit baselines, with comparable fairness metrics under shared budgets.
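
To make the idea of mixed-precision allocation under an explicit bit budget concrete, here is a minimal Python sketch. It is not the FairQuant algorithm: the function names, the greedy rule, the candidate bit-widths, and the assumption that quantization error shrinks roughly 4x per added bit are all illustrative choices made here, and the importance scores are hypothetical placeholders for whatever group-aware analysis a real method would use.

```python
from typing import Dict, List, Sequence


def quantize_uniform(weights: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform fake-quantization of weights to signed `bits`-bit levels."""
    if bits < 2:
        raise ValueError("bits must be >= 2")
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    qmax = 2 ** (bits - 1) - 1          # largest positive integer level
    scale = max_abs / qmax              # step size between adjacent levels
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def allocate_bits(
    importance: Dict[str, float],      # hypothetical per-unit importance scores
    sizes: Dict[str, int],             # number of parameters in each unit
    avg_bit_budget: float,             # target average bits per parameter
    choices: Sequence[int] = (2, 4, 6, 8),
) -> Dict[str, int]:
    """Greedy allocation: repeatedly upgrade the unit with the best gain per extra bit.

    Assumption (illustrative only): the error of a unit at b bits scales like
    importance * 4**(-b), i.e. it drops by about 4x for each added bit.
    """
    levels = sorted(choices)
    idx = {u: 0 for u in importance}   # index into `levels` for each unit
    budget = avg_bit_budget * sum(sizes.values())
    used = sum(sizes[u] * levels[0] for u in importance)
    if used > budget:
        raise ValueError("budget is below the smallest bit-width choice")

    while True:
        best_unit, best_score, best_cost = None, 0.0, 0
        for u in importance:
            if idx[u] + 1 >= len(levels):
                continue               # already at the highest precision
            cur, nxt = levels[idx[u]], levels[idx[u] + 1]
            cost = sizes[u] * (nxt - cur)
            if used + cost > budget:
                continue               # upgrade would exceed the bit budget
            gain = importance[u] * (4.0 ** -cur - 4.0 ** -nxt)
            score = gain / cost
            if score > best_score:
                best_unit, best_score, best_cost = u, score, cost
        if best_unit is None:
            break                      # no affordable upgrade remains
        idx[best_unit] += 1
        used += best_cost
    return {u: levels[i] for u, i in idx.items()}


if __name__ == "__main__":
    importance = {"conv1": 10.0, "layer1": 1.0, "fc": 5.0}
    sizes = {"conv1": 100, "layer1": 1000, "fc": 100}
    bits = allocate_bits(importance, sizes, avg_bit_budget=4.0)
    avg = sum(sizes[u] * bits[u] for u in bits) / sum(sizes.values())
    print(bits, "average bits:", avg)
```

In this toy setting the small, high-importance units receive more bits than the large, low-importance one, and the average bit-width never exceeds the budget. A fairness-aware variant would replace the placeholder importance scores with sensitivities measured per demographic group.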

Related papers

MoR: Mixture Of Representations For Mixed-Precision Training [0.398636957150696]
Mixture-of-Representations (MoR) is a novel, per-tensor and sub-tensor level quantization framework. MoR dynamically analyzes a tensor's numerical properties to select between a variety of different representations. Our initial findings show that this approach can achieve state-of-the-art results with 98.38% of tensors quantized to the FP8 format.
arXiv Detail & Related papers (2025-12-28T06:28:50Z)
Mixed-Precision Quantization for Language Models: Techniques and Prospects [10.345914140081925]
Quantization has emerged as an essential compression technique to reduce model size, alleviate memory bottlenecks, and accelerate inference. Mixed-precision quantization offers a promising alternative by selectively allocating precision across layers or within tensors to balance efficiency and accuracy.
arXiv Detail & Related papers (2025-10-19T12:16:40Z)
Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion [9.402892455344677]
We propose an efficient quantization framework for Stable Diffusion models (SDM). Our framework simultaneously maintains training-inference consistency and ensures optimization stability. Our method demonstrates superior performance over state-of-the-art approaches with shorter training times.
arXiv Detail & Related papers (2024-12-09T17:00:20Z)
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization [67.3213104337679]
Quantization is a powerful tool for accelerating large language model (LLM) inference, but the accuracy-performance trade-offs across different formats remain unclear. We conduct the most comprehensive empirical study to date, evaluating FP8, INT8, and INT4 quantization across academic benchmarks and real-world tasks.
arXiv Detail & Related papers (2024-11-04T18:21:59Z)
Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models [2.926259075657424]
Diffusion models generate images by iteratively denoising random Gaussian noise using deep neural networks. Recent works propose low-bitwidth (e.g., 8-bit or 4-bit) quantization for diffusion models; however, 4-bit integer quantization typically results in low-quality images. We propose an effective floating-point quantization method for diffusion models that provides better image quality compared to integer quantization methods.
arXiv Detail & Related papers (2024-08-13T15:56:20Z)
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models [63.118592279833656]
Post-training quantization (PTQ) is an effective technique for compressing large language models (LLMs). We propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths group-wise. Experiments show that SliM-LLM achieves superior performance across various LLMs at low bit-widths.
arXiv Detail & Related papers (2024-05-23T16:21:48Z)
Towards a tailored mixed-precision sub-8-bit quantization scheme for Gated Recurrent Units using Genetic Algorithms [39.979007027634196]
Gated Recurrent Units (GRU) are difficult to tune due to their dependence on an internal state. We propose a modular integer quantization scheme for GRUs where the bit width of each operator can be selected independently.
arXiv Detail & Related papers (2024-02-19T16:24:20Z)
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices. For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator. For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
Post-Training Quantization for Vision Transformer [85.57953732941101]
We present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers. We obtain 81.29% top-1 accuracy using the DeiT-B model on the ImageNet dataset with about 8-bit quantization.
arXiv Detail & Related papers (2021-06-27T06:27:22Z)
Differentiable Model Compression via Pseudo Quantization Noise [99.89011673907814]
We propose to add independent pseudo quantization noise to model parameters during training to approximate the effect of a quantization operator. We experimentally verify that our method outperforms state-of-the-art quantization techniques on several benchmarks and architectures for image classification, language modeling, and audio source separation.
arXiv Detail & Related papers (2021-04-20T14:14:03Z)
Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators. We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differential method to search them accurately.
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
FracBits: Mixed Precision Quantization via Fractional Bit-Widths [29.72454879490227]
Mixed precision quantization is favorable with customized hardware supporting arithmetic operations at multiple bit-widths. We propose a novel learning-based algorithm to derive mixed precision models end-to-end under target computation constraints.
arXiv Detail & Related papers (2020-07-04T06:09:09Z)
Training with Quantization Noise for Extreme Model Compression [57.51832088938618]
We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator. In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods.
arXiv Detail & Related papers (2020-04-15T20:10:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.