Defeating the Training-Inference Mismatch via FP16
- URL: http://arxiv.org/abs/2510.26788v1
- Date: Thu, 30 Oct 2025 17:58:11 GMT
- Title: Defeating the Training-Inference Mismatch via FP16
- Authors: Penghui Qi, Zichen Liu, Xiangxin Zhou, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin,
- Abstract summary: Reinforcement learning (RL) fine-tuning often suffers from instability due to the numerical mismatch between the training and inference policies. We show that its root cause lies in the floating-point precision itself. The widely adopted BF16, despite its large dynamic range, introduces large rounding errors that break the consistency between training and inference.
- Score: 72.25890308541334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) fine-tuning of large language models (LLMs) often suffers from instability due to the numerical mismatch between the training and inference policies. While prior work has attempted to mitigate this issue through algorithmic corrections or engineering alignments, we show that its root cause lies in the floating-point precision itself. The widely adopted BF16, despite its large dynamic range, introduces large rounding errors that break the consistency between training and inference. In this work, we demonstrate that simply reverting to FP16 effectively eliminates this mismatch. The change is simple, fully supported by modern frameworks with only a few lines of code changed, and requires no modification to the model architecture or learning algorithm. Our results suggest that using FP16 uniformly yields more stable optimization, faster convergence, and stronger performance across diverse tasks, algorithms, and frameworks. We hope these findings motivate a broader reconsideration of precision trade-offs in RL fine-tuning.
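The rounding-error gap between the two formats can be checked numerically. Below is a minimal illustrative sketch (not from the paper): since NumPy has no native bfloat16, BF16's round-to-nearest-even truncation is emulated with bit manipulation, and its error is compared against native FP16 for a value that FP16's 10-bit mantissa represents exactly but BF16's 7-bit mantissa cannot.

```python
import numpy as np

def round_to_bf16(x: np.ndarray) -> np.ndarray:
    """Round float32 values to bfloat16 precision (round-to-nearest-even
    on the top 16 bits), returned as float32 for easy comparison."""
    bits = x.astype(np.float32).view(np.uint32)
    # Standard RNE trick: add a bias that depends on the lowest kept bit,
    # then truncate the low 16 bits.
    rounding_bias = ((bits >> np.uint32(16)) & np.uint32(1)) + np.uint32(0x7FFF)
    bits = (bits + rounding_bias) & np.uint32(0xFFFF0000)
    return bits.view(np.float32)

# 1 + 2^-8 is exact in FP16 (10 mantissa bits) but a halfway case in BF16
# (7 mantissa bits), where ties-to-even rounds it all the way down to 1.0.
x = np.array([1.0 + 1.0 / 256.0], dtype=np.float32)
fp16_err = abs(float(np.float16(x[0])) - float(x[0]))
bf16_err = abs(float(round_to_bf16(x)[0]) - float(x[0]))
print(fp16_err, bf16_err)  # 0.0 vs 0.00390625: BF16's rounding error dominates
```

This is only a single-value illustration of the precision trade-off the abstract describes, not the paper's experimental methodology.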
Related papers
- Beyond Precision: Training-Inference Mismatch is an Optimization Problem and Simple LR Scheduling Fixes It [24.70923739848818]
We show that gradient noise and training-inference mismatch escalate in tandem as training progresses. We find that the mismatch can be effectively suppressed by shrinking the update size. We propose a simple yet effective solution: a specialized learning-rate scheduler.
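The summary does not specify the scheduler's form; as a hedged illustration only, a plain exponential decay already captures the core idea of shrinking the per-step update size as training progresses (the decay rule and constants here are assumptions, not the paper's):

```python
def shrinking_lr(base_lr: float, step: int, decay: float = 0.999) -> float:
    """Exponentially decaying learning rate: later steps take smaller
    updates, the mechanism this paper argues suppresses the
    training-inference mismatch that grows alongside gradient noise."""
    return base_lr * (decay ** step)

# The update size shrinks monotonically with the step count.
schedule = [shrinking_lr(3e-4, s) for s in (0, 1_000, 5_000)]
```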
arXiv Detail & Related papers (2026-02-02T09:00:53Z)
- INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats [51.72056104795248]
Modern AI hardware, such as Nvidia's Blackwell architecture, is increasingly embracing low-precision floating-point (FP) formats. This paper systematically investigates the trade-offs between FP and integer (INT) formats. We reveal a critical performance crossover: while FP excels in coarse-grained quantization, the comparison at fine-grained (block-wise) levels is more nuanced.
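To make "fine-grained (block-wise)" concrete, here is a hedged NumPy sketch of symmetric block-wise INT8 quantization (the block size and scaling scheme are illustrative assumptions, not this paper's exact setup): each block gets its own scale, so the 8-bit integer grid only has to cover that block's local range.

```python
import numpy as np

def int8_blockwise(x: np.ndarray, block: int = 4) -> np.ndarray:
    """Symmetric block-wise INT8 quantize-dequantize round trip.
    Per-block scales shrink the range each integer grid must cover,
    which is what makes fine-grained INT competitive with FP formats."""
    x = x.astype(np.float32).reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero blocks
    q = np.clip(np.round(x / scale), -127, 127)   # integer codes in [-127, 127]
    return (q * scale).reshape(-1)                # dequantized reconstruction

x = np.linspace(-1.0, 1.0, 16, dtype=np.float32)
err = float(np.max(np.abs(int8_blockwise(x) - x)))  # bounded by half a quantization step
```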
arXiv Detail & Related papers (2025-10-29T15:11:53Z)
- Speeding Up MACE: Low-Precision Tricks for Equivarient Force Fields [51.95157731126864]
Machine-learning force fields can deliver accurate molecular dynamics (MD) at high computational cost. This thesis aims to make MACE cheaper and faster by identifying computational bottlenecks and evaluating low-precision execution policies.
arXiv Detail & Related papers (2025-10-23T14:02:34Z)
- Asymmetric VAE for One-Step Video Super-Resolution Acceleration [63.419142632861345]
We propose FastVSR, which achieves substantial reductions in computational cost by implementing a high-compression VAE. FastVSR achieves speedups of 111.9 times compared to multi-step models and 3.92 times compared to existing one-step models.
arXiv Detail & Related papers (2025-09-29T00:36:14Z)
- To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability [7.115739465137031]
BrainFloat16 (BF16) precision has become the de facto standard for large language model pretraining. However, prior experience with FP16, which was found to be less stable than BF16, raises concerns as to whether FP8 can be a cost-effective option for LLM training. We propose new evaluation techniques and a new metric for quantifying loss-landscape sharpness in autoregressive language models.
arXiv Detail & Related papers (2024-05-29T02:42:23Z)
- Counterbalancing Teacher: Regularizing Batch Normalized Models for Robustness [15.395021925719817]
Batch normalization (BN) is a technique for training deep neural networks that accelerates convergence and improves accuracy.
We show that BN incentivizes the model to rely on low-variance features that are highly specific to the training (in-domain) data.
We propose Counterbalancing Teacher (CT) to enforce the student network's learning of robust representations.
arXiv Detail & Related papers (2022-07-04T16:16:24Z)
- Fast Adversarial Training with Adaptive Step Size [62.37203478589929]
We study the phenomenon from the perspective of training instances.
We propose a simple but effective method, Adversarial Training with Adaptive Step size (ATAS).
ATAS learns an instance-wise adaptive step size that is inversely proportional to its gradient norm.
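The summary states the rule directly: a step size inversely proportional to the instance's gradient norm. A hedged one-function sketch of that rule (the base constant and epsilon are assumptions; the paper's exact normalization may differ):

```python
import numpy as np

def adaptive_step_size(grad: np.ndarray, base: float = 1.0,
                       eps: float = 1e-12) -> float:
    """Instance-wise step size inversely proportional to the gradient
    norm: instances with large gradients take smaller steps, the idea
    ATAS uses to stabilize fast adversarial training."""
    return base / (float(np.linalg.norm(grad)) + eps)

small_grad = adaptive_step_size(np.array([0.3, 0.4]))  # norm 0.5 -> larger step
large_grad = adaptive_step_size(np.array([3.0, 4.0]))  # norm 5.0 -> smaller step
```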
arXiv Detail & Related papers (2022-06-06T08:20:07Z)
- Revisiting BFloat16 Training [30.99618783594963]
State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision.
Deep learning accelerators are forced to support both 16-bit and 32-bit floating-point units.
arXiv Detail & Related papers (2020-10-13T05:38:07Z)
- To be Robust or to be Fair: Towards Fairness in Adversarial Training [83.42241071662897]
We find that adversarial training algorithms tend to introduce severe disparity of accuracy and robustness between different groups of data.
We propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem when doing adversarial defenses.
arXiv Detail & Related papers (2020-10-13T02:21:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.