Navigating the Alignment-Calibration Trade-off: A Pareto-Superior Frontier via Model Merging
- URL: http://arxiv.org/abs/2510.17426v2
- Date: Thu, 30 Oct 2025 22:41:43 GMT
- Title: Navigating the Alignment-Calibration Trade-off: A Pareto-Superior Frontier via Model Merging
- Authors: Tiancheng Hu, Benjamin Minixhofer, Nigel Collier
- Abstract summary: The "alignment tax" of post-training is typically framed as a drop in task accuracy. We show it also involves a severe loss of calibration, making models overconfident and less reliable, and their outputs less diverse. We show that this trade-off can be navigated effectively via a simple post-hoc intervention: interpolating between a model's weights before and after alignment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The "alignment tax" of post-training is typically framed as a drop in task accuracy. We show it also involves a severe loss of calibration, making models overconfident and less reliable, and their outputs less diverse. We show that this trade-off can be navigated effectively via a simple post-hoc intervention: interpolating between a model's weights before and after alignment. Crucially, this is not a strict trade-off. We find that the process consistently reveals Pareto-optimal interpolations: models that improve accuracy beyond both parents while substantially recovering the calibration lost during alignment. Our work demonstrates that simple model merging provides a computationally efficient method for mitigating the full scope of the alignment tax, yielding models that are more capable and more reliable.
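The intervention the abstract describes is plain linear interpolation between two checkpoints of the same architecture. A minimal sketch of that idea (the function name and the toy dict-of-lists "state dicts" are illustrative, not the paper's code; real use would operate on framework state dicts such as PyTorch's):

```python
def interpolate_weights(base, aligned, alpha):
    """Linearly interpolate two model state dicts, parameter by parameter.

    alpha = 0.0 recovers the pre-alignment (base) weights;
    alpha = 1.0 recovers the post-alignment (aligned) weights;
    values in between trace the merging path the abstract describes.
    Weights are plain lists of floats here for clarity.
    """
    assert base.keys() == aligned.keys(), "architectures must match"
    return {
        name: [(1 - alpha) * b + alpha * a
               for b, a in zip(base[name], aligned[name])]
        for name in base
    }

# Toy checkpoints standing in for full pre- and post-alignment models.
pretrained = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
aligned    = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = interpolate_weights(pretrained, aligned, alpha=0.5)
print(merged)  # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

In practice one would sweep alpha over a grid and evaluate each merged model for accuracy and calibration to locate the Pareto-optimal interpolations the paper reports.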
Related papers
- Inference-time Alignment via Sparse Junction Steering [25.464612964225484]
Token-level steering has emerged as a pivotal approach for inference-time alignment. Existing methods rely on dense intervention at every decoding step. We show that dense intervention is unnecessary and propose sparse junction steering.
arXiv Detail & Related papers (2026-01-30T08:40:47Z) - From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging [22.794831741556468]
Model merging combines expert models for multitask performance but faces challenges from parameter interference. Existing approaches employ a compile-then-query paradigm, performing a costly offline multi-objective optimization to enable fast, preference-aware model generation. We model this correction as an optimal linear transformation, yielding a closed-form solution that replaces the entire offline optimization process with a single-step, architecture-agnostic computation.
arXiv Detail & Related papers (2025-11-14T04:09:25Z) - Dense Cross-Scale Image Alignment With Fully Spatial Correlation and Just Noticeable Difference Guidance [44.06973005232111]
Existing unsupervised image alignment methods exhibit limited accuracy and high computational complexity. We propose a dense cross-scale image alignment model that takes into account the correlations between cross-scale features to decrease alignment difficulty. Our model supports flexible trade-offs between accuracy and efficiency by adjusting the number of scales utilized.
arXiv Detail & Related papers (2025-11-12T06:27:22Z) - Why Alignment Must Precede Distillation: A Minimal Working Explanation [50.784080714897776]
We show that the standard KD -> Align workflow diminishes the model's capacity to align rare yet desirable behaviors. We demonstrate that alignment must first be performed on a high-recall reference before distillation.
arXiv Detail & Related papers (2025-09-28T06:12:19Z) - Stochastic Interpolants via Conditional Dependent Coupling [36.84747986070112]
Existing image generation models face critical challenges regarding the trade-off between computation and fidelity. We introduce a unified multistage generative framework based on our proposed Conditional Dependent Coupling strategy. It decomposes the generative process into interpolant trajectories at multiple stages, ensuring accurate distribution learning while enabling end-to-end optimization.
arXiv Detail & Related papers (2025-09-27T05:03:08Z) - Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach [34.478524949495345]
Preference alignment is a key technology for the success of Large Language Models (LLMs). In this paper, we investigate why preference alignment affects calibration and how to address this issue.
arXiv Detail & Related papers (2025-05-04T05:42:51Z) - Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision language models.
We show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data.
Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
arXiv Detail & Related papers (2023-11-03T05:41:25Z) - Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z) - On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z) - Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z) - Calibrated and Sharp Uncertainties in Deep Learning via Density Estimation [10.209143402485406]
This paper argues that calibration is important in practice and is easy to maintain. We introduce a simple training procedure based on recalibration that yields calibrated models without sacrificing overall performance.
arXiv Detail & Related papers (2021-12-14T06:19:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.