Efficient Inference Using Large Language Models with Limited Human Data: Fine-Tuning then Rectification
- URL: http://arxiv.org/abs/2511.19486v1
- Date: Sun, 23 Nov 2025 05:23:21 GMT
- Title: Efficient Inference Using Large Language Models with Limited Human Data: Fine-Tuning then Rectification
- Authors: Lei Wang, Zikun Ye, Jinglong Zhao
- Abstract summary: We develop a framework that combines fine-tuning and rectification, and optimally allocates limited labeled samples across the two stages. Building on this insight, we leverage empirical scaling laws to develop a data-driven method for optimally splitting samples between the fine-tuning and rectification stages. Empirical analysis validates our framework, demonstrating improved estimation and inference performance compared to using either fine-tuning or rectification alone.
- Score: 2.503562746177713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driven by recent advances in artificial intelligence (AI), a growing body of work demonstrates the potential of using large language models (LLMs) to generate human-like responses in market research and social science applications. Two primary approaches can be applied to improve the performance of LLMs: fine-tuning, which aligns LLM predictions more closely with human responses, and rectification, which corrects biases in LLM outputs. In this paper, we develop a framework that combines fine-tuning and rectification, and optimally allocates limited labeled samples across the two stages. Unlike the conventional objective that minimizes the mean squared prediction errors, we propose to minimize the variance of the prediction errors as the fine-tuning objective, which is optimal for the downstream rectification stage. Building on this insight, we leverage empirical scaling laws to develop a data-driven method for optimally splitting samples between the fine-tuning and rectification stages. Empirical analysis validates our framework, demonstrating improved estimation and inference performance compared to using either fine-tuning or rectification alone.
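To make the two-stage idea concrete, here is a minimal NumPy sketch under stated assumptions: a prediction-powered-inference-style rectified mean, a hypothetical power-law scaling a * n_ft**(-b) + c for the prediction-error variance after fine-tuning on n_ft labels, and a simple grid search over the split. The function names, scaling-law form, and constants are illustrative assumptions, not the paper's exact estimator or procedure.

```python
import numpy as np

def rectified_mean(preds_unlabeled, preds_labeled, labels):
    """Rectified estimate of the mean human response.

    LLM predictions on a large unlabeled pool give a low-variance but possibly
    biased estimate; the held-out labeled rectification sample estimates and
    removes that bias.  Sketch only; the paper's exact estimator may differ.
    """
    bias = np.mean(np.asarray(labels) - np.asarray(preds_labeled))  # estimated prediction bias
    return np.mean(preds_unlabeled) + bias

def split_samples(n_total, a, b, c, pred_var, n_unlabeled):
    """Grid-search the fine-tuning / rectification split of n_total labels.

    Assumes (hypothetically) that fine-tuning on n_ft labels yields a
    prediction-error variance Var(Y - f(X)) ~ a * n_ft**(-b) + c, with
    (a, b, c) fit on pilot data.  The rectified estimator's variance is then
    roughly err_var / n_rect + pred_var / n_unlabeled, which is why shrinking
    the error *variance* (rather than the MSE) is the useful fine-tuning target.
    """
    best_n_ft, best_var = None, np.inf
    for n_ft in range(1, n_total):          # keep at least one label for rectification
        err_var = a * n_ft ** (-b) + c
        total_var = err_var / (n_total - n_ft) + pred_var / n_unlabeled
        if total_var < best_var:
            best_n_ft, best_var = n_ft, total_var
    return best_n_ft, best_var

# Illustrative usage with made-up numbers:
n_ft, predicted_var = split_samples(n_total=500, a=2.0, b=0.5, c=0.1,
                                    pred_var=1.0, n_unlabeled=50_000)
print(f"allocate {n_ft} labels to fine-tuning, {500 - n_ft} to rectification")
```

The sketch also shows why the variance objective matters: the rectification term cancels any systematic bias left by fine-tuning, so only the residual variance of Y - f(X) propagates into the final estimator.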
Related papers
- LFPO: Likelihood-Free Policy Optimization for Masked Diffusion Models [48.68246945083386]
Likelihood-Free Policy Optimization (LFPO) is a native framework that maps the concept of vector field flow matching to the discrete token space. LFPO formulates alignment as geometric velocity rectification, which directly optimizes denoising logits via contrastive updates. Experiments demonstrate that LFPO not only outperforms state-of-the-art baselines on code and reasoning benchmarks but also accelerates inference by approximately 20% through reduced diffusion steps.
arXiv Detail & Related papers (2026-03-02T07:42:55Z) - Efficient Inference for Noisy LLM-as-a-Judge Evaluation [8.2511120576505]
Large language models (LLMs) are increasingly used as automatic evaluators of generative AI outputs. In practice, LLM judges are imperfect predictors of the underlying truth and can exhibit systematic, non-random errors.
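As a rough illustration of the general idea behind correcting such systematic judge errors (a generic debiasing sketch, not necessarily this paper's estimator), the judge's scores on the full evaluation set can be shifted by the bias measured on a small human-labeled calibration subset, with a normal-approximation confidence interval that accounts for both noise sources:

```python
import numpy as np

def debiased_judge_mean(judge_all, judge_calib, human_calib, z=1.96):
    """Bias-corrected mean judge score with an approximate 95% CI.

    judge_all:   judge scores on the full evaluation set
    judge_calib: judge scores on the human-labeled calibration subset
    human_calib: human scores on that same subset
    Generic debiasing sketch; the referenced paper's method may differ.
    """
    judge_all, judge_calib, human_calib = map(np.asarray, (judge_all, judge_calib, human_calib))
    bias = np.mean(human_calib - judge_calib)            # systematic judge error
    est = np.mean(judge_all) + bias
    # Two (approximately independent) noise sources: the judge-score average
    # and the bias estimate from the calibration subset.
    se = np.sqrt(np.var(judge_all, ddof=1) / len(judge_all)
                 + np.var(human_calib - judge_calib, ddof=1) / len(judge_calib))
    return est, (est - z * se, est + z * se)
```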
arXiv Detail & Related papers (2026-01-08T22:46:26Z) - Enhancing LLM Reasoning via Non-Human-Like Reasoning Path Preference Optimization [40.8414358896996]
We propose Confidence-Guided Reasoning Path Preference Optimization (CGPO). CGPO applies self-generated, non-human-like reasoning-path guidance to mitigate trajectory drift. We show that our method achieves better performance in most cases compared with approaches that use data generated by a strong model or human-annotated data.
arXiv Detail & Related papers (2025-10-13T07:51:16Z) - Divergence Minimization Preference Optimization for Diffusion Model Alignment [66.31417479052774]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence. DMPO can consistently outperform or match existing techniques across different base models and test sets.
arXiv Detail & Related papers (2025-07-10T07:57:30Z) - Leveraging Robust Optimization for LLM Alignment under Distribution Shifts [51.74394601039711]
Preference alignment methods are increasingly critical for steering large language models to generate outputs consistent with human values. We propose a novel distribution-aware optimization framework that improves preference alignment despite such distribution shifts.
arXiv Detail & Related papers (2025-04-08T09:14:38Z) - Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss.
The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-09-26T12:37:26Z) - A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models [63.949883238901414]
We present a unique angle of gradient analysis of loss functions that simultaneously reward good examples and penalize bad ones in LMs.
We find that ExMATE serves as a superior surrogate for MLE, and that combining DPO with ExMATE instead of MLE further enhances both the statistical (5-7%) and generative (+18% win rate) performance.
arXiv Detail & Related papers (2024-08-29T17:46:18Z) - Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z) - Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs fine-tuned with MRPO generalize better across various preference datasets, regardless of data scarcity or abundance.
arXiv Detail & Related papers (2024-05-26T00:29:04Z) - Illuminating Blind Spots of Language Models with Targeted Agent-in-the-Loop Synthetic Data [9.982616173090264]
Language models (LMs) have achieved impressive accuracy across a variety of tasks but remain vulnerable to high-confidence misclassifications, or unknown unknowns (UUs).
UUs cluster into blind spots in the feature space, leading to significant risks in high-stakes applications.
We propose a novel approach to address blind spot mitigation through the use of intelligent agents as teachers to characterize UU-type errors.
arXiv Detail & Related papers (2024-03-26T16:49:25Z) - Optimizing Language Models for Human Preferences is a Causal Inference Problem [41.59906798328058]
We present an initial exploration of language model optimization for human preferences from direct outcome datasets.
We first propose that language model optimization should be viewed as a causal problem to ensure that the model correctly learns the relationship between the text and the outcome.
We extend causal preference optimization (CPO) with doubly robust CPO, which reduces the variance of the surrogate objective while retaining provably strong guarantees on bias.
arXiv Detail & Related papers (2024-02-22T21:36:07Z) - Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that simultaneously improves both OOD accuracy and confidence calibration in vision-language models.
We show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data.
Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
arXiv Detail & Related papers (2023-11-03T05:41:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.