Labels or Preferences? Budget-Constrained Learning with Human Judgments over AI-Generated Outputs
- URL: http://arxiv.org/abs/2601.13458v1
- Date: Mon, 19 Jan 2026 23:23:29 GMT
- Title: Labels or Preferences? Budget-Constrained Learning with Human Judgments over AI-Generated Outputs
- Authors: Zihan Dong, Ruijia Wu, Linjun Zhang,
- Abstract summary: We show how to optimally allocate a fixed annotation budget between ground-truth labels and pairwise preferences in AI.<n>We introduce Preference-Calibrated Active Learning (PCAL), a novel robustness method that learns optimal data acquisition strategy.<n>This work provides a principled and statistically efficient approach for budget-constrained learning in modern AI.
- Score: 17.028710603629026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing reliance on human preference feedback to judge AI-generated pseudo labels has created a pressing need for principled, budget-conscious data acquisition strategies. We address the crucial question of how to optimally allocate a fixed annotation budget between ground-truth labels and pairwise preferences in AI. Our solution, grounded in semi-parametric inference, casts the budget allocation problem as a monotone missing data framework. Building on this formulation, we introduce Preference-Calibrated Active Learning (PCAL), a novel method that learns the optimal data acquisition strategy and develops a statistically efficient estimator for functionals of the data distribution. Theoretically, we prove the asymptotic optimality of our PCAL estimator and establish a key robustness guarantee that ensures robust performance even with poorly estimated nuisance models. Our flexible framework applies to a general class of problems, by directly optimizing the estimator's variance instead of requiring a closed-form solution. This work provides a principled and statistically efficient approach for budget-constrained learning in modern AI. Simulations and real-data analysis demonstrate the practical benefits and superior performance of our proposed method.
Related papers
- Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates.<n>SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence.<n> Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - Closing the Generalization Gap in Parameter-efficient Federated Edge Learning [43.00634399799955]
Federated edge learning (FEEL) provides a promising foundation for artificial intelligence (AI)<n>limited and heterogeneous local datasets, as well as resource-constrained deployment, severely degrade both model generalization and resource utilization.<n>We propose a framework that jointly leverages model minimization and generalization selection to tackle such challenges.
arXiv Detail & Related papers (2025-11-28T15:34:09Z) - Distributionally Robust Optimization with Adversarial Data Contamination [49.89480853499918]
We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions.<n>Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts.<n>This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
arXiv Detail & Related papers (2025-07-14T18:34:10Z) - Cost-Optimal Active AI Model Evaluation [71.2069549142394]
Development of generative AI systems requires continual evaluation, data acquisition, and annotation.<n>We develop novel, cost-aware methods for actively balancing the use of a cheap, but often inaccurate, weak rater.<n>We derive a family of cost-optimal policies for allocating a given annotation budget between weak and strong raters.
arXiv Detail & Related papers (2025-06-09T17:14:41Z) - LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency [9.32396532269159]
This paper proposes a offLine prEference-bAsed RL with high Sample Efficiency (LEASE) algorithm to generate unlabeled preference data.<n>Considering the pretrained reward model may generate incorrect labels for unlabeled data, we design an uncertainty-aware mechanism to ensure the performance of reward model.
arXiv Detail & Related papers (2024-12-30T15:10:57Z) - Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.<n>We propose a method called Stratified Prediction-Powered Inference (StratPPI)<n>We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z) - Stochastic Methods for AUC Optimization subject to AUC-based Fairness
Constraints [51.12047280149546]
A direct approach for obtaining a fair predictive model is to train the model through optimizing its prediction performance subject to fairness constraints.
We formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints.
We demonstrate the effectiveness of our approach on real-world data under different fairness metrics.
arXiv Detail & Related papers (2022-12-23T22:29:08Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.