How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets
- URL: http://arxiv.org/abs/2511.20605v1
- Date: Tue, 25 Nov 2025 18:34:33 GMT
- Title: How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets
- Authors: Xiwen Huang, Pierre Pinson,
- Abstract summary: We introduce and analyse active learning markets as a way to purchase labels.<n>This comes in contrast to the many proposals that already exist to purchase features and examples.<n>We focus on a single-buyer-multiple-seller setup and propose the use of two active learning strategies.
- Score: 2.7429630700600893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce and analyse active learning markets as a way to purchase labels, in situations where analysts aim to acquire additional data to improve model fitting, or to better train models for predictive analytics applications. This comes in contrast to the many proposals that already exist to purchase features and examples. By originally formalising the market clearing as an optimisation problem, we integrate budget constraints and improvement thresholds into the label acquisition process. We focus on a single-buyer-multiple-seller setup and propose the use of two active learning strategies (variance based and query-by-committee based), paired with distinct pricing mechanisms. They are compared to a benchmark random sampling approach. The proposed strategies are validated on real-world datasets from two critical application domains: real estate pricing and energy forecasting. Results demonstrate the robustness of our approach, consistently achieving superior performance with fewer labels acquired compared to conventional methods. Our proposal comprises an easy-to-implement practical solution for optimising data acquisition in resource-constrained environments.
Related papers
- SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling [58.05959902776133]
We introduce Single-Pass.<n>with Reference-Guided Evaluation (SPARE), a novel structured framework that enables efficient per-step annotation.<n>We demonstrate SPARE's effectiveness across four diverse datasets spanning mathematical reasoning (GSM8K, MATH), multi-hop question answering (MuSiQue-Ans), and spatial reasoning (SpaRP)<n>On ProcessBench, SPARE demonstrates data-efficient out-of-distribution generalization, using only $sim$16% of training samples compared to human-labeled and other synthetically trained baselines.
arXiv Detail & Related papers (2025-06-18T14:37:59Z) - On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI (systems that autonomously plan and act) are becoming widespread, yet their task success rate on complex tasks remains low.<n>Inference-time alignment relies on three components: sampling, evaluation, and feedback.<n>We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly inserts feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - Transfer Learning for Nonparametric Contextual Dynamic Pricing [17.420508136662257]
Dynamic pricing strategies are crucial for firms to maximize revenue by adjusting prices based on market conditions and customer characteristics.<n>One promising approach to overcome this limitation is to leverage information from related products or markets to inform the focal pricing decisions.<n>We propose a novel Transfer Learning for Dynamic Pricing (TLDP) algorithm that can effectively leverage pre-collected data from a source domain to enhance pricing decisions in the target domain.
arXiv Detail & Related papers (2025-01-31T01:05:04Z) - Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning [81.83013974171364]
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations.<n>Unlike semi-supervised learning, one cannot select the most probable label as the pseudo-label in SSMLL due to multiple semantics contained in an instance.<n>We propose a dual-perspective method to generate high-quality pseudo-labels.
arXiv Detail & Related papers (2024-07-26T09:33:53Z) - Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning [70.22819290458581]
Reinforcement learning with human feedback (RLHF) is a widely adopted approach in current large language model pipelines.
Our approach introduces two key innovations: (1) on-policy query to avoid OOD and imbalance issues in seed data, and (2) active learning to select the most informative data for preference queries.
arXiv Detail & Related papers (2024-07-02T10:09:19Z) - Improve Cost Efficiency of Active Learning over Noisy Dataset [1.3846014191157405]
In this paper, we consider cases of binary classification, where acquiring a positive instance incurs a significantly higher cost compared to that of negative instances.
We propose a shifted normal distribution sampling function that samples from a wider range than typical uncertainty sampling.
Our simulation underscores that our proposed sampling function limits both noisy and positive label selection, delivering between 20% and 32% improved cost efficiency over different test datasets.
arXiv Detail & Related papers (2024-03-02T23:53:24Z) - BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z) - How Many Validation Labels Do You Need? Exploring the Design Space of
Label-Efficient Model Ranking [40.39898960460575]
This paper presents LEMR (Label-Efficient Model Ranking) and introduces the MoraBench Benchmark.
LEMR is a novel framework that minimizes the need for costly annotations in model selection by strategically annotating instances from an unlabeled validation set.
arXiv Detail & Related papers (2023-12-04T04:20:38Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Canary in a Coalmine: Better Membership Inference with Ensembled
Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z) - Markdowns in E-Commerce Fresh Retail: A Counterfactual Prediction and
Multi-Period Optimization Approach [29.11201102550876]
We build a semi-parametric structural model to learn individual price elasticity and predict counterfactual demand.
We propose a multi-period dynamic pricing algorithm to maximize the overall profit of a perishable product over its finite selling horizon.
The proposed framework has been successfully deployed to the well-known e-commerce fresh retail scenario - Freshippo.
arXiv Detail & Related papers (2021-05-18T07:01:37Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Toward Optimal Probabilistic Active Learning Using a Bayesian Approach [4.380488084997317]
Active learning aims at reducing the labeling costs by an efficient and effective allocation of costly labeling resources.
By reformulating existing selection strategies within our proposed model, we can explain which aspects are not covered in current state-of-the-art.
arXiv Detail & Related papers (2020-06-02T15:59:42Z) - Optimal Bidding Strategy without Exploration in Real-time Bidding [14.035270361462576]
maximizing utility with a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems.
Previous works ignore the losing auctions to alleviate the difficulty with censored states.
We propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic.
arXiv Detail & Related papers (2020-03-31T20:43:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.