Related papers: Beyond Labels: Information-Efficient Human-in-the-Loop Learning using Ranking and Selection Queries

Beyond Labels: Information-Efficient Human-in-the-Loop Learning using Ranking and Selection Queries

URL: http://arxiv.org/abs/2602.15738v1
Date: Tue, 17 Feb 2026 17:14:15 GMT
Title: Beyond Labels: Information-Efficient Human-in-the-Loop Learning using Ranking and Selection Queries
Authors: Belén Martín-Urcelay, Yoonsang Lee, Matthieu R. Bloch, Christopher J. Rozell,
Abstract summary: We develop a human-in-the-loop framework to learn binary classifiers with rich query types.<n>We then design active learning algorithms that leverage the rich queries to increase the information gained per interaction.<n>This algorithm in the word sentiment classification task reduces learning time by more than 57% compared to traditional label-only active learning.
Score: 10.128513811628201
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Integrating human expertise into machine learning systems often reduces the role of experts to labeling oracles, a paradigm that limits the amount of information exchanged and fails to capture the nuances of human judgment. We address this challenge by developing a human-in-the-loop framework to learn binary classifiers with rich query types, consisting of item ranking and exemplar selection. We first introduce probabilistic human response models for these rich queries motivated by the relationship experimentally observed between the perceived implicit score of an item and its distance to the unknown classifier. Using these models, we then design active learning algorithms that leverage the rich queries to increase the information gained per interaction. We provide theoretical bounds on sample complexity and develop a tractable and computationally efficient variational approximation. Through experiments with simulated annotators derived from crowdsourced word-sentiment and image-aesthetic datasets, we demonstrate significant reductions on sample complexity. We further extend active learning strategies to select queries that maximize information rate, explicitly balancing informational value against annotation cost. This algorithm in the word sentiment classification task reduces learning time by more than 57\% compared to traditional label-only active learning.

Related papers

Harnessing the Power of Beta Scoring in Deep Active Learning for Multi-Label Text Classification [6.662167018900634]
Our study introduces a novel deep active learning strategy, capitalizing on the Beta family of proper scoring rules within the Expected Loss Reduction framework. It computes the expected increase in scores using the Beta Scoring Rules, which are then transformed into sample vector representations. Comprehensive evaluations across both synthetic and real datasets reveal our method's capability to often outperform established acquisition techniques in multi-label text classification.
arXiv Detail & Related papers (2024-01-15T00:06:24Z)
XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.<n>XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.<n>Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans. We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information. We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and Cifar10 datasets.
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function. This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models. We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
What Makes Good Contrastive Learning on Small-Scale Wearable-based Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task. This paper presents an open-source PyTorch library textttCL-HAR, which can serve as a practical tool for researchers.
arXiv Detail & Related papers (2022-02-12T06:10:15Z)
Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training. At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
FairCVtest Demo: Understanding Bias in Multimodal Learning with a Testbed in Fair Automatic Recruitment [79.23531577235887]
This demo shows the capacity of the Artificial Intelligence (AI) behind a recruitment tool to extract sensitive information from unstructured data. Aditionally, the demo includes a new algorithm for discrimination-aware learning which eliminates sensitive information in our multimodal AI framework.
arXiv Detail & Related papers (2020-09-12T17:45:09Z)
On the Robustness of Active Learning [0.7340017786387767]
Active Learning is concerned with how to identify the most useful samples for a Machine Learning algorithm to be trained with. We find that it is often applied with not enough care and domain knowledge. We propose the new "Sum of Squared Logits" method based on the Simpson diversity index and investigate the effect of using the confusion matrix for balancing in sample selection.
arXiv Detail & Related papers (2020-06-18T09:07:23Z)
Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task. Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator. To discover implicit entity preference of users, we design an elaborate collaborative learning algorithms based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.