Related papers: One-Round Active Learning

One-Round Active Learning

URL: http://arxiv.org/abs/2104.11843v1
Date: Fri, 23 Apr 2021 23:59:50 GMT
Title: One-Round Active Learning
Authors: Tianhao Wang, Si Chen, Ruoxi Jia
Abstract summary: One-round active learning aims to select a subset of unlabeled data points that achieve the highest utility after being labeled. We propose DULO, a general framework for one-round active learning based on the notion of data utility functions. Our results demonstrate that while existing active learning approaches could succeed with multiple rounds, DULO consistently performs better in the one-round setting.
Score: 13.25385227263705
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Active learning has been a main solution for reducing data labeling costs. However, existing active learning strategies assume that a data owner can interact with annotators in an online, timely manner, which is usually impractical. Even with such interactive annotators, for existing active learning strategies to be effective, they often require many rounds of interactions between the data owner and annotators, which is often time-consuming. In this work, we initiate the study of one-round active learning, which aims to select a subset of unlabeled data points that achieve the highest utility after being labeled with only the information from initially labeled data points. We propose DULO, a general framework for one-round active learning based on the notion of data utility functions, which map a set of data points to some performance measure of the model trained on the set. We formulate the one-round active learning problem as data utility function maximization. We further propose strategies to make the estimation and optimization of data utility functions scalable to large models and large unlabeled data sets. Our results demonstrate that while existing active learning approaches could succeed with multiple rounds, DULO consistently performs better in the one-round setting.

Related papers

Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems [0.6267574471145215]
We propose a novel active learning framework with significant potential for application in modern AI systems. Unlike the traditional active learning methods, which only focus on determining which data point should be labeled, our framework also introduces an innovative perspective on incorporating different query scheme. Our proposed active learning framework exhibits higher accuracy and lower loss compared to other methods.
arXiv Detail & Related papers (2024-12-31T05:12:51Z)
LLM-assisted Explicit and Implicit Multi-interest Learning Framework for Sequential Recommendation [50.98046887582194]
We propose an explicit and implicit multi-interest learning framework to model user interests on two levels: behavior and semantics. The proposed EIMF framework effectively and efficiently combines small models with LLM to improve the accuracy of multi-interest modeling.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection. We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks. Our method goes beyond surface form cues to identify data that the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
Zero-shot Active Learning Using Self Supervised Learning [11.28415437676582]
We propose a new Active Learning approach which is model agnostic as well as one doesn't require an iterative process. We aim to leverage self-supervised learnt features for the task of Active Learning.
arXiv Detail & Related papers (2024-01-03T11:49:07Z)
BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data. Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z)
Learning to Rank for Active Learning via Multi-Task Bilevel Optimization [29.207101107965563]
We propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition. A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time.
arXiv Detail & Related papers (2023-10-25T22:50:09Z)
Active learning for data streams: a survey [0.48951183832371004]
Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in real time.
arXiv Detail & Related papers (2023-02-17T14:24:13Z)
Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability. We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side. During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI. This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data. Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning [57.436224561482966]
Active learning is a research area that addresses the issues of expensive labeling by selecting the most important samples for labeling. We introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting. Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings.
arXiv Detail & Related papers (2022-07-25T16:11:55Z)
Active metric learning and classification using similarity queries [21.589707834542338]
We show that a novel unified query framework can be applied to any problem in which a key component is learning a representation of the data that reflects similarity. We demonstrate the effectiveness of the proposed strategy on two tasks -- active metric learning and active classification.
arXiv Detail & Related papers (2022-02-04T03:34:29Z)
Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning. I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z)
Data Shapley Valuation for Efficient Batch Active Learning [21.76249748709411]
Active Data Shapley (ADS) is a filtering layer for batch active learning. We show that ADS is particularly effective when the pool of unlabeled data exhibits real-world caveats.
arXiv Detail & Related papers (2021-04-16T18:53:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.