One-Round Active Learning
- URL: http://arxiv.org/abs/2104.11843v1
- Date: Fri, 23 Apr 2021 23:59:50 GMT
- Title: One-Round Active Learning
- Authors: Tianhao Wang, Si Chen, Ruoxi Jia
- Abstract summary: One-round active learning aims to select a subset of unlabeled data points that achieve the highest utility after being labeled.
We propose DULO, a general framework for one-round active learning based on the notion of data utility functions.
Our results demonstrate that while existing active learning approaches could succeed with multiple rounds, DULO consistently performs better in the one-round setting.
- Score: 13.25385227263705
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Active learning has been a main solution for reducing data labeling costs.
However, existing active learning strategies assume that a data owner can
interact with annotators in an online, timely manner, which is usually
impractical. Moreover, even when interactive annotators are available,
existing active learning strategies typically require many rounds of
interaction between the data owner and annotators to be effective, which is
time-consuming. In this work, we initiate the study of one-round active
learning, which aims to select a subset of unlabeled data points that achieve
the highest utility after being labeled with only the information from
initially labeled data points. We propose DULO, a general framework for
one-round active learning based on the notion of data utility functions, which
map a set of data points to some performance measure of the model trained on
the set. We formulate the one-round active learning problem as data utility
function maximization. We further propose strategies to make the estimation and
optimization of data utility functions scalable to large models and large
unlabeled data sets. Our results demonstrate that while existing active
learning approaches could succeed with multiple rounds, DULO consistently
performs better in the one-round setting.
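To make the pipeline concrete, here is a minimal sketch of one-round selection by data utility maximization: sample random subsets of the initially labeled data, fit a surrogate utility model on (set features, utility) pairs, then greedily pick the unlabeled subset the surrogate scores highest. The fixed mean/std set featurization, least-squares surrogate, logistic-regression utility measure, and all sizes are illustrative assumptions; DULO itself learns the set function (e.g., a DeepSets-style model) and uses the paper's scalable estimation and optimization strategies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: a small labeled set, a validation set that
# defines "utility", and a larger unlabeled pool to select from.
X, y = make_classification(n_samples=800, n_features=10, random_state=0)
X_lab, y_lab = X[:100], y[:100]
X_val, y_val = X[100:300], y[100:300]
X_unlab = X[300:]

def subset_utility(idx):
    """Data utility of a labeled subset: held-out accuracy of a model
    trained on that subset (one choice of performance measure)."""
    if len(np.unique(y_lab[idx])) < 2:
        return 0.0
    clf = LogisticRegression(max_iter=500).fit(X_lab[idx], y_lab[idx])
    return clf.score(X_val, y_val)

def set_features(X_sub):
    """Permutation-invariant summary of a set of points (per-feature mean
    and std). DULO learns this set representation; a fixed featurization
    keeps the sketch self-contained."""
    return np.r_[X_sub.mean(axis=0), X_sub.std(axis=0)]

# Step 1: sample random labeled subsets and record their utilities.
Phi, u = [], []
for _ in range(300):
    idx = rng.choice(len(X_lab), size=rng.integers(5, 60), replace=False)
    Phi.append(set_features(X_lab[idx]))
    u.append(subset_utility(idx))

# Step 2: fit a surrogate utility model mapping set features -> utility.
A = np.c_[np.array(Phi), np.ones(len(Phi))]
w, *_ = np.linalg.lstsq(A, np.array(u), rcond=None)
u_hat = lambda X_sub: np.r_[set_features(X_sub), 1.0] @ w

# Step 3: greedily grow the unlabeled subset that maximizes predicted
# utility, then send the selected points out for labeling (one round).
chosen, pool = [], list(range(len(X_unlab)))
for _ in range(30):  # labeling budget
    best = max(pool, key=lambda j: u_hat(X_unlab[chosen + [j]]))
    chosen.append(best)
    pool.remove(best)
print("indices to label:", sorted(chosen))
```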
Related papers
- LLM-assisted Explicit and Implicit Multi-interest Learning Framework for Sequential Recommendation [50.98046887582194]
We propose an explicit and implicit multi-interest learning framework to model user interests on two levels: behavior and semantics.
The proposed EIMF framework effectively and efficiently combines small models with LLM to improve the accuracy of multi-interest modeling.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
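The gradient-similarity idea behind LESS is easy to sketch: project per-example loss gradients to a low-dimensional space, then rank candidates by their similarity to gradients from a few target-task examples. The logistic stand-in model, random projection, and top-5% cutoff below are illustrative assumptions, not LESS's actual LoRA-gradient pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_logloss(w, x, y):
    """Per-example gradient of the logistic loss; a stand-in for the
    model gradients LESS computes."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

d, k_proj = 50, 8
w = rng.normal(size=d)                               # current model parameters
P = rng.normal(size=(d, k_proj)) / np.sqrt(k_proj)   # random low-rank projection

X_cand = rng.normal(size=(1000, d))                  # candidate training pool
y_cand = rng.integers(0, 2, size=1000)
X_targ = rng.normal(size=(20, d))                    # a few target-task examples
y_targ = rng.integers(0, 2, size=20)

# Project per-example gradients to k_proj dimensions (the low-rank part).
G_cand = np.stack([grad_logloss(w, x, y) @ P for x, y in zip(X_cand, y_cand)])
G_targ = np.stack([grad_logloss(w, x, y) @ P for x, y in zip(X_targ, y_targ)])

# Score candidates by cosine similarity to the mean target gradient and
# keep the top 5%, mirroring the "LESS-selected 5%" setting above.
t = G_targ.mean(axis=0)
scores = G_cand @ t / (np.linalg.norm(G_cand, axis=1) * np.linalg.norm(t) + 1e-12)
selected = np.argsort(scores)[-len(scores) // 20:]
print("selected", len(selected), "examples")
```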
- Zero-shot Active Learning Using Self Supervised Learning [11.28415437676582]
We propose a new Active Learning approach that is model agnostic and does not require an iterative process.
We aim to leverage self-supervised learnt features for the task of Active Learning.
arXiv Detail & Related papers (2024-01-03T11:49:07Z)
- Learning to Rank for Active Learning via Multi-Task Bilevel Optimization [29.207101107965563]
We propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition.
A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time.
arXiv Detail & Related papers (2023-10-25T22:50:09Z)
- Active learning for data streams: a survey [0.48951183832371004]
Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream.
Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data.
This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in real time.
arXiv Detail & Related papers (2023-02-17T14:24:13Z)
- Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning [57.436224561482966]
Active learning is a research area that addresses the issues of expensive labeling by selecting the most important samples for labeling.
We introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting.
Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings.
arXiv Detail & Related papers (2022-07-25T16:11:55Z)
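A common instantiation of diversity-based sampling is k-center greedy selection in an embedding space. The sketch below uses random stand-in embeddings in place of the consistency-based embeddings the paper above proposes, which it does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 32))   # stand-in embeddings of unlabeled points

def k_center_greedy(emb, budget):
    """Repeatedly pick the point farthest from everything already picked,
    spreading the labeling budget across the embedding space."""
    first = int(rng.integers(len(emb)))
    chosen = [first]
    dmin = np.linalg.norm(emb - emb[first], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(dmin))
        chosen.append(nxt)
        dmin = np.minimum(dmin, np.linalg.norm(emb - emb[nxt], axis=1))
    return chosen

print("diverse query batch:", k_center_greedy(emb, 10))
```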
- Active metric learning and classification using similarity queries [21.589707834542338]
We show that a novel unified query framework can be applied to any problem in which a key component is learning a representation of the data that reflects similarity.
We demonstrate the effectiveness of the proposed strategy on two tasks -- active metric learning and active classification.
arXiv Detail & Related papers (2022-02-04T03:34:29Z)
- Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z)
- Data Shapley Valuation for Efficient Batch Active Learning [21.76249748709411]
Active Data Shapley (ADS) is a filtering layer for batch active learning.
We show that ADS is particularly effective when the pool of unlabeled data exhibits real-world caveats.
arXiv Detail & Related papers (2021-04-16T18:53:42Z)
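For intuition about Shapley-based filtering, here is a minimal Monte Carlo sketch of Data Shapley valuation applied to a candidate batch. The tiny logistic-regression task, permutation count, and keep-half filter are illustrative assumptions; ADS itself relies on more efficient Shapley estimators.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=260, n_features=8, random_state=0)
X_tr, y_tr = X[:60], y[:60]       # candidate batch to be valued
X_val, y_val = X[60:], y[60:]     # held-out set defining performance

def perf(idx):
    """Validation accuracy of a model trained on the given subset."""
    if len(idx) < 2 or len(np.unique(y_tr[idx])) < 2:
        return 0.5                # chance level for degenerate subsets
    clf = LogisticRegression(max_iter=300).fit(X_tr[idx], y_tr[idx])
    return clf.score(X_val, y_val)

def data_shapley(n_perm=30):
    """Monte Carlo Shapley: average each point's marginal contribution to
    performance over random orderings of the batch."""
    phi = np.zeros(len(X_tr))
    for _ in range(n_perm):
        order = rng.permutation(len(X_tr))
        prev, prefix = 0.5, []
        for i in order:
            prefix.append(int(i))
            cur = perf(prefix)
            phi[i] += cur - prev
            prev = cur
    return phi / n_perm

phi = data_shapley()
keep = np.argsort(phi)[-30:]      # filter: keep the most valuable half
print("kept indices:", sorted(keep.tolist()))
```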