ImitAL: Learning Active Learning Strategies from Synthetic Data
- URL: http://arxiv.org/abs/2108.07670v1
- Date: Tue, 17 Aug 2021 15:03:31 GMT
- Authors: Julius Gonsior, Maik Thiele, Wolfgang Lehner
- Abstract summary: Active Learning is a well-known standard method for efficiently obtaining labeled data.
We propose ImitAL, a novel query strategy, which encodes AL as a learning-to-rank problem.
We show that our approach is more runtime-efficient than most other strategies, especially on very large datasets.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: One of the biggest challenges that complicates applied supervised machine
learning is the need for huge amounts of labeled data. Active Learning (AL) is
a well-known standard method for efficiently obtaining labeled data by first
labeling the samples that contain the most information based on a query
strategy. Although many methods for query strategies have been proposed in the
past, no clear superior method that works well in general for all domains has
been found yet. Additionally, many strategies are computationally expensive
which further hinders the widespread use of AL for large-scale annotation
projects.
We therefore propose ImitAL, a novel query strategy that encodes AL as a
learning-to-rank problem. The underlying neural network is trained with
Imitation Learning; the required demonstrative expert experience is generated
from purely synthetic data.
To show the general and superior applicability of ImitAL, we perform an
extensive evaluation comparing our strategy with 10 different state-of-the-art
query strategies on 15 datasets from a wide range of domains. We also show
that our approach is more runtime-efficient than most other strategies,
especially on very large datasets.
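The abstract's core idea can be illustrated with a minimal sketch. This is not the authors' code: the two-feature state, the farthest-first "expert", and the pairwise hinge update are simplifying assumptions standing in for the paper's network, synthetic episodes, and demonstrator. It shows the shape of the approach: train a ranking function by imitating an expert's query choices on synthetic AL episodes, then score unlabeled candidates in a single pass at query time.

```python
# Sketch: a ranking-based query strategy trained by imitation on synthetic
# episodes (hypothetical simplification of the learning-to-rank framing).
import numpy as np

rng = np.random.default_rng(0)

def features(unlabeled, labeled):
    """Toy per-sample state: distance to nearest labeled point, and norm."""
    d = np.min(np.linalg.norm(unlabeled[:, None] - labeled[None], axis=-1), axis=1)
    return np.stack([d, np.linalg.norm(unlabeled, axis=1)], axis=1)

class RankNet:
    """Linear scoring function trained with a pairwise hinge update."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def score(self, x):
        return x @ self.w

    def imitate(self, x, expert_idx):
        # Push the expert-chosen sample above every other candidate
        # whenever its score does not already win by a margin of 1.
        s = self.score(x)
        for j in range(len(x)):
            if j != expert_idx and s[j] + 1.0 > s[expert_idx]:
                self.w += self.lr * (x[expert_idx] - x[j])

# Synthetic AL episodes: the "expert" here labels the point farthest from
# the labeled set (a stand-in for the paper's demonstrative expert).
net = RankNet(dim=2)
for _ in range(200):
    labeled = rng.normal(size=(3, 2))
    unlabeled = rng.normal(size=(10, 2))
    x = features(unlabeled, labeled)
    expert = int(np.argmax(x[:, 0]))  # farthest-first expert choice
    net.imitate(x, expert)

# Query time: the learned ranker scores all candidates in one pass,
# which is what makes such a strategy cheap on large pools.
labeled = rng.normal(size=(3, 2))
unlabeled = rng.normal(size=(10, 2))
query = int(np.argmax(net.score(features(unlabeled, labeled))))
```

The single scoring pass at query time is the source of the runtime advantage claimed in the abstract: the cost of the strategy is one forward evaluation per candidate, rather than retraining or pairwise comparisons over the whole pool.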
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z) - ALE: A Simulation-Based Active Learning Evaluation Framework for the
Parameter-Driven Comparison of Query Strategies for NLP [3.024761040393842]
Active Learning (AL) proposes promising data points for annotators to label next, instead of a sequential or random sample.
This method is supposed to save annotation effort while maintaining model performance.
We introduce a reproducible active learning evaluation framework for the comparative evaluation of AL strategies in NLP.
arXiv Detail & Related papers (2023-08-01T10:42:11Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower worst-case time complexity and reduces the memory complexity from O(n²) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - An Efficient Active Learning Pipeline for Legal Text Classification [2.462514989381979]
We propose a pipeline for effectively using active learning with pre-trained language models in the legal domain.
We use knowledge distillation to guide the model's embeddings to a semantically meaningful space.
Our experiments on Contract-NLI, adapted to the classification task, and LEDGAR benchmarks show that our approach outperforms standard AL strategies.
arXiv Detail & Related papers (2022-11-15T13:07:02Z) - ImitAL: Learned Active Learning Strategy on Synthetic Data [30.595138995552748]
We propose ImitAL, a domain-independent novel query strategy, which encodes AL as a learning-to-rank problem.
We train ImitAL on large-scale simulated AL runs on purely synthetic datasets.
To show that ImitAL was successfully trained, we perform an extensive evaluation comparing our strategy on 13 different datasets.
arXiv Detail & Related papers (2022-08-24T16:17:53Z) - A Survey of Learning on Small Data: Generalization, Optimization, and
Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - Budget-aware Few-shot Learning via Graph Convolutional Network [56.41899553037247]
This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples.
A common problem setting in few-shot classification assumes random sampling strategy in acquiring data labels.
We introduce a new budget-aware few-shot learning problem that aims to learn novel object categories.
arXiv Detail & Related papers (2022-01-07T02:46:35Z) - Towards General and Efficient Active Learning [20.888364610175987]
Active learning aims to select the most informative samples to exploit limited annotation budgets.
We propose a novel general and efficient active learning (GEAL) method in this paper.
Our method can conduct data selection processes on different datasets with a single-pass inference of the same model.
arXiv Detail & Related papers (2021-12-15T08:35:28Z) - Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z) - Learning active learning at the crossroads? evaluation and discussion [0.03807314298073299]
Active learning aims to reduce annotation cost by predicting which samples are useful for a human expert to label.
There is no best active learning strategy that consistently outperforms all others in all applications.
We present the results of a benchmark performed on 20 datasets that compares a strategy learned using a recent meta-learning algorithm with margin sampling.
arXiv Detail & Related papers (2020-12-16T10:35:43Z) - Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.