ImitAL: Learned Active Learning Strategy on Synthetic Data
- URL: http://arxiv.org/abs/2208.11636v1
- Date: Wed, 24 Aug 2022 16:17:53 GMT
- Title: ImitAL: Learned Active Learning Strategy on Synthetic Data
- Authors: Julius Gonsior, Maik Thiele, Wolfgang Lehner
- Abstract summary: We propose ImitAL, a novel, domain-independent query strategy, which encodes AL as a learning-to-rank problem.
We train ImitAL on large-scale simulated AL runs on purely synthetic datasets.
To show that ImitAL was successfully trained, we perform an extensive evaluation comparing our strategy on 13 different datasets.
- Score: 30.595138995552748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active Learning (AL) is a well-known standard method for efficiently
obtaining annotated data by first labeling the samples that contain the most
information based on a query strategy. In the past, a large variety of such
query strategies has been proposed, with each generation of new strategies
increasing the runtime and adding more complexity. However, to the best of our
knowledge, none of these strategies excels consistently over a large number
of datasets from different application domains. Basically, most of the
existing AL strategies are a combination of the two simple heuristics
informativeness and representativeness, and the big differences lie in the
combination of the often conflicting heuristics. In this paper, we propose
ImitAL, a novel, domain-independent query strategy that encodes AL as a
learning-to-rank problem and learns an optimal combination of both
heuristics. We train ImitAL on large-scale simulated AL runs on purely
synthetic datasets. To show that ImitAL was successfully trained, we perform an
extensive evaluation comparing our strategy on 13 different datasets, from a
wide range of domains, with 7 other query strategies.
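The combination of the two heuristics described in the abstract can be illustrated with a minimal hand-weighted sketch. Everything here is an assumption for illustration only: the entropy-based informativeness score, the distance-based representativeness score, the fixed weight `alpha`, and the `query` helper are toy choices, whereas ImitAL *learns* the combination instead of fixing it.

```python
import math

def informativeness(probs):
    # Shannon entropy of the predicted class distribution:
    # higher entropy means the model is less certain about this sample.
    return -sum(p * math.log(p) for p in probs if p > 0)

def representativeness(x, pool):
    # Negative mean Euclidean distance to the rest of the unlabeled pool:
    # samples close to many others represent the data distribution well.
    dists = [math.dist(x, y) for y in pool if y is not x]
    return -sum(dists) / len(dists)

def query(pool, probs_per_sample, alpha=0.5):
    # Score each unlabeled sample by a fixed weighted sum of the two
    # heuristics and return the index of the best candidate to label next.
    scores = [
        alpha * informativeness(p) + (1 - alpha) * representativeness(x, pool)
        for x, p in zip(pool, probs_per_sample)
    ]
    return scores.index(max(scores))
```

With a toy pool of three 2D points and per-sample class probabilities, the highest-scoring sample is both uncertain and central, which is exactly the trade-off the learned strategy is meant to balance.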
Related papers
- AutoAL: Automated Active Learning with Differentiable Query Strategy Search [18.23964720426325]
This work presents the first differentiable active learning strategy search method, named AutoAL.
For any given task, SearchNet and FitNet are iteratively co-optimized using the labeled data, learning how well a set of candidate AL algorithms perform on that task.
AutoAL consistently achieves superior accuracy compared to all candidate AL algorithms and other selective AL approaches.
arXiv Detail & Related papers (2024-10-17T17:59:09Z) - Balancing Cost and Effectiveness of Synthetic Data Generation Strategies for LLMs [0.6700983301090584]
As large language models (LLMs) are applied to more use cases, creating high-quality, task-specific datasets for fine-tuning becomes increasingly important.
Using high quality human data has been the most common approach to unlock model performance, but is prohibitively expensive in many scenarios.
Several alternative methods have also emerged, such as generating synthetic or hybrid data, but the effectiveness of these approaches remains unclear.
arXiv Detail & Related papers (2024-09-29T20:14:50Z) - Towards a Unified View of Preference Learning for Large Language Models: A Survey [88.66719962576005]
Large Language Models (LLMs) exhibit remarkably powerful capabilities.
One of the crucial factors to achieve success is aligning the LLM's output with human preferences.
We decompose all the strategies in preference learning into four components: model, data, feedback, and algorithm.
arXiv Detail & Related papers (2024-09-04T15:11:55Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - Alice Benchmarks: Connecting Real World Re-Identification with the
Synthetic [92.02220105679713]
We introduce the Alice benchmarks, large-scale datasets providing benchmarks and evaluation protocols to the research community.
Within the Alice benchmarks, two object re-ID tasks are offered: person and vehicle re-ID.
As an important feature of our real target, the clusterability of its training set is not manually guaranteed to make it closer to a real domain adaptation test scenario.
arXiv Detail & Related papers (2023-10-06T17:58:26Z) - ALE: A Simulation-Based Active Learning Evaluation Framework for the
Parameter-Driven Comparison of Query Strategies for NLP [3.024761040393842]
Active Learning (AL) proposes promising data points for annotators to label next, instead of a sequential or random sample.
This method is supposed to save annotation effort while maintaining model performance.
We introduce a reproducible active learning evaluation framework for the comparative evaluation of AL strategies in NLP.
arXiv Detail & Related papers (2023-08-01T10:42:11Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
The second strategy leverages a novel re-ranking technique with a lower upper-bound time complexity, reducing the memory complexity from O(n²) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - ImitAL: Learning Active Learning Strategies from Synthetic Data [14.758287202278918]
Active Learning is a well-known standard method for efficiently obtaining labeled data.
We propose ImitAL, a novel query strategy, which encodes AL as a learning-to-rank problem.
We show that our approach is more runtime performant than most other strategies, especially on very large datasets.
arXiv Detail & Related papers (2021-08-17T15:03:31Z) - Unsupervised and self-adaptative techniques for cross-domain person
re-identification [82.54691433502335]
Person Re-Identification (ReID) across non-overlapping cameras is a challenging task.
Unsupervised Domain Adaptation (UDA) is a promising alternative, as it performs feature-learning adaptation from a model trained on a source to a target domain without identity-label annotation.
In this paper, we propose a novel UDA-based ReID method that takes advantage of triplets of samples created by a new offline strategy.
arXiv Detail & Related papers (2021-03-21T23:58:39Z) - Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single-policy MORL, which learns an optimal policy given a preference over objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.