Active Learning for Sequence Tagging with Deep Pre-trained Models and
Bayesian Uncertainty Estimates
- URL: http://arxiv.org/abs/2101.08133v2
- Date: Thu, 18 Feb 2021 15:50:11 GMT
- Title: Active Learning for Sequence Tagging with Deep Pre-trained Models and
Bayesian Uncertainty Estimates
- Authors: Artem Shelmanov, Dmitri Puzyrev, Lyubov Kupriyanova, Denis Belyakov,
Daniil Larionov, Nikita Khromov, Olga Kozlova, Ekaterina Artemova, Dmitry V.
Dylov, and Alexander Panchenko
- Abstract summary: Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
- Score: 52.164757178369804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Annotating training data for sequence tagging of texts is usually very
time-consuming. Recent advances in transfer learning for natural language
processing in conjunction with active learning open the possibility to
significantly reduce the necessary annotation budget. We are the first to
thoroughly investigate this powerful combination for the sequence tagging task.
We conduct an extensive empirical study of various Bayesian uncertainty
estimation methods and Monte Carlo dropout options for deep pre-trained models
in the active learning framework and find the best combinations for different
types of models. Besides, we also demonstrate that to acquire instances during
active learning, a full-size Transformer can be substituted with a distilled
version, which yields better computational performance and reduces obstacles
for applying deep active learning in practice.
Related papers
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - An Emulator for Fine-Tuning Large Language Models using Small Language
Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z) - How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z) - PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z) - NTKCPL: Active Learning on Top of Self-Supervised Model by Estimating
True Coverage [3.4806267677524896]
We propose a novel active learning strategy, neural tangent kernel clustering-pseudo-labels (NTKCPL)
It estimates empirical risk based on pseudo-labels and the model prediction with NTK approximation.
We validate our method on five datasets, empirically demonstrating that it outperforms the baseline methods in most cases.
arXiv Detail & Related papers (2023-06-07T01:43:47Z) - Batch Active Learning from the Perspective of Sparse Approximation [12.51958241746014]
Active learning enables efficient model training by leveraging interactions between machine learning agents and human annotators.
We study and propose a novel framework that formulates batch active learning from the sparse approximation's perspective.
Our active learning method aims to find an informative subset from the unlabeled data pool such that the corresponding training loss function approximates its full data pool counterpart.
arXiv Detail & Related papers (2022-11-01T03:20:28Z) - Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z) - Optimizing Active Learning for Low Annotation Budgets [6.753808772846254]
In deep learning, active learning is usually implemented as an iterative process in which successive deep models are updated via fine tuning.
We tackle this issue by using an approach inspired by transfer learning.
We introduce a novel acquisition function which exploits the iterative nature of AL process to select samples in a more robust fashion.
arXiv Detail & Related papers (2022-01-18T18:53:10Z) - BALanCe: Deep Bayesian Active Learning via Equivalence Class Annealing [7.9107076476763885]
BALanCe is a deep active learning framework that mitigates the effect of uncertainty estimates.
Batch-BALanCe is a generalization of the sequential algorithm to the batched setting.
We show that Batch-BALanCe achieves state-of-the-art performance on several benchmark datasets for active learning.
arXiv Detail & Related papers (2021-12-27T15:38:27Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimize model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.