Towards Efficient Active Learning in NLP via Pretrained Representations
- URL: http://arxiv.org/abs/2402.15613v1
- Date: Fri, 23 Feb 2024 21:28:59 GMT
- Title: Towards Efficient Active Learning in NLP via Pretrained Representations
- Authors: Artem Vysogorets, Achintya Gopal
- Abstract summary: Fine-tuning Large Language Models (LLMs) is now a common approach for text classification in a wide range of applications.
We drastically expedite this process by using pretrained representations of LLMs within the active learning loop.
Our strategy yields similar performance to fine-tuning all the way through the active learning loop but is orders of magnitude less computationally expensive.
- Score: 1.90365714903665
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fine-tuning Large Language Models (LLMs) is now a common approach for text
classification in a wide range of applications. When labeled documents are
scarce, active learning helps save annotation efforts but requires retraining
of massive models on each acquisition iteration. We drastically expedite this
process by using pretrained representations of LLMs within the active learning
loop and, once the desired amount of labeled data is acquired, fine-tuning that
same LLM, or even a different one, on this labeled data to achieve the best
performance. As verified on common text classification benchmarks with
pretrained BERT and RoBERTa as the backbone, our strategy yields similar
performance to fine-tuning all the way through the active learning loop but is
orders of magnitude less computationally expensive. The data acquired with our
procedure generalizes across pretrained networks, allowing flexibility in
choosing the final model or updating it as newer versions get released.
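The recipe above is simple enough to sketch. The following minimal Python illustration (an assumption-laden sketch, not the authors' released code) caches frozen pretrained embeddings once, retrains a cheap classifier on them inside the acquisition loop, and leaves the expensive LLM fine-tuning for a single pass at the end; uncertainty sampling and logistic regression are illustrative stand-ins for whatever acquisition strategy and probe the paper actually uses.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for pretrained representations: in practice, embed each document
# once with a frozen encoder (e.g., BERT/RoBERTa [CLS] vectors) and cache them.
X = rng.normal(size=(5000, 768))          # cached document embeddings
y = (X[:, 0] > 0).astype(int)             # hidden labels, revealed on query

labeled = list(rng.choice(len(X), size=32, replace=False))  # seed set
unlabeled = [i for i in range(len(X)) if i not in set(labeled)]

for _ in range(10):                       # acquisition iterations
    # Cheap learner retrained on fixed embeddings; no LLM gradient steps.
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])

    # Uncertainty sampling (illustrative): query points closest to 0.5.
    probs = clf.predict_proba(X[unlabeled])[:, 1]
    ranked = np.argsort(np.abs(probs - 0.5))[:32]
    queried = [unlabeled[i] for i in ranked]

    labeled.extend(queried)               # annotator labels the queries
    unlabeled = [i for i in unlabeled if i not in set(queried)]

# Once the budget is spent, fine-tune a (possibly different) pretrained LLM
# on the acquired labeled set for final performance -- omitted here.
print(f"acquired {len(labeled)} labeled documents")
```
Because the encoder stays frozen, each acquisition round costs a logistic-regression fit rather than a full LLM training run, which is where the claimed orders-of-magnitude savings come from.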
Related papers
- STENCIL: Submodular Mutual Information Based Weak Supervision for Cold-Start Active Learning [1.9116784879310025]
We present STENCIL, which improves overall accuracy by 10%-18% and rare-class F-1 score by 17%-40% on multiple text classification datasets over common active learning methods within the class-imbalanced cold-start setting.
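As a rough illustration of submodular-mutual-information selection, the sketch below greedily maximizes a facility-location-style score that rewards covering a small exemplar set of the rare class; this is a generic SMI instantiation with cosine similarity, not necessarily STENCIL's exact objective.
```python
import numpy as np

def greedy_smi_selection(cand_emb, query_emb, batch_size):
    """Greedy maximization of a facility-location-style submodular mutual
    information score: sum over query exemplars of the best similarity
    achieved by any selected candidate (illustrative instantiation)."""
    # Cosine similarity between candidates and rare-class exemplars.
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sim = c @ q.T                              # (n_candidates, n_queries)

    selected, best = [], np.zeros(sim.shape[1])
    for _ in range(batch_size):
        # Marginal gain of each candidate over the current coverage.
        gains = np.maximum(sim, best).sum(axis=1) - best.sum()
        if selected:
            gains[selected] = -np.inf          # no repeats
        i = int(np.argmax(gains))
        selected.append(i)
        best = np.maximum(best, sim[i])
    return selected

rng = np.random.default_rng(0)
picked = greedy_smi_selection(rng.normal(size=(1000, 64)),
                              rng.normal(size=(5, 64)), batch_size=16)
print(picked)
```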
arXiv Detail & Related papers (2024-02-21T01:54:58Z)
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an unlearning framework that efficiently updates LLMs without retraining the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)
- LLMaAA: Making Large Language Models as Active Annotators [32.57011151031332]
We propose LLMaAA, which takes large language models as annotators and puts them into an active learning loop to determine what to annotate efficiently.
We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction.
With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher within only hundreds of annotated examples.
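The loop structure might look like the sketch below, where a hypothetical `llm_annotate` call replaces the human annotator and a small task model is retrained on the LLM-provided labels each round; the acquisition rule and all names here are assumptions, not LLMaAA's exact design.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def llm_annotate(texts):
    """Hypothetical stand-in for a prompted LLM call that returns labels.
    In LLMaAA the LLM, not a human, plays the annotator in the loop."""
    return [len(t) % 2 for t in texts]         # placeholder labels

rng = np.random.default_rng(0)
texts = [f"document {i}" for i in range(2000)]
X = rng.normal(size=(2000, 128))               # features for the task model

pool = list(range(len(texts)))
labeled_idx = pool[:16]                        # seed batch
labels = llm_annotate([texts[i] for i in labeled_idx])
pool = pool[16:]

for _ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled_idx], labels)
    probs = model.predict_proba(X[pool])[:, 1]
    order = np.argsort(np.abs(probs - 0.5))[:16]   # most uncertain first
    batch = [pool[i] for i in order]
    labels += llm_annotate([texts[i] for i in batch])  # LLM as annotator
    labeled_idx += batch
    pool = [i for i in pool if i not in set(batch)]

print(len(labeled_idx), "LLM-annotated examples")
```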
arXiv Detail & Related papers (2023-10-30T14:54:15Z)
- DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection [72.25697820290502]
This work introduces a straightforward and efficient strategy to identify potential novel classes through zero-shot classification.
We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, or re-training.
Empirical evaluations on three datasets, including LVIS, V3Det, and COCO, demonstrate significant improvements over the baseline performance.
arXiv Detail & Related papers (2023-10-02T17:52:24Z)
- Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
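A listwise reranker of this kind boils down to prompt construction and output parsing; the sketch below shows one plausible shape for both, with the prompt wording being an assumption rather than the paper's exact template.
```python
import re

def build_listwise_prompt(query, passages):
    """Sketch of a listwise reranking prompt in the spirit of LRL; the exact
    wording used by the paper is not reproduced here (assumption)."""
    lines = [f"[{i + 1}] {p}" for i, p in enumerate(passages)]
    return (
        "Rank the following passages by relevance to the query.\n"
        f"Query: {query}\n" + "\n".join(lines) +
        "\nOutput the passage identifiers in order, most relevant first, "
        "e.g. [2] > [1] > [3]."
    )

def parse_ranking(llm_output, n):
    """Pull identifiers like [2] out of the model's answer, in order."""
    ids = [int(m) - 1 for m in re.findall(r"\[(\d+)\]", llm_output)]
    return [i for i in ids if 0 <= i < n]

prompt = build_listwise_prompt("capital of France",
                               ["Paris is the capital of France.",
                                "Berlin is in Germany.",
                                "France borders Spain."])
print(prompt)
print(parse_ranking("[1] > [3] > [2]", 3))   # -> [0, 2, 1]
```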
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
- Active Learning Guided by Efficient Surrogate Learners [25.52920030051264]
Re-training a deep learning model each time a single data point receives a new label is impractical.
We introduce a new active learning algorithm that harnesses the power of a Gaussian process surrogate in conjunction with the neural network principal learner.
Our proposed model adeptly updates the surrogate learner for every new data instance, enabling it to emulate and capitalize on the continuous learning dynamics of the neural network.
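One way to read this design: instead of retraining the network per label, fit a cheap Gaussian process on the labeled set and query where its predictive uncertainty is highest. The sketch below illustrates that pattern with scikit-learn; the kernel, targets, and update rule are assumptions, not the paper's exact design.
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A Gaussian process surrogate stands in for the costly neural "principal
# learner" between label acquisitions (illustrative sketch only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))                 # feature space of the learner
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

labeled = list(range(10))                      # small initial labeled set
pool = list(range(10, 500))

for _ in range(20):
    gp = GaussianProcessRegressor(kernel=RBF()).fit(X[labeled], y[labeled])
    # Query where the surrogate is least certain (max predictive std).
    _, std = gp.predict(X[pool], return_std=True)
    i = pool.pop(int(np.argmax(std)))
    labeled.append(i)                          # reveal one more label

print(f"{len(labeled)} labels acquired via the GP surrogate")
```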
arXiv Detail & Related papers (2023-01-07T01:35:25Z)
- An Efficient Active Learning Pipeline for Legal Text Classification [2.462514989381979]
We propose a pipeline for effectively using active learning with pre-trained language models in the legal domain.
We use knowledge distillation to guide the model's embeddings to a semantically meaningful space.
Our experiments on Contract-NLI, adapted to the classification task, and LEDGAR benchmarks show that our approach outperforms standard AL strategies.
arXiv Detail & Related papers (2022-11-15T13:07:02Z)
- Active Transfer Prototypical Network: An Efficient Labeling Algorithm for Time-Series Data [1.7205106391379026]
This paper proposes a novel Few-Shot Learning (FSL)-based AL framework, which addresses the trade-off problem by incorporating a Prototypical Network (ProtoNet) in the AL iterations.
This framework was validated on UCI HAR/HAPT dataset and a real-world braking maneuver dataset.
The learning performance significantly surpasses traditional AL algorithms on both datasets, achieving 90% classification accuracy with 10% and 5% labeling effort, respectively.
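The appeal of a ProtoNet inside an AL loop is that its classifier is just class-mean prototypes over embeddings, so it is nearly free to refresh after every acquisition. A minimal sketch on synthetic embeddings (the paper's encoder and distance choice may differ):
```python
import numpy as np

def prototypes(emb, labels):
    """Class prototype = mean embedding of that class's labeled examples."""
    classes = np.unique(labels)
    return classes, np.stack([emb[labels == c].mean(axis=0) for c in classes])

def predict(emb, classes, protos):
    """Nearest-prototype classification (Euclidean distance)."""
    d = np.linalg.norm(emb[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(0)
# Synthetic support set: 3 classes, 10 labeled examples each, shifted means.
support = rng.normal(size=(30, 32)) + np.repeat(np.arange(3), 10)[:, None]
labels = np.repeat(np.arange(3), 10)
classes, protos = prototypes(support, labels)
print(predict(support[:5], classes, protos))   # -> mostly class 0
```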
arXiv Detail & Related papers (2022-09-28T16:14:40Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
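Monte Carlo dropout, one of the estimators studied in this line of work, keeps dropout active at inference, averages the softmax over several stochastic passes, and scores instances by the entropy of that average. A self-contained numpy sketch on a toy two-layer network (the paper works with full Transformers and also compares other Bayesian estimates):
```python
import numpy as np

def mc_dropout_entropy(x, W1, W2, n_passes=20, p_drop=0.1, rng=None):
    """Predictive entropy via Monte Carlo dropout: dropout stays on at
    inference, softmax outputs are averaged over stochastic passes, and
    uncertainty is the entropy of that average."""
    rng = rng or np.random.default_rng(0)
    probs = []
    for _ in range(n_passes):
        h = np.maximum(x @ W1, 0.0)                            # ReLU layer
        h = h * (rng.random(h.shape) > p_drop) / (1 - p_drop)  # inverted dropout
        logits = h @ W2
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs.append(e / e.sum(axis=1, keepdims=True))         # softmax
    p = np.mean(probs, axis=0)
    return -(p * np.log(p + 1e-12)).sum(axis=1)                # high = uncertain

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))                   # 8 candidate instances
scores = mc_dropout_entropy(x, rng.normal(size=(32, 64)),
                            rng.normal(size=(64, 4)), rng=rng)
print(scores.round(3))   # acquisition would label the highest-entropy instances
```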
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
- Semi-supervised Batch Active Learning via Bilevel Optimization [89.37476066973336]
We formulate our approach as a data summarization problem via bilevel optimization.
We show that our method is highly effective in keyword detection tasks in the regime when only few labeled samples are available.
arXiv Detail & Related papers (2020-10-19T16:53:24Z)
- Self-training Improves Pre-training for Natural Language Understanding [63.78927366363178]
We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
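In spirit, SentAugment-style retrieval reduces to building per-class query embeddings from the labeled data and pulling their nearest neighbors from a large unlabeled sentence bank for pseudo-labeling. A hedged sketch (the paper's exact query construction and retrieval stack differ in detail):
```python
import numpy as np

def retrieve_for_self_training(labeled_emb, labeled_y, bank_emb, k=5):
    """Sketch of SentAugment-style retrieval: per-class query embeddings from
    labeled data select the nearest unlabeled sentences from a large bank,
    which a teacher model would then pseudo-label for self-training."""
    def norm(a):
        return a / np.linalg.norm(a, axis=-1, keepdims=True)

    bank = norm(bank_emb)
    picks = {}
    for c in np.unique(labeled_y):
        query = norm(labeled_emb[labeled_y == c].mean(axis=0))  # class query
        sims = bank @ query                                     # cosine sims
        picks[int(c)] = np.argsort(-sims)[:k]   # top-k candidates for class c
    return picks

rng = np.random.default_rng(0)
picks = retrieve_for_self_training(rng.normal(size=(40, 64)),
                                   rng.integers(0, 2, size=40),
                                   rng.normal(size=(10000, 64)))
print({c: idx[:3].tolist() for c, idx in picks.items()})
```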
arXiv Detail & Related papers (2020-10-05T17:52:25Z)