Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review
- URL: http://arxiv.org/abs/2105.01044v1
- Date: Mon, 3 May 2021 17:41:18 GMT
- Title: Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review
- Authors: Eugene Yang, Sean MacAvaney, David D. Lewis, Ophir Frieder
- Abstract summary: Technology-assisted review (TAR) refers to iterative active learning for document review in high recall retrieval tasks.
Transformer-based models with supervised tuning have been found to improve effectiveness on many text classification tasks.
We show that just-right language model fine-tuning on the task collection before starting active learning is critical.
- Score: 14.689883695115519
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Technology-assisted review (TAR) refers to iterative active learning
workflows for document review in high recall retrieval (HRR) tasks. TAR
research and most commercial TAR software have applied linear models such as
logistic regression or support vector machines to lexical features.
Transformer-based models with supervised tuning have been found to improve
effectiveness on many text classification tasks, suggesting their use in TAR.
We indeed find that the pre-trained BERT model reduces review volume by 30% in
TAR workflows simulated on the RCV1-v2 newswire collection. In contrast, we
find that linear models outperform BERT for simulated legal discovery topics on
the Jeb Bush e-mail collection. This suggests the match between transformer
pre-training corpora and the task domain is more important than generally
appreciated. Additionally, we show that just-right language model fine-tuning
on the task collection before starting active learning is critical. Either too
little or too much fine-tuning results in performance worse than that of linear
models, even for RCV1-v2.
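The abstract frames TAR as an iterative relevance-feedback loop: a classifier trained on the documents reviewed so far scores the remainder, the top-scored unreviewed documents are reviewed next, and the process repeats until a recall target is met. The sketch below illustrates that baseline loop with the linear setup the abstract names (logistic regression over lexical features), using scikit-learn; the batch size, seeding, and recall-based stopping rule are illustrative assumptions rather than the paper's exact simulation protocol, and the paper's BERT variant additionally fine-tunes the language model on the task collection before the loop starts.

```python
# Minimal sketch of a TAR-style active learning loop with a linear baseline
# (logistic regression over tf-idf features). Batch size, seeding, and the
# recall-based stopping rule are illustrative assumptions, not the paper's
# exact simulation protocol.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def simulate_tar(docs, labels, seed_idx, batch_size=100, target_recall=0.8):
    """Iteratively review top-scored documents until the recall target is met."""
    X = TfidfVectorizer(sublinear_tf=True).fit_transform(docs)
    y = np.asarray(labels)
    reviewed = list(seed_idx)            # seed must contain both classes
    total_rel = y.sum()
    while y[reviewed].sum() < target_recall * total_rel:
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X[reviewed], y[reviewed])       # train on reviewed documents
        scores = clf.decision_function(X)       # score the whole collection
        # Relevance feedback: send the top-scored unreviewed docs for review.
        ranked = [i for i in np.argsort(-scores) if i not in set(reviewed)]
        reviewed.extend(ranked[:batch_size])
    return len(reviewed)                 # review volume needed to hit the target
```

The returned review volume is, roughly, the quantity the abstract reports BERT reducing by 30% on RCV1-v2: the number of documents a reviewer must read before the recall target is reached.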
Related papers
- REP: Resource-Efficient Prompting for On-device Continual Learning [23.92661395403251]
On-device continual learning (CL) requires the co-optimization of model accuracy and resource efficiency to be practical.
It is commonly believed that CNN-based CL excels in resource efficiency, whereas ViT-based CL is superior in model performance.
We introduce REP, which improves resource efficiency specifically targeting prompt-based rehearsal-free methods.
arXiv Detail & Related papers (2024-06-07T09:17:33Z) - Contextualization with SPLADE for High Recall Retrieval [5.973857434357868]
High Recall Retrieval (HRR) is a search problem that optimizes the cost of retrieving most of the relevant documents in a given collection.
In this work, we leverage SPLADE, an efficient retrieval model that transforms documents into contextualized sparse vectors.
It reduces the review cost by 10% and 18% in two HRR evaluation collections under a one-phase review workflow with a target recall of 80% (a sketch of SPLADE-style term weighting appears after this list).
arXiv Detail & Related papers (2024-05-07T03:05:37Z) - Efficient Grammatical Error Correction Via Multi-Task Training and
Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z) - Extensive Evaluation of Transformer-based Architectures for Adverse Drug
Events Extraction [6.78974856327994]
Adverse Drug Event (ADE) extraction is one of the core tasks in digital pharmacovigilance.
We evaluate 19 Transformer-based models for ADE extraction on informal texts.
At the end of our analyses, we identify a list of take-home messages that can be derived from the experimental data.
arXiv Detail & Related papers (2023-06-08T15:25:24Z) - Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task.
The proposed TaskRes is simple yet effective, significantly outperforming previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z) - Transformers with Learnable Activation Functions [63.98696070245065]
We use the Rational Activation Function (RAF) to learn optimal activation functions during training according to the input data.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions (a sketch of a learnable rational activation appears after this list).
arXiv Detail & Related papers (2022-08-30T09:47:31Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - The Lottery Ticket Hypothesis for Pre-trained BERT Networks [137.99328302234338]
In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training.
In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy.
We combine these observations to assess whether such trainable, transferable subnetworks exist in pre-trained BERT models.
arXiv Detail & Related papers (2020-07-23T19:35:39Z) - Gradient-Based Adversarial Training on Transformer Networks for
Detecting Check-Worthy Factual Claims [3.7543966923106438]
We introduce the first adversarially-regularized, transformer-based claim spotter model.
We obtain a 4.70 point F1-score improvement over current state-of-the-art models.
We propose a method to apply adversarial training to transformer models.
arXiv Detail & Related papers (2020-02-18T16:51:05Z) - Pre-training Tasks for Embedding-based Large-scale Retrieval [68.01167604281578]
We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents from a large document corpus.
We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks.
arXiv Detail & Related papers (2020-02-10T16:44:00Z)
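The "Contextualization with SPLADE for High Recall Retrieval" entry above hinges on transforming texts into contextualized sparse vectors. The sketch below shows the usual SPLADE-style term weighting, assuming the Hugging Face transformers library and a SPLADE-trained masked-language-model checkpoint (the checkpoint name is an assumption; any MLM head runs, but only SPLADE-trained weights give meaningful term importances): each text becomes a vocabulary-sized vector via max-pooling of log(1 + ReLU(logits)) over token positions, and query-document relevance is a sparse dot product.

```python
# Sketch of SPLADE-style contextualized sparse vectors.
# The checkpoint name below is an assumption; substitute any SPLADE-trained
# masked-language-model checkpoint available on the Hugging Face hub.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "naver/splade-cocondenser-ensembledistil"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def splade_vector(text: str) -> torch.Tensor:
    """Map text to a |vocab|-sized vector of non-negative term weights."""
    enc = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits                 # (1, seq_len, vocab)
    weights = torch.log1p(torch.relu(logits))        # SPLADE term weighting
    mask = enc["attention_mask"].unsqueeze(-1)       # ignore padding positions
    return (weights * mask).amax(dim=1).squeeze(0)   # max-pool over tokens

doc_vec = splade_vector("SPLADE turns documents into sparse lexical vectors.")
qry_vec = splade_vector("sparse retrieval with learned term weights")
score = torch.dot(doc_vec, qry_vec)                  # sparse dot-product score
```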
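The "Transformers with Learnable Activation Functions" entry above learns activation functions from data via a Rational Activation Function. A minimal sketch of a learnable rational activation f(x) = P(x)/Q(x) follows; the "safe" denominator 1 + |·| and the polynomial degrees are common choices assumed here, not necessarily the paper's exact parameterization or initialization.

```python
# Sketch of a learnable rational activation f(x) = P(x) / Q(x).
# The safe denominator 1 + |sum_k b_k x^k| and the degrees (5, 4) are
# assumptions; the paper's exact parameterization may differ.
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    def __init__(self, num_degree: int = 5, den_degree: int = 4):
        super().__init__()
        # Polynomial coefficients are learned jointly with the rest of the model.
        self.a = nn.Parameter(0.1 * torch.randn(num_degree + 1))
        self.b = nn.Parameter(0.1 * torch.randn(den_degree))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = sum(a_j * x**j for j, a_j in enumerate(self.a))           # P(x)
        q = 1.0 + torch.abs(
            sum(b_k * x**(k + 1) for k, b_k in enumerate(self.b))     # Q(x) > 0
        )
        return p / q

# Drop-in replacement for a fixed activation, e.g. inside a Transformer FFN:
act = RationalActivation()
y = act(torch.randn(2, 8))
```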
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.