Sparse*BERT: Sparse Models Generalize To New Tasks and Domains
- URL: http://arxiv.org/abs/2205.12452v3
- Date: Wed, 5 Apr 2023 19:54:59 GMT
- Title: Sparse*BERT: Sparse Models Generalize To New Tasks and Domains
- Authors: Daniel Campos, Alexandre Marques, Tuan Nguyen, Mark Kurtz, and
ChengXiang Zhai
- Abstract summary: This paper studies how models pruned using Gradual Unstructured Magnitude Pruning can transfer between domains and tasks.
We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text.
- Score: 79.42527716035879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models have become the core architecture upon which most
modern natural language processing (NLP) systems build. These models can
consistently deliver impressive accuracy and robustness across tasks and
domains, but their high computational overhead can make inference difficult and
expensive. To make using these models less costly, recent work has explored
leveraging structured and unstructured pruning, quantization, and distillation
to improve inference speed and decrease size. This paper studies how models
pruned using Gradual Unstructured Magnitude Pruning can transfer between
domains and tasks. Our experimentation shows that models that are pruned during
pretraining using general domain masked language models can transfer to novel
domains and tasks without extensive hyperparameter exploration or specialized
approaches. We demonstrate that our general sparse model Sparse*BERT can become
SparseBioBERT simply by pretraining the compressed architecture on unstructured
biomedical text. Moreover, we show that SparseBioBERT can match the quality of
BioBERT with only 10% of the parameters.
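The compression method at the center of the abstract is gradual unstructured magnitude pruning (GMP): sparsity is ramped up over the course of pretraining, and at each pruning step the lowest-magnitude weights are zeroed. The sketch below is a minimal PyTorch illustration of that idea, not the authors' implementation; it assumes the commonly used cubic sparsity ramp, a toy feed-forward model standing in for the masked-language-model loss, and a 90% final sparsity chosen only for illustration.

```python
# Minimal sketch of gradual unstructured magnitude pruning (GMP).
# Illustrative only: schedule bounds, model, and 90% final sparsity are assumptions.
import torch
import torch.nn as nn


def target_sparsity(step: int, start: int, end: int, final_sparsity: float) -> float:
    """Cubic ramp: sparsity rises from 0 to final_sparsity between start and end steps."""
    if step <= start:
        return 0.0
    if step >= end:
        return final_sparsity
    progress = (step - start) / (end - start)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)


@torch.no_grad()
def apply_magnitude_pruning(module: nn.Module, sparsity: float) -> None:
    """Zero out the lowest-magnitude weights of every Linear layer (unstructured pruning)."""
    for layer in module.modules():
        if isinstance(layer, nn.Linear):
            w = layer.weight
            k = int(sparsity * w.numel())
            if k == 0:
                continue
            threshold = w.abs().flatten().kthvalue(k).values
            mask = (w.abs() > threshold).to(w.dtype)
            # Weights are zeroed here; a persistent mask would be needed to keep
            # them at zero between pruning calls, as in a full GMP implementation.
            w.mul_(mask)


# Toy training loop standing in for masked-language-model pretraining.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
for step in range(1000):
    x = torch.randn(8, 768)
    loss = model(x).pow(2).mean()  # stand-in for the MLM loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Re-apply pruning periodically so sparsity follows the schedule.
    if step % 100 == 0:
        apply_magnitude_pruning(model, target_sparsity(step, 0, 900, 0.90))
```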
Related papers
- Structural Pruning of Pre-trained Language Models via Neural Architecture Search [7.833790713816726]
Pre-trained language models (PLMs) mark the state of the art for natural language understanding tasks when fine-tuned on labeled data.
This paper explores neural architecture search (NAS) for structural pruning to find sub-parts of the fine-tuned network that optimally trade off efficiency against generalization performance.
arXiv Detail & Related papers (2024-05-03T17:34:57Z)
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z)
- Split and Rephrase with Large Language Models [2.499907423888049]
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z)
- oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes [82.99830498937729]
oBERTa is an easy-to-use set of language models for Natural Language Processing.
It allows NLP practitioners to obtain models that are between 3.8 and 24.3 times faster without expertise in model compression.
We explore the use of oBERTa on seven representative NLP tasks.
arXiv Detail & Related papers (2023-03-30T01:37:19Z)
- TextPruner: A Model Pruning Toolkit for Pre-Trained Language Models [18.49325959450621]
We introduce TextPruner, an open-source model pruning toolkit for pre-trained language models.
TextPruner offers structured post-training pruning methods, including vocabulary pruning and transformer pruning.
Our experiments with several NLP tasks demonstrate the ability of TextPruner to reduce the model size without re-training the model.
arXiv Detail & Related papers (2022-03-30T02:10:33Z)
- Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that the studied techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z)
- AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
- Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains [45.07506437436464]
We present a general approach to developing small, fast and effective pre-trained models for specific domains.
This is achieved by adapting the off-the-shelf general pre-trained models and performing task-agnostic knowledge distillation in target domains; a minimal distillation-loss sketch follows this list.
arXiv Detail & Related papers (2021-06-25T07:37:05Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)
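As referenced in the Adapt-and-Distill entry above, task-agnostic knowledge distillation trains a small student to match a larger teacher's output distribution on unlabeled domain text. The sketch below is a minimal PyTorch illustration under assumed settings (toy linear models, temperature 2.0, a random-input loop standing in for encoded domain text); it is not that paper's implementation.

```python
# Minimal sketch of task-agnostic knowledge distillation.
# Illustrative only: toy models, temperature, and training loop are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)


# Toy teacher/student standing in for a large domain-adapted model and a small one.
vocab_size, hidden = 1000, 64
teacher = nn.Linear(hidden, vocab_size)
student = nn.Linear(hidden, vocab_size)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(16, hidden)          # stand-in for encoded unlabeled domain text
    with torch.no_grad():
        teacher_logits = teacher(x)      # the teacher stays frozen
    loss = distillation_loss(student(x), teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```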
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.