A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models
- URL: http://arxiv.org/abs/2210.05211v1
- Date: Tue, 11 Oct 2022 07:26:34 GMT
- Title: A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models
- Authors: Yuanxin Liu, Fandong Meng, Zheng Lin, Jiangnan Li, Peng Fu, Yanan Cao,
Weiping Wang, Jie Zhou
- Abstract summary: Large-scale pre-trained language models (PLMs) are inefficient in terms of memory footprint and computation.
On downstream tasks, PLMs tend to rely on dataset bias and struggle to generalize to out-of-distribution (OOD) data.
Recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting the performance.
- Score: 53.87983344862402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the remarkable success of pre-trained language models (PLMs), they
still face two challenges: First, large-scale PLMs are inefficient in terms of
memory footprint and computation. Second, on the downstream tasks, PLMs tend to
rely on the dataset bias and struggle to generalize to out-of-distribution
(OOD) data. In response to the efficiency problem, recent studies show that
dense PLMs can be replaced with sparse subnetworks without hurting the
performance. Such subnetworks can be found in three scenarios: 1) the
fine-tuned PLMs, 2) the raw PLMs, with the subnetworks then fine-tuned in
isolation, and even 3) inside PLMs without any parameter fine-tuning. However,
these results are
only obtained in the in-distribution (ID) setting. In this paper, we extend the
study of PLM subnetworks to the OOD setting, investigating whether sparsity
and robustness to dataset bias can be achieved simultaneously. To this end, we
conduct extensive experiments with the pre-trained BERT model on three natural
language understanding (NLU) tasks. Our results demonstrate that \textbf{sparse
and robust subnetworks (SRNets) can consistently be found in BERT}, across the
aforementioned three scenarios, using different training and compression
methods. Furthermore, we explore the upper bound of SRNets using the OOD
information and show that \textbf{there exist sparse and almost unbiased BERT
subnetworks}. Finally, we present 1) an analytical study that provides insights
on how to promote the efficiency of SRNets searching process and 2) a solution
to improve subnetworks' performance at high sparsity. The code is available at
https://github.com/llyx97/sparse-and-robust-PLM.
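As an illustration of the compression side of this setup, here is a minimal sketch (not the paper's exact pipeline) of extracting a sparse BERT subnetwork via global magnitude pruning with PyTorch and Hugging Face Transformers; the 70% sparsity level and the 3-label task head are illustrative assumptions.
```python
# Minimal sketch (illustrative, not the paper's exact pipeline): extract a
# sparse BERT subnetwork with global magnitude pruning.
import torch.nn as nn
import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # e.g. an NLI-style NLU task (assumed)
)

# Collect the weight tensors of every Linear layer in the BERT encoder.
to_prune = [
    (m, "weight")
    for m in model.bert.encoder.modules()
    if isinstance(m, nn.Linear)
]

# Zero out the 70% of encoder weights with the smallest magnitude; the
# surviving weights define the subnetwork's binary mask.
prune.global_unstructured(
    to_prune, pruning_method=prune.L1Unstructured, amount=0.7
)

# Bake the masks into the weights so the subnetwork can be fine-tuned or
# evaluated directly, then compared on in-distribution and OOD test sets.
for module, name in to_prune:
    prune.remove(module, name)
```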
Related papers
- Sample-Efficient Alignment for LLMs [29.477421976548015]
We study methods for efficiently aligning large language models (LLMs) with human preferences given budgeted online feedback.
We introduce a unified algorithm based on Thompson sampling and highlight its applications in two distinct LLM alignment scenarios.
The results demonstrate that SEA achieves highly sample-efficient alignment with oracle's preferences, outperforming recent active exploration methods for LLMs.
arXiv Detail & Related papers (2024-11-03T09:18:28Z) - Making Pre-trained Language Models both Task-solvers and
Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems.
However, their confidence estimates are often unreliable; previous work shows that introducing an extra calibration task can mitigate this issue.
We propose a training algorithm LM-TOAST to tackle the challenges.
arXiv Detail & Related papers (2023-07-21T02:51:41Z) - Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis,
and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP.
We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.
We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
arXiv Detail & Related papers (2023-06-07T17:47:03Z) - ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for
Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning.
We propose a simple but effective in-context learning framework called ICL-D3IE.
Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z) - Boosting Low-Data Instance Segmentation by Unsupervised Pre-training
with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes.
Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models.
Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z) - Compressing And Debiasing Vision-Language Pre-Trained Models for Visual
Question Answering [25.540831728925557]
This paper investigates whether a vision-language pre-trained model can be compressed and debiased simultaneously by searching for sparse and robust subnetworks.
Our results show that there indeed exist sparse and robust subnetworks, which are competitive with the debiased full model.
arXiv Detail & Related papers (2022-10-26T08:25:03Z) - Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask
Training [55.43088293183165]
Recent studies show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original PLM.
In this paper, we find that the BERT subnetworks have even more potential than these studies have shown.
We train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork.
arXiv Detail & Related papers (2022-04-24T08:42:47Z)
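The mask-training idea summarized in the entry above can be sketched as follows: a minimal illustration of training binary masks over frozen weights with a straight-through estimator, not the cited paper's exact method; `MaskedLinear` and its initialization are illustrative choices.
```python
# Minimal sketch (assumption: straight-through estimator over frozen weights);
# this illustrates the general mask-training idea, not the cited paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer whose frozen weights are gated by a learnable binary mask."""

    def __init__(self, linear: nn.Linear, init_score: float = 0.01):
        super().__init__()
        # Freeze the pre-trained weights; only the mask scores are trained.
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(linear.bias.detach(), requires_grad=False)
        # Positive init keeps every weight at the start (illustrative choice).
        self.scores = nn.Parameter(torch.full_like(self.weight, init_score))

    def forward(self, x):
        hard = (self.scores > 0).float()  # binary mask used in the forward pass
        # Straight-through estimator: forward uses `hard`, gradients reach `scores`.
        mask = hard + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask, self.bias)

# Usage: wrap each Linear layer of a frozen BERT and optimize only `scores`
# on the chosen objective; the learned mask carves out a subnetwork.
layer = MaskedLinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
```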