Probing Out-of-Distribution Robustness of Language Models with
Parameter-Efficient Transfer Learning
- URL: http://arxiv.org/abs/2301.11660v4
- Date: Wed, 14 Jun 2023 03:12:50 GMT
- Title: Probing Out-of-Distribution Robustness of Language Models with
Parameter-Efficient Transfer Learning
- Authors: Hyunsoo Cho, Choonghyun Park, Junyeop Kim, Hyuhng Joon Kim, Kang Min
Yoo, and Sang-goo Lee
- Abstract summary: In this study, we explore how the ability to detect out-of-distribution (OOD) inputs changes as the size of the PLM grows or the transfer method is altered.
We evaluate various PETL techniques, including fine-tuning, Adapter, LoRA, and prefix-tuning, on three different intention classification tasks.
- Score: 17.110208720745064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the size of pre-trained language models (PLMs) continues to
increase, numerous parameter-efficient transfer learning (PETL) methods have
been proposed recently to compensate for the tremendous cost of fine-tuning.
Despite the impressive results that large PLMs and various PETL methods have
achieved on sundry benchmarks, it remains unclear whether they can effectively
handle distributionally shifted inputs. In this study, we systematically
explore how the ability to detect out-of-distribution (OOD) inputs changes as
the size of the PLM grows or the transfer method is altered. Specifically, we
evaluate various PETL techniques, including fine-tuning, Adapter, LoRA, and
prefix-tuning, on three different intention classification tasks, each using
language models of varying scales.
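To make the evaluated setup concrete, below is a minimal sketch, not the paper's released code, of one PETL configuration: LoRA attached to a PLM-based intent classifier, with maximum softmax probability (MSP) used as a simple OOD score. The backbone name, label count, threshold, and use of the HuggingFace `transformers` and `peft` libraries are illustrative assumptions; the paper compares several transfer methods, detection settings, and model scales.

```python
# Minimal sketch, not the paper's released code: attach LoRA to a PLM intent
# classifier and score inputs for OOD-ness with maximum softmax probability.
# Backbone, label count, and threshold below are illustrative placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "roberta-base"                        # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=77)

# Parameter-efficient transfer: only the LoRA matrices (plus the new
# classification head) are trainable; the PLM weights stay frozen.
lora_cfg = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16,
                      target_modules=["query", "value"], lora_dropout=0.1)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# ... fine-tune on an in-distribution intent-classification dataset here ...

@torch.no_grad()
def msp_score(texts):
    """Maximum softmax probability; low values suggest an OOD input."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return model(**batch).logits.softmax(dim=-1).max(dim=-1).values

scores = msp_score(["book a table for two", "qpwoeiruty zxcvb ???"])
is_ood = scores < 0.5                              # illustrative threshold
```

An input whose MSP falls below the calibration threshold is flagged as likely OOD; the paper's actual detection scores and thresholds may differ.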
Related papers
- SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models [1.2263658159556594]
Full fine-tuning is a popular approach to adapt Transformer-based pre-trained large language models to a specific downstream task.
We propose Stratified Progressive Adaptation Fine-tuning (SPAFIT) based on the localization of different types of linguistic knowledge.
Our experiments, conducted on nine tasks from the GLUE benchmark, show that our proposed SPAFIT method outperforms other PEFT methods.
arXiv Detail & Related papers (2024-04-30T21:07:32Z)
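For intuition about the stratified fine-tuning idea in the SPAFIT entry above, here is a minimal sketch under illustrative assumptions: a three-way split into frozen, bias-only (BitFit-style), and fully tuned layer groups. SPAFIT's actual strata and adaptation methods may differ.

```python
# Illustrative sketch only, not the SPAFIT implementation: a simple stratified
# setup where early BERT blocks stay frozen, middle blocks train only bias
# terms (BitFit-style), and late blocks are fully fine-tuned. SPAFIT's actual
# strata and adaptation methods may differ.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)             # placeholder backbone/task

for i, block in enumerate(model.bert.encoder.layer):   # 12 Transformer blocks
    if i < 4:                                      # stratum 1: fully frozen
        for p in block.parameters():
            p.requires_grad = False
    elif i < 8:                                    # stratum 2: bias terms only
        for name, p in block.named_parameters():
            p.requires_grad = name.endswith("bias")
    # stratum 3 (i >= 8): left fully trainable

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```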
- Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective [106.92016199403042]
We empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.
We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models.
Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
arXiv Detail & Related papers (2023-10-17T17:58:34Z)
- Tokenizer Choice For LLM Training: Negligible or Crucial? [30.33170936148845]
We study the influence of tokenizer choice on Large Language Models (LLMs) downstream performance by training 24 mono- and multilingual LLMs.
We find that the tokenizer choice can significantly impact the model's downstream performance and training costs.
We show that multilingual tokenizers trained on the five most frequent European languages require a threefold increase in vocabulary size compared to English.
arXiv Detail & Related papers (2023-10-12T22:44:19Z)
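As a rough illustration of the tokenizer comparison above, the sketch below reports vocabulary size and fertility (average subword tokens per whitespace word) for an English-centric and a multilingual tokenizer; the tokenizer names, sample sentences, and metric are assumptions, not the paper's exact protocol.

```python
# Rough illustration, not the paper's protocol: compare an English-centric and
# a multilingual tokenizer by vocabulary size and fertility (average subword
# tokens per whitespace word). Tokenizer names and sentences are assumptions.
from transformers import AutoTokenizer

tokenizers = {
    "gpt2 (English-centric)": AutoTokenizer.from_pretrained("gpt2"),
    "xlm-roberta-base (multilingual)": AutoTokenizer.from_pretrained("xlm-roberta-base"),
}

samples = {
    "en": "The quick brown fox jumps over the lazy dog.",
    "de": "Der schnelle braune Fuchs springt über den faulen Hund.",
    "fr": "Le rapide renard brun saute par-dessus le chien paresseux.",
}

for name, tok in tokenizers.items():
    print(name, "| vocab size:", tok.vocab_size)
    for lang, text in samples.items():
        fertility = len(tok.tokenize(text)) / len(text.split())
        print(f"  {lang}: {fertility:.2f} subword tokens per word")
```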
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To distinguish models by parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
- Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensionality reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z)
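A toy sketch of the idea in the entry above, under stated assumptions (PCA as the reduction technique, a placeholder multilingual Siamese model, a three-sentence corpus); the paper evaluates several techniques on much larger data.

```python
# Toy sketch under stated assumptions (PCA as the reduction technique, a
# placeholder multilingual Siamese model, a three-sentence corpus): embed
# sentences, reduce the embedding dimensionality, and compare similarities.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
sentences = [
    "Where is the train station?",
    "¿Dónde está la estación de tren?",
    "The cat is sleeping on the sofa.",
]

full = model.encode(sentences)          # e.g. 384-dimensional embeddings
# In practice PCA is fit on a large corpus and keeps many more components;
# here n_components is capped by the tiny number of samples.
reduced = PCA(n_components=2).fit_transform(full)

print("full-dimension similarities:\n", np.round(cosine_similarity(full), 3))
print("reduced similarities:\n", np.round(cosine_similarity(reduced), 3))
```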
- PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)
- Towards a Unified View of Parameter-Efficient Transfer Learning [108.94786930869473]
Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP.
Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance.
We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them.
arXiv Detail & Related papers (2021-10-08T20:22:26Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- Investigating Transferability in Pretrained Language Models [8.83046338075119]
We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance.
This technique reveals that in BERT, layers with high probing performance on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks.
arXiv Detail & Related papers (2020-04-30T17:23:19Z)
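A rough sketch of the layer-ablation idea from the last entry above, assuming re-initialization of one pretrained block at a time; the `evaluate_on_task` helper is hypothetical, and the authors' exact probing and fine-tuning procedure may differ.

```python
# Rough sketch of the layer-ablation idea, not the authors' code: re-initialize
# one pretrained BERT block at a time, then fine-tune/evaluate to see how much
# that block contributes. `evaluate_on_task` is a hypothetical helper.
import copy
import torch
from transformers import AutoModel

base = AutoModel.from_pretrained("bert-base-uncased")

def reinit_block(block):
    """Reset one Transformer block's parameters to a fresh initialization."""
    for m in block.modules():
        if isinstance(m, torch.nn.Linear):
            torch.nn.init.normal_(m.weight, std=0.02)
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)
        elif isinstance(m, torch.nn.LayerNorm):
            torch.nn.init.ones_(m.weight)
            torch.nn.init.zeros_(m.bias)

for i in range(base.config.num_hidden_layers):
    ablated = copy.deepcopy(base)
    reinit_block(ablated.encoder.layer[i])
    # score = evaluate_on_task(ablated, task="MNLI")   # hypothetical helper
    # print(f"performance with block {i} re-initialized: {score}")
```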
This list is automatically generated from the titles and abstracts of the papers on this site.