Train More Parameters But Mind Their Placement: Insights into Language Adaptation with PEFT
- URL: http://arxiv.org/abs/2412.12674v1
- Date: Tue, 17 Dec 2024 08:44:00 GMT
- Title: Train More Parameters But Mind Their Placement: Insights into Language Adaptation with PEFT
- Authors: Jenny Kunz,
- Abstract summary: We aim to enhance the generation performance of an LLM by specialising it using unstructured text corpora.
We find that increasing the number of trainable parameters leads to better and more robust language adaptation.
Although improvements are consistent in 0-shot summarisation, some adapted models struggle with longer context lengths.
- Score: 0.8702432681310401
- License:
- Abstract: Smaller LLMs still face significant challenges even in medium-resourced languages, particularly when it comes to language-specific knowledge -- a problem not easily resolved with machine-translated data. In this case study on Icelandic, we aim to enhance the generation performance of an LLM by specialising it using unstructured text corpora. A key focus is on preventing interference with the models' capabilities of handling longer context during this adaptation. Through ablation studies using various parameter-efficient fine-tuning (PEFT) methods and setups, we find that increasing the number of trainable parameters leads to better and more robust language adaptation. LoRAs placed in the feed-forward layers and bottleneck adapters show promising results with sufficient parameters, while prefix tuning and (IA)3 are not suitable. Although improvements are consistent in 0-shot summarisation, some adapted models struggle with longer context lengths, an issue that can be mitigated by adapting only the final layers.
Related papers
- Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages [10.418542753869433]
Low-resource languages (LRLs) face significant challenges in natural language processing (NLP) due to limited data.
Current state-of-the-art large language models (LLMs) still struggle with LRLs.
Small multilingual models (mLMs) such as mBERT and XLM-R offer greater promise due to a better fit of their capacity to low training data sizes.
arXiv Detail & Related papers (2025-02-14T13:10:39Z) - How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario [72.02391485962127]
Speech Self-Supervised Learning (SSL) models achieve impressive performance on Automatic Speech Recognition (ASR)
In low-resource language ASR, they encounter the domain mismatch problem between pre-trained and low-resource languages.
We extend a conventional efficient fine-tuning scheme based on the adapter to handle these issues.
arXiv Detail & Related papers (2024-11-27T10:51:00Z) - LANDeRMT: Detecting and Routing Language-Aware Neurons for Selectively Finetuning LLMs to Machine Translation [43.26446958873554]
Large language models (LLMs) have shown promising results in multilingual translation even with limited bilingual supervision.
Recent advancements in large language models (LLMs) have shown promising results in multilingual translation even with limited bilingual supervision.
LandeRMT is a framework that selectively finetunes LLMs to textbfMachine textbfTranslation with diverse translation training data.
arXiv Detail & Related papers (2024-09-29T02:39:42Z) - R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML)
A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.
This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding.
A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z) - Found in the Middle: How Language Models Use Long Contexts Better via
Plug-and-Play Positional Encoding [78.36702055076456]
This paper introduces Multi-scale Positional.
(Ms-PoE) which is a simple yet effective plug-and-play approach to enhance the capacity of.
LLMs to handle relevant information located in the middle of the context.
arXiv Detail & Related papers (2024-03-05T04:58:37Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - CHAPTER: Exploiting Convolutional Neural Network Adapters for
Self-supervised Speech Models [62.60723685118747]
Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data.
We propose an efficient tuning method specifically designed for SSL speech model, by applying CNN adapters at the feature extractor.
We empirically found that adding CNN to the feature extractor can help the adaptation on emotion and speaker tasks.
arXiv Detail & Related papers (2022-12-01T08:50:12Z) - Towards Parameter-Efficient Integration of Pre-Trained Language Models
In Temporal Video Grounding [37.199310579532884]
This paper explores the task of Temporal Video Grounding (TVG)
TVG is where, given an untrimmed video and a natural language sentence query, the goal is to recognize and determine temporal boundaries of action instances in the video.
Recent works tackled this task by improving query inputs with large pre-trained language models (PLM) at the cost of more expensive training.
arXiv Detail & Related papers (2022-09-26T08:11:19Z) - Meta-Learning the Difference: Preparing Large Language Models for
Efficient Adaptation [11.960178399478718]
Large pretrained language models (PLMs) are often domain- or task-adapted via fine-tuning or prompting.
Instead, we prepare PLMs for data- and parameter-efficient adaptation by learning to learn the difference between general and adapted PLMs.
arXiv Detail & Related papers (2022-07-07T18:00:22Z) - Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual
Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.