Can Smaller LLMs do better? Unlocking Cross-Domain Potential through Parameter-Efficient Fine-Tuning for Text Summarization
- URL: http://arxiv.org/abs/2509.01314v1
- Date: Mon, 01 Sep 2025 09:58:52 GMT
- Title: Can Smaller LLMs do better? Unlocking Cross-Domain Potential through Parameter-Efficient Fine-Tuning for Text Summarization
- Authors: Anum Afzal, Mehul Kumawat, Florian Matthes
- Abstract summary: We leverage parameter-efficient fine-tuning techniques (PEFTs) on high-resource datasets to improve performance on unseen low-resource domains. We benchmark six PEFTs with Llama-3-8B-Instruct on 14 training datasets from the Scientific, Medical, Legal, and News domains. Experiments show that for low-resource domains, inference using Within-Domain Adapters can achieve better performance than Few-Shot prompting.
- Score: 15.402666674186937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs), being generic task solvers, are versatile. However, despite the vast amount of data they are trained on, questions remain about how well they adapt to new domains. Moreover, simply fine-tuning a model to incorporate knowledge of a new domain is computationally expensive and time-consuming, and becomes even more challenging when the domain in question is low-resource and labeled data is unavailable. We address these challenges by leveraging parameter-efficient fine-tuning techniques (PEFTs) on high-resource datasets to improve performance on unseen low-resource domains. Throughout our experiments, we evaluate whether intrinsic linguistic commonalities between datasets can be leveraged for efficient domain adaptation. We benchmark six PEFTs with Llama-3-8B-Instruct on 14 training datasets from the Scientific, Medical, Legal, and News domains for a Text Summarization task. Our experiments show that for low-resource domains, inference using Within-Domain Adapters can outperform Few-Shot prompting as well as the much larger Llama-3-70B-Instruct. Lastly, in the absence of Within-Domain Adapters, we explore Cross-Domain Adapters as well as strategic combinations of adapters that leverage intrinsic language similarities across domains, facilitating better adaptability and performance in low-resource settings.
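To make the setup concrete, below is a minimal sketch of the kind of adapter workflow the paper benchmarks, assuming the Hugging Face `transformers` and `peft` libraries. LoRA stands in for the six PEFT methods, and every hyperparameter, adapter name, and checkpoint path is an illustrative assumption rather than the authors' configuration:

```python
# A minimal sketch, NOT the authors' setup: LoRA stands in for the six PEFT
# methods benchmarked, and all hyperparameters/paths below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, PeftModel, TaskType, get_peft_model

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# 1) Train a lightweight adapter on a high-resource summarization domain.
lora = LoraConfig(task_type=TaskType.CAUSAL_LM, r=16, lora_alpha=32,
                  lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the low-rank matrices are trainable

# 2) For a low-resource domain without a Within-Domain Adapter, one option is
#    to combine Cross-Domain Adapters (hypothetical local checkpoint paths).
base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
combined = PeftModel.from_pretrained(base, "adapters/news", adapter_name="news")
combined.load_adapter("adapters/scientific", adapter_name="scientific")
combined.add_weighted_adapter(adapters=["news", "scientific"], weights=[0.5, 0.5],
                              adapter_name="news_sci", combination_type="linear")
combined.set_adapter("news_sci")
```

Within-Domain versus Cross-Domain inference then reduces to which adapter checkpoint(s) are loaded on top of the frozen base model.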
Related papers
- Enhancing Transformer-Based Rerankers with Synthetic Data and LLM-Based Supervision [0.13999481573773073]
Large Language Models (LLMs) excel at reranking due to their deep semantic understanding and reasoning. Fine-tuning smaller, task-specific models is a more efficient alternative, but it typically relies on scarce, manually labeled data. We propose a novel pipeline that eliminates the need for human-labeled query-document pairs.
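A hedged sketch of this kind of distillation pipeline, assuming `sentence-transformers`: an LLM's relevance judgments replace human labels when fine-tuning a small cross-encoder reranker (the data, prompt, and models here are placeholders, not the paper's):

```python
# Sketch only: LLM-assigned relevance scores stand in for human labels; the
# pairs below are fabricated placeholders, not the paper's training data.
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

llm_labeled = [  # (query, passage, LLM relevance score in [0, 1])
    ("what is lora", "LoRA adds low-rank update matrices to frozen weights.", 1.0),
    ("what is lora", "The Eiffel Tower is located in Paris.", 0.0),
]
train = [InputExample(texts=[q, p], label=s) for q, p, s in llm_labeled]

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", num_labels=1)
reranker.fit(train_dataloader=DataLoader(train, shuffle=True, batch_size=2),
             epochs=1)  # the cross-encoder learns to mimic the LLM's judgments
```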
arXiv Detail & Related papers (2025-09-23T09:47:27Z) - SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation [3.5939555573102857]
Supervised Fine-Tuning (SFT) is essential for training large language models (LLMs). We propose SearchInstruct, a method explicitly designed to construct high-quality instruction datasets for SFT. Our approach begins with a limited set of domain-specific, human-generated questions, which are systematically expanded using a large language model.
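The expansion step might look roughly like the following, assuming a local instruction-tuned model via `transformers` (the model choice, prompt, and parsing are illustrative, and the retrieval component is omitted):

```python
# Illustrative sketch of expanding human-written seed questions with an LLM;
# the model, prompt, and parsing are assumptions, not SearchInstruct's.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
seed_questions = ["What are the common side effects of statins?"]  # human seeds

expanded = []
for q in seed_questions:
    prompt = f"Rewrite the following question in three different ways:\n{q}\n"
    out = generator(prompt, max_new_tokens=128, do_sample=True)[0]["generated_text"]
    # Keep generated lines that look like questions (naive parsing).
    expanded.extend(line.strip() for line in out.splitlines()
                    if line.strip().endswith("?"))
```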
arXiv Detail & Related papers (2025-09-12T21:50:39Z) - Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177]
We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training.
We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling.
We conduct a series of experiments aimed at validating the utility of SMoE for the multi-domain scenario, and find that a straightforward width scaling of the Transformer is simpler and, surprisingly, more efficient in practice, reaching the same performance level as SMoE.
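For reference, here is a minimal top-1 sparse MoE feed-forward layer of the kind being compared against plain width scaling (illustrative PyTorch, not the paper's implementation):

```python
# Minimal top-1 sparse Mixture-of-Experts FFN: each token is routed to one
# expert, so capacity grows with expert count at roughly constant FLOPs/token.
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        gate = self.router(x).softmax(dim=-1)   # routing probabilities
        top1 = gate.argmax(dim=-1)              # chosen expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top1 == e
            if mask.any():                      # scale by the gate probability
                out[mask] = expert(x[mask]) * gate[mask, e].unsqueeze(-1)
        return out
```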
arXiv Detail & Related papers (2024-07-01T09:45:22Z) - Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context
Learning [48.22913073217633]
Large language models (LLMs) have showcased their few-shot inference capability, known as in-context learning.
In this paper, we study the unsupervised domain adaptation (UDA) problem in an in-context learning setting, adapting language models from the source domain to the target domain without any target labels.
We devise different prompting and training strategies, accounting for different LM architectures to learn the target distribution via language modeling.
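A toy illustration of the in-context setup, with sentiment classification standing in for the task: labelled source-domain demonstrations are placed in the prompt ahead of an unlabelled target-domain input (the paper's retrieval and training strategies are not reproduced here):

```python
# Toy sketch: source-domain demonstrations adapt the LM to a target-domain
# input purely in-context, with no target labels (task/data are placeholders).
def build_uda_prompt(source_examples, target_text):
    shots = "\n\n".join(f"Review: {x}\nSentiment: {y}" for x, y in source_examples)
    return f"{shots}\n\nReview: {target_text}\nSentiment:"

prompt = build_uda_prompt(
    [("Great acting and a moving plot.", "positive")],  # labelled source (movies)
    "The blender broke after two uses.",                # unlabelled target (products)
)
```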
arXiv Detail & Related papers (2023-11-20T06:06:20Z) - Adversarial Adaptation for French Named Entity Recognition [21.036698406367115]
We propose a Transformer-based NER approach for French, using adversarial adaptation to similar domain or general corpora.
Our approach allows learning better features using large-scale unlabeled corpora from the same domain or mixed domains.
We also show that adversarial adaptation to large-scale unlabeled corpora can help mitigate the performance dip incurred when using Transformer models pre-trained on smaller corpora.
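The usual building block for this style of adversarial adaptation is a gradient reversal layer between the encoder and a domain discriminator; a minimal sketch in the Ganin and Lempitsky style, not necessarily the paper's exact model:

```python
# Gradient Reversal Layer: identity in the forward pass, flips (and scales)
# gradients in the backward pass, pushing the encoder toward domain-invariant
# features when a domain classifier is trained on top of it.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam=1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None  # no gradient for lam

# Usage: domain_logits = domain_classifier(GradReverse.apply(encoder_features))
```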
arXiv Detail & Related papers (2023-01-12T18:58:36Z) - Combining Data Generation and Active Learning for Low-Resource Question Answering [23.755283239897132]
We propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low-resource settings.
Our findings show that our novel approach, which incorporates humans into the data-generation loop, boosts performance in the low-resource, domain-specific setting.
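Schematically, one round of such a loop might look like this; uncertainty sampling stands in for the paper's selection strategy, and every component below is a hypothetical placeholder:

```python
# Placeholder sketch of one generate-select-verify-retrain round; `generate_qa`,
# `model`, and `human_verify` are hypothetical stand-ins, not the paper's code.
def active_learning_round(generate_qa, model, human_verify, pool_docs, k=50):
    # Generate synthetic question-answer candidates from unlabeled documents.
    candidates = [qa for doc in pool_docs for qa in generate_qa(doc)]
    # Route the k candidates the current model is least confident about to humans.
    ranked = sorted(candidates, key=lambda qa: model.confidence(qa))
    verified = [human_verify(qa) for qa in ranked[:k]]
    # Retrain on the human-verified examples and return the updated model.
    model.train_on(verified)
    return model
```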
arXiv Detail & Related papers (2022-11-27T16:31:33Z) - Multilingual Domain Adaptation for NMT: Decoupling Language and Domain
Information with Adapters [66.7986513246294]
We study the compositionality of language and domain adapters in the context of Machine Translation.
We find that in the partial-resource scenario a naive combination of domain-specific and language-specific adapters often results in 'catastrophic forgetting' of the missing languages.
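Schematically, the composition in question stacks a language adapter and a domain adapter inside each Transformer block; a generic bottleneck-adapter sketch, not the cited work's exact wiring or training schedule:

```python
# Generic stacked-adapter sketch: both adapters are small residual bottlenecks
# trained separately (one per language, one per domain) over a frozen block.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual bottleneck adapter: down-project, nonlinearity, up-project."""
    def __init__(self, d_model: int, r: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, r)
        self.up = nn.Linear(r, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

class AdaptedBlock(nn.Module):
    def __init__(self, block: nn.Module, d_model: int):
        super().__init__()
        self.block = block                         # frozen pretrained sublayer
        self.lang_adapter = Bottleneck(d_model)    # trained per language
        self.domain_adapter = Bottleneck(d_model)  # trained per domain

    def forward(self, h):
        return self.domain_adapter(self.lang_adapter(self.block(h)))
```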
arXiv Detail & Related papers (2021-10-18T18:55:23Z) - Data Augmentation for Cross-Domain Named Entity Recognition [22.66649873447105]
We study cross-domain data augmentation for the named entity recognition task.
We propose a novel neural architecture to transform the data representation from a high-resource to a low-resource domain.
We show that transforming the data to the low-resource domain representation achieves significant improvements over only using data from high-resource domains.
arXiv Detail & Related papers (2021-09-04T00:50:55Z) - Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic
Parsing [85.35582118010608]
Task-oriented semantic parsing is a critical component of virtual assistants.
Recent advances in deep learning have enabled several approaches to successfully parse more complex queries.
We propose a novel method that outperforms a supervised neural model at a 10-fold data reduction.
arXiv Detail & Related papers (2020-10-07T17:47:53Z) - Latent Domain Learning with Dynamic Residual Adapters [26.018759356470767]
A practical shortcoming of deep neural networks is their specialization to a single task and domain.
Here we focus on a less explored, but more realistic case: learning from data from multiple domains, without access to domain annotations.
We address this limitation via dynamic residual adapters, an adaptive gating mechanism that helps account for latent domains.
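The gating idea can be sketched as a learned per-sample mixture over several small residual adapters, so latent domains are handled without domain labels (illustrative, not the paper's architecture):

```python
# Illustrative dynamic residual adapter: a gate computed from pooled features
# mixes several 1x1-conv adapters per sample, approximating latent-domain routing.
import torch
import torch.nn as nn

class DynamicResidualAdapter(nn.Module):
    def __init__(self, channels: int, n_adapters: int = 4):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1) for _ in range(n_adapters)
        )
        self.gate = nn.Linear(channels, n_adapters)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        w = self.gate(x.mean(dim=(2, 3))).softmax(dim=-1)  # per-sample gate (B, n)
        mix = sum(w[:, i, None, None, None] * adapter(x)
                  for i, adapter in enumerate(self.adapters))
        return x + mix  # residual connection around the gated mixture
```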
arXiv Detail & Related papers (2020-06-01T15:00:11Z) - Addressing Zero-Resource Domains Using Document-Level Context in Neural
Machine Translation [80.40677540516616]
We show that when in-domain parallel data is not available, access to document-level context enables better capturing of domain generalities.
We present two document-level Transformer models which are capable of using large context sizes.
arXiv Detail & Related papers (2020-04-30T16:28:19Z) - Supervised Domain Adaptation using Graph Embedding [86.3361797111839]
Domain adaptation methods assume that distributions between the two domains are shifted and attempt to realign them.
We propose a generic framework based on graph embedding.
We show that the proposed approach leads to a powerful Domain Adaptation framework.
arXiv Detail & Related papers (2020-03-09T12:25:13Z) - Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation [77.62366712130196]
We present the winning entry at the fast domain adaptation task of DSTC8, a hybrid generative-retrieval model based on GPT-2 fine-tuned to the multi-domain MetaLWOz dataset.
Our model uses retrieval logic as a fallback, being SoTA on MetaLWOz in human evaluation (>4% improvement over the 2nd place system) and attaining competitive generalization performance in adaptation to the unseen MultiWOZ dataset.
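The fallback control flow itself is simple to sketch; the threshold and scoring below are placeholders rather than the DSTC8 system's actual components:

```python
# Placeholder sketch of generate-with-retrieval-fallback control flow;
# `generate`, `score`, and `retrieve` are hypothetical callables.
def respond(query, generate, score, retrieve, threshold=0.5):
    candidate = generate(query)           # GPT-2-style generative response
    if score(query, candidate) >= threshold:
        return candidate                  # confident enough: keep generation
    return retrieve(query)                # otherwise fall back to retrieval
```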
arXiv Detail & Related papers (2020-03-03T18:07:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.