Related papers: Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models

URL: http://arxiv.org/abs/2506.07424v1
Date: Mon, 09 Jun 2025 04:45:13 GMT
Title: Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models
Authors: Kyeonghyun Kim, Jinhee Jang, Juhwan Choi, Yoonji Lee, Kyohoon Jin, YoungBin Kim
Abstract summary: Large language models (LLMs) are renowned for their extensive linguistic knowledge and strong generalization capabilities. Small language models (SLMs) are computationally efficient but often lack the broad generalization capacity of LLMs. We propose PiFi, a novel framework that combines the strengths of both LLMs and SLMs to achieve high performance.
Score: 9.50875832714468
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are renowned for their extensive linguistic knowledge and strong generalization capabilities, but their high computational demands make them unsuitable for resource-constrained environments. In contrast, small language models (SLMs) are computationally efficient but often lack the broad generalization capacity of LLMs. To bridge this gap, we propose PiFi, a novel framework that combines the strengths of both LLMs and SLMs to achieve high performance while maintaining efficiency. PiFi integrates a single frozen layer from an LLM into an SLM and fine-tunes the combined model for specific tasks, boosting performance without a significant increase in computational cost. We show that PiFi delivers consistent performance improvements across a range of natural language processing tasks, including both natural language understanding and generation. Moreover, our findings demonstrate PiFi's ability to effectively leverage LLM knowledge, enhancing generalization to unseen domains and facilitating the transfer of linguistic abilities.
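
Illustrative sketch (not from the paper): the abstract's core idea, a frozen LLM layer plugged into a small model that is then fine-tuned, can be mimicked with a toy example in plain Python. Everything here is an assumption made for illustration: the `Linear` and `PluggedModel` classes, the up-projection that matches the two hidden sizes, the tanh activations, the squared-error loss, and the finite-difference gradients standing in for backpropagation. It is only meant to show that the plugged-in layer stays fixed while the surrounding parameters are updated.

```python
import copy
import math
import random


class Linear:
    """Tiny dense layer on plain lists; a stand-in for a real network block."""

    def __init__(self, in_dim, out_dim, rng, trainable=True):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        self.trainable = trainable  # False marks the layer as frozen

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]


class PluggedModel:
    """Hypothetical wiring: small encoder -> projection -> frozen layer -> head."""

    def __init__(self, slm_dim=3, llm_dim=5, num_classes=2, seed=0):
        rng = random.Random(seed)
        self.slm = Linear(slm_dim, slm_dim, rng)   # stand-in for the SLM
        self.up = Linear(slm_dim, llm_dim, rng)    # assumed size-matching projection
        self.llm_layer = Linear(llm_dim, llm_dim, rng, trainable=False)  # frozen
        self.head = Linear(llm_dim, num_classes, rng)  # task-specific head

    def modules(self):
        return [self.slm, self.up, self.llm_layer, self.head]

    def forward(self, x):
        h = [math.tanh(v) for v in self.slm(x)]
        h = self.up(h)
        h = [math.tanh(v) for v in self.llm_layer(h)]
        return self.head(h)


def loss(model, x, target):
    """Squared error between the model output and the target vector."""
    return sum((o - t) ** 2 for o, t in zip(model.forward(x), target))


def train_step(model, x, target, lr=0.05, eps=1e-5):
    """One gradient step with finite-difference gradients; frozen layers are skipped."""
    base = loss(model, x, target)
    updates = []
    for module in model.modules():
        if not module.trainable:
            continue
        for row in module.W:
            for j in range(len(row)):
                old = row[j]
                row[j] = old + eps
                grad = (loss(model, x, target) - base) / eps
                row[j] = old
                updates.append((row, j, grad))
    for row, j, grad in updates:
        row[j] -= lr * grad
    return base  # loss measured before this step's update


model = PluggedModel()
frozen_before = copy.deepcopy(model.llm_layer.W)
x, target = [1.0, -0.5, 0.25], [1.0, 0.0]
losses = [train_step(model, x, target) for _ in range(20)]
final_loss = loss(model, x, target)
```

After the loop, `model.llm_layer.W` still equals `frozen_before`, while the small model, projection, and head have moved to reduce the loss. In a real setup the frozen layer would come from a pretrained LLM rather than random initialisation.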

Related papers

Sparse Shortcuts: Facilitating Efficient Fusion in Multimodal Large Language Models [26.305881774348844]
Multimodal large language models (MLLMs) have rapidly advanced in their ability to process data across multiple modalities. In vision-language models, aligning modalities using only high-level visual features often discards the rich semantic information present in mid- and low-level features. We propose SparseCut, a general cross-modal fusion architecture for MLLMs.
arXiv Detail & Related papers (2026-01-31T04:15:42Z)
BcQLM: Efficient Vision-Language Understanding with Distilled Q-Gated Cross-Modal Fusion [6.8723394189831035]
Large language models pose challenges for deployment in resource-constrained environments. We propose a lightweight MLLM framework for end-to-end visual question answering. Our proposed approach centres on BreezeCLIP, a compact yet powerful vision-language encoder optimised for efficient multimodal understanding.
arXiv Detail & Related papers (2025-09-10T16:09:49Z)
Dense Communication between Language Models [41.631010796783706]
This paper proposes a novel paradigm of direct dense vector communication between Large Language Models (LLMs). Our approach eliminates the unnecessary embedding and de-embedding steps when LLMs interact with one another, enabling more efficient information transfer. We train an LMNet that achieves performance comparable to LLMs of similar size at less than 0.1% of the training cost.
arXiv Detail & Related papers (2025-05-19T05:56:06Z)
Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z)
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation [72.02635550088546]
This work explores how large language models (LLMs) can enhance CLIP's capability, especially for processing longer and more complex image captions. We introduce a caption-to-caption contrastive fine-tuning framework, significantly enhancing the discriminative quality of LLM outputs. Our approach outperforms LoRA-based methods, achieving nearly fourfold faster training with superior performance.
arXiv Detail & Related papers (2024-11-07T18:59:16Z)
Improving In-Context Learning with Small Language Model Ensembles [2.3499129784547654]
In-context learning (ICL) is a cheap and efficient alternative but cannot match the accuracies of advanced methods. We present Ensemble SuperICL, a novel approach that enhances ICL by leveraging the expertise of multiple fine-tuned small language models (SLMs).
arXiv Detail & Related papers (2024-10-29T09:02:37Z)
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models [72.68665884790002]
We propose a novel framework to transfer knowledge from l-MLLMs to s-MLLMs. We introduce Multimodal Distillation (MDist) to transfer the teacher model's robust representations across both visual and linguistic modalities. We also propose a three-stage training scheme to fully exploit the potential of the proposed distillation strategy.
arXiv Detail & Related papers (2024-10-21T17:41:28Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks. We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset. The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
Augmented Large Language Models with Parametric Knowledge Guiding [72.71468058502228]
Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities. Their performance may be suboptimal for domain-specific tasks that require specialized knowledge due to limited exposure to the related data. We propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module to access relevant knowledge.
arXiv Detail & Related papers (2023-05-08T15:05:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.