In-context Learning vs. Instruction Tuning: The Case of Small and Multilingual Language Models
- URL: http://arxiv.org/abs/2503.01611v2
- Date: Mon, 17 Mar 2025 15:32:53 GMT
- Title: In-context Learning vs. Instruction Tuning: The Case of Small and Multilingual Language Models
- Authors: David Ponce, Thierry Etchegoyhen
- Abstract summary: We show that scenarios involving multilingual and smaller models result in downgraded ICL instruction following performance. This study aims to further our understanding of the current strengths and limitations of alternative methods for instruction following.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction following is a critical ability for Large Language Models to perform downstream tasks. The standard approach to instruction alignment has relied on a specific phase of model tuning over curated instruction datasets, optionally complemented with an alignment step over human preferences. Recent work has shown the potential of in-context learning (ICL) alternatives to guide base models towards instruction following. This type of approach is particularly relevant for extending instruction following across languages and models of varying sizes adapted to different types of usage. In this work we compare ICL and instruction fine-tuning in English, French and Spanish, on Small Language Models, and provide experimental results on applying Direct Preference Optimisation (DPO) over base models. Our results show that scenarios involving multilingual and smaller models result in downgraded ICL instruction following performance, only partially mitigated by DPO alignment. This study aims to further our understanding of the current strengths and limitations of alternative methods for instruction following.
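For illustration, here is a minimal sketch of the ICL alternative discussed above: a base (non-instruction-tuned) model is steered towards instruction following purely through in-context demonstrations, with no tuning phase. The demonstration pairs and prompt template are hypothetical placeholders, not the paper's actual prompts.

```python
# Minimal sketch: steering a base (non-instruction-tuned) model towards
# instruction following purely through in-context demonstrations.
# The demonstration pairs and template below are hypothetical placeholders.

DEMONSTRATIONS = [
    ("Translate into French: The weather is nice today.",
     "Il fait beau aujourd'hui."),
    ("List three primary colours.",
     "Red, yellow and blue."),
]

def build_icl_prompt(instruction: str) -> str:
    """Prepend fixed instruction/response demonstrations so that a base
    model continues the pattern rather than drifting into free-form text."""
    parts = [f"Instruction: {ins}\nResponse: {res}\n"
             for ins, res in DEMONSTRATIONS]
    parts.append(f"Instruction: {instruction}\nResponse:")
    return "\n".join(parts)

print(build_icl_prompt("Explain in one sentence what in-context learning is."))
```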
Related papers
- Tuning-Free Personalized Alignment via Trial-Error-Explain In-Context Learning [74.56097953187994]
We present Trial-Error-Explain In-Context Learning (TICL), a tuning-free method that personalizes language models for text generation tasks. TICL iteratively expands an in-context learning prompt via a trial-error-explain process, adding model-generated negative samples and explanations. TICL achieves win rates of up to 91.5% against the previous state-of-the-art and outperforms competitive tuning-free baselines for personalized alignment tasks.
arXiv Detail & Related papers (2025-02-13T05:20:21Z)
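A minimal sketch of the trial-error-explain loop summarised above, assuming hypothetical `generate`, `judge`, and `explain_failure` callables standing in for the model call, the success check, and the explanation step; this is not TICL's actual implementation.

```python
# Minimal sketch of a trial-error-explain round in the spirit of TICL.
# `generate`, `judge`, and `explain_failure` are hypothetical stand-ins.

def ticl_round(prompt: str, task: str, generate, judge, explain_failure) -> str:
    """One tuning-free personalisation round: try a generation and, if it
    misses the target, fold the failed attempt plus an explanation back
    into the prompt as a model-generated negative example."""
    attempt = generate(prompt + task)        # trial
    if judge(attempt):                       # success: keep prompt as-is
        return prompt
    reason = explain_failure(attempt)        # explain
    return (prompt
            + f"\n# Negative example:\n{attempt}"
            + f"\n# Why it fails:\n{reason}\n")  # error folded into context
```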
- LLMic: Romanian Foundation Language Model [76.09455151754062]
We present LLMic, a foundation language model designed specifically for the Romanian language. We show that fine-tuning LLMic for language translation after the initial pretraining phase outperforms existing solutions in English-to-Romanian translation tasks.
arXiv Detail & Related papers (2025-01-13T22:14:45Z)
- Smaller Language Models Are Better Instruction Evolvers [10.587052565101844]
Small language models (SLMs) can synthesize more effective instructions than large language models (LLMs). We propose Instruction Complexity-Aware IFD (IC-IFD) to evaluate the effectiveness of instruction data more accurately.
arXiv Detail & Related papers (2024-12-15T16:07:48Z)
- Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on Divergent Chains of Thought (DCoT) datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
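A minimal sketch of what a DCoT-style training example could look like under the idea summarised above: several divergent reasoning chains for one question, serialised into a single fine-tuning target with a final comparison step. The layout and field names are assumptions, not the paper's actual data format.

```python
# Minimal sketch of a DCoT-style training example: multiple divergent
# reasoning chains for the same question, followed by a comparison step.
# The serialisation format below is a hypothetical placeholder.

def format_dcot_example(question: str, chains: list[str], final_answer: str) -> str:
    """Serialise several chains of thought into one fine-tuning target so
    the model learns to compare its own alternative solutions."""
    blocks = [f"Question: {question}"]
    for i, chain in enumerate(chains, start=1):
        blocks.append(f"Reasoning chain {i}:\n{chain}")
    blocks.append(f"After comparing the chains, the answer is: {final_answer}")
    return "\n\n".join(blocks)
```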
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained self-supervised (SSL) and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions? [42.37657013017192]
We show that instruction-tuning on parallel instead of monolingual corpora benefits cross-lingual instruction following capabilities by up to 9.9%.
We also conduct a human annotation study to understand the alignment between human-based and GPT-4-based evaluation within multilingual chat scenarios.
arXiv Detail & Related papers (2024-02-21T11:07:07Z)
- Demystifying Instruction Mixing for Fine-tuning Large Language Models [29.69436955342966]
This study categorizes instructions into three primary types: NLP downstream tasks, coding, and general chat.
We find that certain instruction types are more advantageous for specific applications but can negatively impact other areas.
arXiv Detail & Related papers (2023-12-17T18:44:26Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions to after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
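A minimal sketch of the two prompt layouts compared above, with the proposed variant placing the instruction after the input sentence; the templates are hypothetical placeholders, not the paper's exact prompts.

```python
# Minimal sketch of the two prompt layouts under comparison for a
# conditional generation task such as translation.

def pre_instruction_prompt(instruction: str, source: str) -> str:
    """Conventional layout: task instruction first, input sentence after."""
    return f"{instruction}\n{source}\n"

def post_instruction_prompt(instruction: str, source: str) -> str:
    """Proposed layout: input sentence first, task instruction after it."""
    return f"{source}\n{instruction}\n"

print(post_instruction_prompt("Translate the sentence above into English.",
                              "Der Hund schläft auf dem Sofa."))
```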
- Analyzing Bagging Methods for Language Models [0.5161531917413708]
We perform an analysis of bagging language models and compare single language models to bagged ensembles that are roughly equivalent in terms of final model size.
Our ensembling methods are at best roughly equivalent to single LM baselines.
arXiv Detail & Related papers (2022-07-19T06:30:37Z)
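A minimal sketch of bagging at inference time, assuming a hypothetical list of member models exposing a scikit-learn-style `predict_proba`; member probabilities are averaged before taking the argmax, so that a size-matched ensemble can be compared against one larger single model.

```python
import numpy as np

# Minimal sketch of inference with a bagged ensemble: average per-class
# probabilities of members trained on bootstrap resamples.
# `models` is a hypothetical list of objects with a `predict_proba` method.

def bagged_predict(models, inputs):
    """Average member probabilities, then pick the most likely class."""
    probs = np.mean([m.predict_proba(inputs) for m in models], axis=0)
    return probs.argmax(axis=-1)
```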
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners [23.150999852147283]
This study proposes a novel, pluggable, and efficient approach named DifferentiAble pRompT (DART).
It can convert small language models into better few-shot learners without any prompt engineering.
A comprehensive evaluation on standard NLP tasks demonstrates that the proposed approach achieves better few-shot performance.
arXiv Detail & Related papers (2021-08-30T12:29:25Z)
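A minimal sketch of the core idea behind differentiable prompts: trainable prompt embeddings are prepended to the input token embeddings and optimised by backpropagation, replacing hand-written prompt text. Dimensions, initialisation, and wiring are illustrative assumptions, not DART's actual architecture.

```python
import torch

# Minimal sketch of a differentiable ("soft") prompt: a small set of
# trainable embedding vectors prepended to the input embeddings, optimised
# by backpropagation while the language model itself can stay frozen.
# Sizes below are hypothetical placeholders.

n_prompt_tokens, hidden_size = 8, 768
prompt_embeddings = torch.nn.Parameter(
    torch.randn(n_prompt_tokens, hidden_size) * 0.02)

def prepend_soft_prompt(input_embeddings: torch.Tensor) -> torch.Tensor:
    """Concatenate the trainable prompt vectors before the input's token
    embeddings; gradients flow into the prompt parameters only."""
    batch = input_embeddings.size(0)
    soft = prompt_embeddings.unsqueeze(0).expand(batch, -1, -1)
    return torch.cat([soft, input_embeddings], dim=1)
```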