The representation landscape of few-shot learning and fine-tuning in large language models
- URL: http://arxiv.org/abs/2409.03662v2
- Date: Sat, 7 Sep 2024 12:47:54 GMT
- Title: The representation landscape of few-shot learning and fine-tuning in large language models
- Authors: Diego Doimo, Alessandro Serra, Alessio Ansuini, Alberto Cazzaniga
- Abstract summary: In-context learning (ICL) and supervised fine-tuning (SFT) are two common strategies for improving the performance of modern large language models (LLMs).
We analyze the probability landscape of their hidden representations in the two cases.
We find that ICL and SFT create very different internal structures, in both cases undergoing a sharp transition in the middle of the network.
- Score: 43.76048699313088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) and supervised fine-tuning (SFT) are two common strategies for improving the performance of modern large language models (LLMs) on specific tasks. Despite their different natures, these strategies often lead to comparable performance gains. However, little is known about whether they induce similar representations inside LLMs. We approach this problem by analyzing the probability landscape of their hidden representations in the two cases. More specifically, we compare how LLMs solve the same question-answering task, finding that ICL and SFT create very different internal structures, in both cases undergoing a sharp transition in the middle of the network. In the first half of the network, ICL shapes interpretable representations hierarchically organized according to their semantic content. In contrast, the probability landscape obtained with SFT is fuzzier and semantically mixed. In the second half of the model, the fine-tuned representations develop probability modes that better encode the identity of answers, while the landscape of ICL representations is characterized by less defined peaks. Our approach reveals the diverse computational strategies developed inside LLMs to solve the same task across different conditions, allowing us to make a step towards designing optimal methods to extract information from language models.
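The analysis above rests on estimating the density of hidden representations and locating its modes. As a minimal sketch of that idea, the following uses synthetic vectors in place of real LLM hidden states and a simple k-NN density estimate with a local peak criterion; the paper's actual pipeline is more sophisticated, so this is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for hidden representations: two well-separated
# Gaussian clusters in a toy 2-D "representation space".
reps = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(500, 2)),
    rng.normal(loc=10.0, scale=1.0, size=(500, 2)),
])

k = 50  # neighborhood size for the density estimate

# Pairwise Euclidean distances, sorted per point.
dists = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=-1)
order = np.argsort(dists, axis=1)
knn_dist = np.sort(dists, axis=1)[:, 1:k + 1]  # skip the zero self-distance

# k-NN density proxy: points whose neighbors are closer are denser.
density = 1.0 / knn_dist.mean(axis=1)

# A point is a density peak (a probability mode) if it is denser
# than all of its k nearest neighbors.
neighbors = order[:, 1:k + 1]
is_peak = density > density[neighbors].max(axis=1)
peaks = reps[is_peak]
print(f"found {is_peak.sum()} density peaks")
```

On this toy data the two cluster modes each surface as a peak; on real hidden states, comparing the number, sharpness, and semantic composition of such modes across layers is what distinguishes the ICL and SFT landscapes.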
Related papers
- Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models [58.936893810674896]
Face Anti-Spoofing (FAS) is essential for ensuring the security and reliability of facial recognition systems.
We introduce a multimodal large language model framework for FAS, termed Interpretable Face Anti-Spoofing (I-FAS).
We propose a Spoof-aware Captioning and Filtering (SCF) strategy to generate high-quality captions for FAS images.
arXiv Detail & Related papers (2025-01-03T09:25:04Z)
- Does Representation Matter? Exploring Intermediate Layers in Large Language Models [22.704926222438456]
We investigate the quality of intermediate representations in large language models (LLMs).
We find that intermediate layers often yield more informative representations for downstream tasks than the final layers.
Our results illuminate the internal mechanics of LLMs and guide strategies for architectural optimization and training.
arXiv Detail & Related papers (2024-12-12T18:48:51Z)
- A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension [16.671316494925346]
This study investigates the effects of supervised fine-tuning (SFT) and in-context learning (ICL) on the hidden representations of Large Language Models (LLMs).
We first explore how the intrinsic dimension (ID) of LLM representations evolves during SFT and how it varies with the number of demonstrations in ICL.
We then compare the IDs induced by SFT and ICL and find that ICL consistently induces a higher ID compared to SFT.
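Intrinsic-dimension comparisons of this kind are commonly made with estimators such as TwoNN (Facco et al., 2017), which infers the ID from the ratio of each point's second- to first-nearest-neighbor distance. The following is a minimal sketch on synthetic data, not the cited paper's exact pipeline:

```python
import numpy as np

def two_nn_id(X: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate from the ratio of each
    point's second- to first-nearest-neighbor distance."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dists.sort(axis=1)
    r1, r2 = dists[:, 1], dists[:, 2]  # skip the zero self-distance
    mu = r2 / r1
    # Maximum-likelihood estimate: d = N / sum(log mu_i)
    return len(mu) / np.log(mu).sum()

rng = np.random.default_rng(0)
# Points on a 2-D plane linearly embedded in 5-D: the true ID is 2.
plane = rng.uniform(size=(1000, 2))
X = plane @ rng.normal(size=(2, 5))
print(f"estimated ID: {two_nn_id(X):.2f}")
```

Applied to hidden states, a consistently higher estimate for ICL than for SFT representations is the kind of gap the study reports.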
arXiv Detail & Related papers (2024-12-09T06:37:35Z)
- Discriminative Fine-tuning of LVLMs [67.14293827774827]
Contrastively-trained Vision-Language Models (VLMs) like CLIP have become the de facto approach for discriminative vision-language representation learning.
We propose to combine "the best of both worlds": a new training approach for discriminative fine-tuning of LVLMs.
arXiv Detail & Related papers (2024-12-05T17:54:27Z)
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Large Language Models are Interpretable Learners [53.56735770834617]
In this paper, we show a combination of Large Language Models (LLMs) and symbolic programs can bridge the gap between expressiveness and interpretability.
The pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts.
As the knowledge learned by an LLM-based Symbolic Program (LSP) is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable) and to other LLMs.
arXiv Detail & Related papers (2024-06-25T02:18:15Z)
- Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration [39.35476224845088]
Large language models (LLMs) exhibit complementary strengths in various tasks, motivating the research of LLM ensembling.
We propose a training-free ensemble framework DeePEn, fusing the informative probability distributions yielded by different LLMs at each decoding step.
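Assuming, purely for illustration, models that share a vocabulary (DeePEn itself handles heterogeneous vocabularies via a relative representation space), the per-step fusion idea reduces to a weighted mixture of next-token distributions:

```python
import numpy as np

def fuse_step(distributions, weights=None):
    """Fuse per-model next-token probability distributions for one
    decoding step by weighted averaging, then pick the top token."""
    P = np.asarray(distributions)            # shape: (n_models, vocab_size)
    w = np.ones(len(P)) / len(P) if weights is None else np.asarray(weights)
    fused = w @ P                            # weighted mixture of the rows
    fused /= fused.sum()                     # renormalize for safety
    return fused, int(fused.argmax())

# Two toy models over a 4-token vocabulary that disagree on the top token.
p_a = np.array([0.7, 0.1, 0.1, 0.1])
p_b = np.array([0.1, 0.2, 0.6, 0.1])
fused, tok = fuse_step([p_a, p_b])
print(fused, tok)
```

Here the hypothetical `fuse_step` helper runs once per decoding step; the fused distribution keeps information from both models rather than committing to either one's top choice.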
arXiv Detail & Related papers (2024-04-19T08:52:22Z)
- Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer [32.657218195756414]
Scene text recognition (STR) in the wild frequently encounters challenges when coping with domain variations, font diversity, shape deformations, etc.
We introduce E$2$STR, a STR model trained with context-rich scene text sequences, where the sequences are generated via our proposed in-context training strategy.
E$2$STR exhibits remarkable training-free adaptation in various scenarios and outperforms even the fine-tuned state-of-the-art approaches on public benchmarks.
arXiv Detail & Related papers (2023-11-22T02:46:57Z)
- Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
- A Dual-branch Self-supervised Representation Learning Framework for Tumour Segmentation in Whole Slide Images [12.961686610789416]
Self-supervised learning (SSL) has emerged as an alternative solution to reduce the annotation overheads in whole slide images (WSIs).
These SSL approaches are not designed for handling multi-resolution WSIs, which limits their performance in learning discriminative image features.
We propose a Dual-branch SSL Framework for WSI tumour segmentation (DSF-WSI) that can effectively learn image features from multi-resolution WSIs.
arXiv Detail & Related papers (2023-03-20T10:57:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.