Towards Measuring Representational Similarity of Large Language Models
- URL: http://arxiv.org/abs/2312.02730v1
- Date: Tue, 5 Dec 2023 12:48:04 GMT
- Title: Towards Measuring Representational Similarity of Large Language Models
- Authors: Max Klabunde, Mehdi Ben Amor, Michael Granitzer, Florian Lemmerich
- Abstract summary: We measure the similarity of representations of a set of large language models with 7B parameters.
Our results suggest that some LLMs are substantially different from others.
We identify challenges of using representational similarity measures that suggest the need of careful study of similarity scores to avoid false conclusions.
- Score: 1.7228514699394508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the similarity of the numerous released large language models
(LLMs) has many uses, e.g., simplifying model selection, detecting illegal
model reuse, and advancing our understanding of what makes LLMs perform well.
In this work, we measure the similarity of representations of a set of LLMs
with 7B parameters. Our results suggest that some LLMs are substantially
different from others. We identify challenges of using representational
similarity measures that suggest the need of careful study of similarity scores
to avoid false conclusions.
Related papers
- Preference Leakage: A Contamination Problem in LLM-as-a-judge [69.96778498636071]
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods.
In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators.
arXiv Detail & Related papers (2025-02-03T17:13:03Z) - What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis [81.15503859645149]
In this paper, we aim to theoretically analyze the impact of in-context demonstrations on large language models' reasoning performance.
We propose a straightforward, generalizable, and low-complexity demonstration selection method named LMS3.
arXiv Detail & Related papers (2024-12-11T11:38:11Z) - A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension [16.671316494925346]
This study investigates the effects of supervised fine-tuning and in-context learning on the hidden representations of Large Language Models (LLMs)
We first explore how the ID of LLM representations evolves during SFT and how it varies due to the number of demonstrations in ICL.
We then compare the IDs induced by SFT and ICL and find that ICL consistently induces a higher ID compared to SFT.
arXiv Detail & Related papers (2024-12-09T06:37:35Z) - Towards Scalable Semantic Representation for Recommendation [65.06144407288127]
Mixture-of-Codes is proposed to construct semantic IDs based on large language models (LLMs)
Our method achieves superior discriminability and dimension robustness scalability, leading to the best scale-up performance in recommendations.
arXiv Detail & Related papers (2024-10-12T15:10:56Z) - In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting [33.89176174108559]
In-context learning of large language models (LLMs) makes predictions only based on instructions augmented with a few examples.
Existing example selection methods for ICL utilize sparse or dense retrievers and derive effective performance.
We propose our policy-based reinforcement learning framework for example selection (RLS), which consists of a language model (LM) selector and an LLM generator.
arXiv Detail & Related papers (2024-08-23T12:32:12Z) - LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
By evaluating different benchmarks and proper strategy, even a 2.7B small-scale model can perform on par with larger models with 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z) - The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism [39.392450788666814]
Current evaluations of large language models (LLMs) often overlook non-determinism.
greedy decoding generally outperforms sampling methods for most evaluated tasks.
Smaller LLMs can match or surpass larger models such as GPT-4-Turbo.
arXiv Detail & Related papers (2024-07-15T06:12:17Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Scaling Sentence Embeddings with Large Language Models [43.19994568210206]
In this work, we propose an in-context learning-based method aimed at improving sentence embeddings performance.
Our approach involves adapting the previous prompt-based representation method for autoregressive models.
By scaling model size, we find scaling to more than tens of billion parameters harms the performance on semantic textual similarity tasks.
arXiv Detail & Related papers (2023-07-31T13:26:03Z) - Large Language Models Are Latent Variable Models: Explaining and Finding
Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z) - ThinkSum: Probabilistic reasoning over sets using large language models [18.123895485602244]
We propose a two-stage probabilistic inference paradigm, ThinkSum, which reasons over sets of objects or facts in a structured manner.
We demonstrate the possibilities and advantages of ThinkSum on the BIG-bench suite of LLM evaluation tasks.
arXiv Detail & Related papers (2022-10-04T00:34:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.