Algorithmic Phase Transitions in Language Models: A Mechanistic Case Study of Arithmetic
- URL: http://arxiv.org/abs/2412.07386v1
- Date: Tue, 10 Dec 2024 10:32:01 GMT
- Title: Algorithmic Phase Transitions in Language Models: A Mechanistic Case Study of Arithmetic
- Authors: Alan Sun, Ethan Sun, Warren Shepard
- Abstract summary: Large language models can zero-shot some tasks but not others. Algorithmic instability may be a contributing factor to language models' poor zero-shot performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot capabilities of large language models make them powerful tools for solving a range of tasks without explicit training. It remains unclear, however, how these models achieve such performance, or why they can zero-shot some tasks but not others. In this paper, we shed some light on this phenomenon by defining and investigating algorithmic stability in language models -- changes in problem-solving strategy employed by the model as a result of changes in task specification. We focus on a task where algorithmic stability is needed for generalization: two-operand arithmetic. Surprisingly, we find that Gemma-2-2b employs substantially different computational models on closely related subtasks, i.e. four-digit versus eight-digit addition. Our findings suggest that algorithmic instability may be a contributing factor to language models' poor zero-shot performance across certain logical reasoning tasks, as they struggle to abstract different problem-solving strategies and smoothly transition between them.
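To make the notion of algorithmic stability concrete, the sketch below compares a model's per-layer hidden states on four-digit versus eight-digit addition prompts; a sharp drop in similarity at some layer would hint that a different computation is being used. This is only a minimal illustration, not the paper's mechanistic-interpretability methodology; the prompts and the similarity metric are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "google/gemma-2-2b"  # gated on the HF Hub; any causal LM works for the sketch
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def mean_hidden_states(prompts):
    """Per-layer hidden states at the final token, averaged over prompts."""
    layers = None
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids)
        # out.hidden_states: tuple of (1, seq, d) tensors, one entry per layer
        vecs = [h[0, -1] for h in out.hidden_states]
        layers = vecs if layers is None else [a + b for a, b in zip(layers, vecs)]
    return [v / len(prompts) for v in layers]

four = [f"{a} + {b} = " for a, b in [(1234, 5678), (4321, 8765)]]
eight = [f"{a} + {b} = " for a, b in [(12345678, 87654321), (11111111, 22222222)]]

for i, (u, v) in enumerate(zip(mean_hidden_states(four), mean_hidden_states(eight))):
    sim = torch.nn.functional.cosine_similarity(u, v, dim=0).item()
    print(f"layer {i:2d}  cosine similarity {sim:.3f}")
```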
Related papers
- When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers [64.1656365676171]
Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors.
This paper theoretically proves the effectiveness of task addition for simultaneously learning a set of aligned or irrelevant tasks.
We also prove the proper coefficient selection for task arithmetic to achieve negation of out-of-domain tasks.
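As a rough sketch of the operation described above (not this paper's theory, which concerns nonlinear Transformers): task vectors are parameter deltas relative to the pre-trained checkpoint, and editing is a weighted sum of them. The state dicts below are toy stand-ins for real checkpoints.

```python
import torch

def task_vector(pretrained, finetuned):
    """tau = theta_ft - theta_pre, computed per parameter tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_arithmetic(pretrained, task_vectors, coeffs):
    """theta_edited = theta_pre + sum_i alpha_i * tau_i.
    Negative coefficients implement task negation (unlearning)."""
    edited = {k: v.clone() for k, v in pretrained.items()}
    for alpha, tau in zip(coeffs, task_vectors):
        for k in edited:
            edited[k] += alpha * tau[k]
    return edited

# Toy state dicts standing in for real checkpoints
pre = {"w": torch.zeros(3)}
ft_a = {"w": torch.tensor([1.0, 0.0, 0.0])}
ft_b = {"w": torch.tensor([0.0, 1.0, 0.0])}
taus = [task_vector(pre, ft_a), task_vector(pre, ft_b)]
print(apply_task_arithmetic(pre, taus, coeffs=[0.5, -0.5]))  # add task A, negate task B
```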
arXiv Detail & Related papers (2025-04-15T08:04:39Z)
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate extended, human-like thinking during inference.
This paper presents the first comprehensive study on the prevalent issue of overthinking in these models.
We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
arXiv Detail & Related papers (2024-12-30T18:55:12Z)
- Task Arithmetic Through The Lens Of One-Shot Federated Learning [3.8230727103887943]
Task Arithmetic is a model merging technique that enables the combination of multiple models' capabilities into a single model.
We show that Task Arithmetic is mathematically equivalent to one-shot Federated Averaging (FedAvg), the algorithm commonly used in Federated Learning.
We adapt several algorithms from Federated Learning to improve the effectiveness of Task Arithmetic.
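The claimed equivalence is easy to check numerically: one-shot FedAvg of models finetuned from a shared initialization coincides with task arithmetic using uniform coefficients 1/K. A toy verification, with vectors standing in for full parameter sets:

```python
import torch

K = 3
pre = torch.randn(5)                                 # shared initialization theta_pre
clients = [pre + torch.randn(5) for _ in range(K)]   # locally finetuned models

# One-shot FedAvg: average the client models directly
fedavg = torch.stack(clients).mean(dim=0)

# Task arithmetic with uniform coefficients alpha_i = 1/K
taus = [c - pre for c in clients]
task_arith = pre + sum(taus) / K

print(torch.allclose(fedavg, task_arith))  # True: the two views coincide
```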
arXiv Detail & Related papers (2024-11-27T18:53:41Z)
- A resource-efficient model for deep kernel learning [0.0]
There are various approaches for accelerating learning computations with minimal loss of accuracy.
We describe a model-level decomposition approach that combines both the decomposition of the operators and the decomposition of the network.
We perform a feasibility analysis on the resulting algorithm, both in terms of its accuracy and scalability.
arXiv Detail & Related papers (2024-10-13T17:11:42Z)
- Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning [44.910762928636565]
We present a new method for large language models to solve compositional tasks.
Our method, Re-Tuning, tunes models to break down a problem into subproblems, solve those subproblems, and combine the results.
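A minimal sketch of that recursive control flow follows. The CALL/SUB_ANSWER protocol and the toy "model" are hypothetical stand-ins; Re-Tuning actually tunes the language model itself to emit such decompositions.

```python
def solve(problem: str, model) -> str:
    """Recursive loop (sketch): the model either answers the problem
    directly or emits a subproblem whose answer is fed back in."""
    step = model(problem)
    if step.startswith("CALL:"):
        sub_answer = solve(step[len("CALL:"):].strip(), model)
        step = model(f"{problem}\nSUB_ANSWER: {sub_answer}")
    return step

def toy_model(prompt: str) -> str:
    """Toy stand-in for a tuned LM: adds a list of numbers by recursion."""
    line, *rest = prompt.split("\n")
    terms = [t.strip() for t in line.split("+")]
    if rest:                                     # combine: partial sum + last term
        partial = int(rest[0].split(":")[1])
        return str(partial + int(terms[-1]))
    if len(terms) == 2:                          # base case: answer directly
        return str(int(terms[0]) + int(terms[1]))
    return "CALL: " + " + ".join(terms[:-1])     # recurse on a smaller subproblem

print(solve("12 + 34 + 56 + 7", toy_model))      # 109
```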
arXiv Detail & Related papers (2024-07-05T18:02:28Z)
- Limits of Transformer Language Models on Learning to Compose Algorithms [77.2443883991608]
We evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks that demand learning a composition of several discrete sub-tasks.
Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample inefficient.
arXiv Detail & Related papers (2024-02-08T16:23:29Z)
- Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems.
We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
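A toy version of such controlled generation follows; the paper's generator is neuro-symbolic and far richer, and the template and the carry feature here are assumptions made for illustration.

```python
import random

TEMPLATE = "{name} has {a} apples and gets {b} more. How many apples does {name} have now?"

def make_problem(require_carry: bool, rng: random.Random):
    """Sample an addition word problem whose carry structure is controlled,
    a stand-in for fine-grained control over problem features."""
    while True:
        a, b = rng.randint(10, 99), rng.randint(10, 99)
        has_carry = (a % 10) + (b % 10) >= 10
        if has_carry == require_carry:
            name = rng.choice(["Ava", "Noah", "Mia"])
            return TEMPLATE.format(name=name, a=a, b=b), a + b

rng = random.Random(0)
print(make_problem(require_carry=True, rng=rng))
print(make_problem(require_carry=False, rng=rng))
```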
arXiv Detail & Related papers (2024-01-31T18:48:20Z)
- Modeling Boundedly Rational Agents with Latent Inference Budgets [56.24971011281947]
We introduce a latent inference budget model (L-IBM) that models agents' computational constraints explicitly.
L-IBMs make it possible to learn agent models using data from diverse populations of suboptimal actors.
We show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty.
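One way to picture the idea, as a heavily simplified sketch rather than the paper's actual model: an agent's budget caps how many candidate actions it evaluates, and Bayes' rule then recovers a posterior over that latent budget from the action the agent picked.

```python
import numpy as np

rng = np.random.default_rng(0)
utils = np.array([1.0, 2.0, 3.0, 5.0])          # true utilities of 4 actions

def action_probs(budget: int, n_mc: int = 100_000) -> np.ndarray:
    """P(action | budget): the agent samples `budget` candidates uniformly
    and takes the best one seen -- a crude bounded-computation model."""
    idx = rng.integers(len(utils), size=(n_mc, budget))
    best = idx[np.arange(n_mc), utils[idx].argmax(axis=1)]
    return np.bincount(best, minlength=len(utils)) / n_mc

budgets = [1, 2, 4, 8]
likelihoods = np.stack([action_probs(k) for k in budgets])

# Posterior over the latent budget after observing a suboptimal choice
observed_action = 2                              # agent picked the utility-3 action
posterior = likelihoods[:, observed_action]
posterior /= posterior.sum()                     # uniform prior over budgets
for k, p in zip(budgets, posterior):
    print(f"budget {k}: posterior {p:.3f}")
```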
arXiv Detail & Related papers (2023-12-07T03:55:51Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
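In spirit (the details differ from the paper's causal framework), robustness and sensitivity can be measured with paired interventions on the input: rephrase the surface text and the answer should stay fixed; change an operand and the answer should change accordingly. A toy sketch, with a mock solver standing in for the language model:

```python
import random

def predict(question: str) -> int:
    """Mock stand-in for a language model's numeric answer (hypothetical);
    it parses the two operands and answers correctly, to illustrate the metrics."""
    a, b = [int(t) for t in question.split() if t.isdigit()]
    return a + b

rng = random.Random(0)
base = (17, 25)

# Sensitivity: intervene on an operand; the answer SHOULD change accordingly
interventions = [(rng.randint(10, 99), base[1]) for _ in range(100)]
sensitive = sum(predict(f"What is {a} plus {b} ?") == a + b for a, b in interventions)

# Robustness: rephrase the surface form; the answer should NOT change
paraphrases = [f"Compute {base[0]} plus {base[1]} ?", f"Add {base[0]} plus {base[1]} ?"]
robust = sum(predict(p) == sum(base) for p in paraphrases)

print(f"sensitivity {sensitive}/100, robustness {robust}/{len(paraphrases)}")
```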
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- Invariant Causal Mechanisms through Distribution Matching [86.07327840293894]
In this work we provide a causal perspective and a new algorithm for learning invariant representations.
Empirically we show that this algorithm works well on a diverse set of tasks and in particular we observe state-of-the-art performance on domain generalization.
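Generic distribution matching (not necessarily the paper's exact algorithm) can be sketched as an MMD penalty that pulls the feature distributions of two domains together while a task head trains on the labeled domain:

```python
import torch

def mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Maximum mean discrepancy with an RBF kernel: a standard way to
    penalize distribution mismatch between two feature batches."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

encoder = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(),
                              torch.nn.Linear(16, 8))
head = torch.nn.Linear(8, 2)
opt = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-3)

x_a, y_a = torch.randn(64, 10), torch.randint(0, 2, (64,))   # labeled domain A
x_b = torch.randn(64, 10) + 1.0                              # shifted domain B

for _ in range(100):
    za, zb = encoder(x_a), encoder(x_b)
    # Task loss on domain A + penalty matching feature distributions across domains
    loss = torch.nn.functional.cross_entropy(head(za), y_a) + 1.0 * mmd(za, zb)
    opt.zero_grad(); loss.backward(); opt.step()
```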
arXiv Detail & Related papers (2022-06-23T12:06:54Z)
- Thinking Aloud: Dynamic Context Generation Improves Zero-Shot Reasoning Performance of GPT-2 [6.037255578530709]
We show that dynamic problem elaboration significantly improves the zero-shot performance of GPT-2 on deductive reasoning and natural language inference tasks.
In particular, elaborations that are most faithful to the original problem description may boost accuracy by up to 24%.
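The mechanics are simple to sketch with off-the-shelf GPT-2 (the prompt wording here is illustrative, not the paper's elaboration templates): generate an elaboration first, then condition the final answer on it.

```python
from transformers import pipeline

gen = pipeline("text-generation", model="gpt2")

def generate(prompt: str, max_new_tokens: int = 40) -> str:
    out = gen(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return out[0]["generated_text"][len(prompt):]

def answer_with_elaboration(problem: str) -> str:
    # Dynamic context generation: the model first elaborates on the problem...
    elaboration = generate(f"{problem}\nLet us spell out what this means:")
    # ...then answers conditioned on its own elaboration
    return generate(f"{problem}\n{elaboration}\nTherefore,")

print(answer_with_elaboration("If all birds can fly and Tweety is a bird, can Tweety fly?"))
```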
arXiv Detail & Related papers (2021-03-24T07:33:25Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains helps improve learning performance on each of the other tasks.
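A minimal sketch of the coupled formulation (the quadratic centroid penalty below is one natural instantiation, not necessarily the paper's exact coupling): each task keeps its own regressor, while a penalty keeps all regressors near their mean.

```python
import torch

n_tasks, d = 3, 5
W = torch.randn(n_tasks, d, requires_grad=True)   # one linear regressor per task
Xs = [torch.randn(50, d) for _ in range(n_tasks)]
ys = [x @ torch.randn(d) + 0.1 * torch.randn(50) for x in Xs]

opt = torch.optim.SGD([W], lr=0.05)
lam = 0.5                                         # coupling strength
for _ in range(200):
    # Each regressor fits its own task's data...
    fit = sum(((x @ W[i] - y) ** 2).mean() for i, (x, y) in enumerate(zip(Xs, ys)))
    # ...while being pulled toward the shared centroid of all regressors
    coupling = ((W - W.mean(dim=0, keepdim=True)) ** 2).sum()
    loss = fit + lam * coupling
    opt.zero_grad(); loss.backward(); opt.step()
```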
arXiv Detail & Related papers (2020-10-24T21:35:57Z)