Critical Phase Transition in Large Language Models
- URL: http://arxiv.org/abs/2406.05335v2
- Date: Tue, 22 Oct 2024 09:32:17 GMT
- Title: Critical Phase Transition in Large Language Models
- Authors: Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima
- Abstract summary: Large Language Models (LLMs) have demonstrated impressive performance.
To understand their behaviors, we need to consider the fact that LLMs sometimes show qualitative changes.
We suggest that a phase transition occurs in LLMs when varying the temperature parameter.
- Score: 0.0
- Abstract: Large Language Models (LLMs) have demonstrated impressive performance. To understand their behaviors, we need to account for the fact that LLMs sometimes show qualitative changes. The natural world also presents such changes, called phase transitions, which are defined by singular, divergent statistical quantities. An intriguing question, therefore, is whether qualitative changes in LLMs are phase transitions. In this work, we conduct an extensive analysis of texts generated by LLMs and suggest that a phase transition occurs in LLMs when the temperature parameter is varied. Specifically, statistical quantities diverge precisely at the point between the low-temperature regime, where LLMs generate sentences with clear repetitive structure, and the high-temperature regime, where generated sentences are often incomprehensible. In addition, critical behaviors near the phase transition point, such as a power-law decay of correlations and slow convergence toward the stationary state, resemble those of natural languages. Our results suggest a meaningful analogy between LLMs and natural phenomena.
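As a rough illustration of the kind of measurement involved, the sketch below samples text from a small causal LM at several temperatures and computes a simple token-autocorrelation proxy. The model (`gpt2`), the correlation estimator, and the temperature grid are illustrative assumptions, not the authors' exact protocol:

```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; the paper's exact models and settings may differ.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

def token_autocorrelation(ids, max_lag=64):
    """C(r) = P(x_t == x_{t+r}) - sum_a p_a^2, a crude proxy for the
    token-token correlation functions analysed in the paper."""
    ids = np.asarray(ids)
    _, counts = np.unique(ids, return_counts=True)
    p2 = np.sum((counts / len(ids)) ** 2)  # disconnected part
    return np.array([np.mean(ids[:-r] == ids[r:]) - p2
                     for r in range(1, max_lag + 1)])

prompt = tok("The", return_tensors="pt")
for T in (0.5, 1.0, 1.5):  # low, near-critical, high temperature
    out = model.generate(**prompt, do_sample=True, temperature=T,
                         max_new_tokens=512, pad_token_id=tok.eos_token_id)
    C = token_autocorrelation(out[0].tolist())
    print(f"T={T}: C(1)={C[0]:.3f}, C(32)={C[31]:.3f}")
```

At low temperature the repetitive structure keeps the correlation high out to long lags, while at high temperature it decays quickly; the interesting behavior is near the boundary between the two regimes.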
Related papers
- Phase Transitions in Large Language Models and the $O(N)$ Model [0.0]
We reformulated the Transformer architecture as an $O(N)$ model to investigate phase transitions in large language models.
Our study reveals two distinct phase transitions corresponding to the temperature used in text generation.
As an application, the energy of the $O(N)$ model can be used to evaluate whether an LLM's parameters are sufficient to learn the training data.
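For reference, the classical $O(N)$ model referred to here has the standard configuration energy below; how the Transformer's weights and activations are mapped onto the spins $\vec{s}_i$ is the paper's contribution and is not reproduced here:

```latex
H = -J \sum_{\langle i,j \rangle} \vec{s}_i \cdot \vec{s}_j ,
\qquad \vec{s}_i \in \mathbb{R}^N , \quad \lVert \vec{s}_i \rVert = 1 ,
```

with $N=1$, $2$, and $3$ recovering the Ising, XY, and Heisenberg models, respectively.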
arXiv Detail & Related papers (2025-01-27T17:36:06Z)
- A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension [16.671316494925346]
This study investigates the effects of supervised fine-tuning (SFT) and in-context learning (ICL) on the hidden representations of Large Language Models (LLMs).
We first explore how the intrinsic dimension (ID) of LLM representations evolves during SFT and how it varies with the number of demonstrations in ICL.
We then compare the IDs induced by SFT and ICL and find that ICL consistently induces a higher ID than SFT.
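A compact sketch of one widely used ID estimator, the TwoNN method of Facco et al.; the summary does not say which estimator this study uses, so treating it as TwoNN is an assumption:

```python
import numpy as np
from scipy.spatial import cKDTree

def twonn_id(X):
    """TwoNN estimate (assumed estimator; the study may use another):
    d = n / sum(log(r2/r1)), where r1, r2 are each point's first and
    second nearest-neighbour distances."""
    tree = cKDTree(X)
    # k=3 returns each point itself plus its two nearest neighbours.
    dists, _ = tree.query(X, k=3)
    mu = dists[:, 2] / dists[:, 1]
    return len(X) / np.sum(np.log(mu))

# Example: points on a 2-D plane embedded in 50-D space.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 50))
print(twonn_id(X))  # ~2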
arXiv Detail & Related papers (2024-12-09T06:37:35Z)
- First numerical observation of the Berezinskii-Kosterlitz-Thouless transition in language models [1.4061979259370274]
We numerically demonstrate an unambiguous phase transition in the framework of a natural language model.
We identify the phase transition as a variant of the Berezinskii-Kosterlitz-Thouless transition.
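As a reminder of what a BKT transition looks like (standard statistical-mechanics background, not the paper's specific observables): the correlation function decays as a power law throughout the entire low-temperature phase and exponentially above the transition,

```latex
C(r) \sim
\begin{cases}
  r^{-\eta(T)}, & T \le T_{\mathrm{BKT}}, \quad \eta(T_{\mathrm{BKT}}) = 1/4,\\[2pt]
  e^{-r/\xi(T)}, & T > T_{\mathrm{BKT}},
\end{cases}
```

so the whole low-temperature phase is critical, which distinguishes BKT behavior from an ordinary second-order transition with a single critical point.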
arXiv Detail & Related papers (2024-12-02T07:32:32Z)
- Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model [14.92282077647913]
Continual fine-tuning (CFT) is the process of sequentially fine-tuning an LLM to enable the model to adapt to downstream tasks.
We study a two-phase CFT process in which an English-only end-to-end fine-tuned LLM is sequentially fine-tuned on a multilingual dataset.
We observe that the "similarity" of the Phase 2 tasks to the Phase 1 task determines the LLM's adaptability.
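A minimal sketch of the two-phase procedure (the model name, example data, and hyperparameters are illustrative placeholders; the study's actual setup is larger-scale):

```python
# Minimal sketch of two-phase continual fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

def fine_tune(texts, epochs=1):
    """One sequential fine-tuning phase: ordinary causal-LM training."""
    for _ in range(epochs):
        for text in texts:
            batch = tok(text, return_tensors="pt")
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            opt.step()
            opt.zero_grad()

fine_tune(["An English-only instruction example ..."])   # Phase 1
fine_tune(["Ein mehrsprachiges Beispiel ...",            # Phase 2
           "Un exemple multilingue ..."])
```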
arXiv Detail & Related papers (2024-10-21T13:39:03Z)
- DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
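A toy sketch of the perturbation step (the graph schema and the splice operation here are invented for illustration; DARG's actual graph extraction and regeneration are LLM-driven and considerably richer):

```python
import random
import networkx as nx

def perturb_reasoning_graph(g: nx.DiGraph, extra_nodes=1, seed=0):
    """Increase complexity by adding intermediate reasoning steps:
    each new node is spliced into a randomly chosen existing edge."""
    rng = random.Random(seed)
    g = g.copy()
    for i in range(extra_nodes):
        u, v = rng.choice(list(g.edges))
        w = f"step_{len(g)}_{i}"  # hypothetical intermediate step
        g.remove_edge(u, v)
        g.add_edge(u, w)
        g.add_edge(w, v)
    return g

# Toy arithmetic reasoning graph: premises -> intermediate -> answer.
g = nx.DiGraph([("a=2", "a+b"), ("b=3", "a+b"), ("a+b", "answer=5")])
harder = perturb_reasoning_graph(g, extra_nodes=2)
print(sorted(harder.edges))
```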
arXiv Detail & Related papers (2024-06-25T04:27:53Z)
- MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset [50.36095192314595]
For Large Language Models (LLMs) to function as conscious agents, they need generalizable reasoning capabilities.
This ability remains underexplored due to the complexity of modeling infinite possible changes in an event.
We introduce MARS, the first benchmark of its kind, comprising three tasks corresponding to each step of this reasoning process.
arXiv Detail & Related papers (2024-06-04T08:35:04Z)
- The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLMs).
We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions.
Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated by large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
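One standard local-ID estimator is the Levina-Bickel maximum-likelihood estimate sketched below; whether it matches the paper's exact LID estimator is an assumption:

```python
import numpy as np
from scipy.spatial import cKDTree

def local_id_mle(X, k=20):
    """Levina-Bickel MLE of local intrinsic dimension at each point
    (an assumed, standard estimator; the paper's may differ)."""
    dists, _ = cKDTree(X).query(X, k=k + 1)
    r = dists[:, 1:]  # drop the zero self-distance
    return (k - 1) / np.sum(np.log(r[:, -1:] / r[:, :-1]), axis=1)

# Example: local IDs of points on a 3-D manifold embedded in 64-D,
# standing in for hidden activations of shape (n_samples, d_model).
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 3)) @ rng.normal(size=(3, 64))
print(local_id_mle(X).mean())  # ~3
```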
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning [70.48605869773814]
Catastrophic forgetting (CF) is a phenomenon in machine learning in which a model forgets previously learned information as it is trained on new data.
This study empirically evaluates the forgetting phenomenon in large language models during continual instruction tuning.
arXiv Detail & Related papers (2023-08-17T02:53:23Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behavior of kernel regression.
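The claim can be stated compactly in generic notation (the paper's precise construction of the kernel from attention is more detailed): given in-context demonstrations $(x_i, y_i)$, the model's prediction behaves like a Nadaraya-Watson kernel regressor,

```latex
\hat{y}(x) \;=\; \frac{\sum_i K(x, x_i)\, y_i}{\sum_i K(x, x_i)},
```

where the kernel $K$ measures similarity between the query $x$ and each demonstration input, a role the paper attributes to the attention mechanism.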
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
- Measurement-Driven Phase Transition within a Volume-Law Entangled Phase [0.0]
We study a transition between two kinds of volume-law entangled phases in non-local but few-body unitary dynamics.
In one phase, a finite fraction of the system belongs to a fully entangled state, while in the second phase, the steady state is a product state over extensively many finite subsystems.
arXiv Detail & Related papers (2020-05-06T18:01:32Z)