Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models
- URL: http://arxiv.org/abs/2504.20951v1
- Date: Tue, 29 Apr 2025 17:21:20 GMT
- Title: Information Gravity: A Field-Theoretic Model for Token Selection in Large Language Models
- Authors: Maryna Vyshnyvetska
- Abstract summary: We propose a theoretical model called "information gravity" to describe the text generation process in large language models (LLMs). A query is viewed as an object with "information mass" that curves the semantic space of the model, creating gravitational potential wells that "attract" tokens during generation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a theoretical model called "information gravity" to describe the text generation process in large language models (LLMs). The model uses physical apparatus from field theory and spacetime geometry to formalize the interaction between user queries and the probability distribution of generated tokens. A query is viewed as an object with "information mass" that curves the semantic space of the model, creating gravitational potential wells that "attract" tokens during generation. This model offers a mechanism to explain several observed phenomena in LLM behavior, including hallucinations (emerging from low-density semantic voids), sensitivity to query formulation (due to semantic field curvature changes), and the influence of sampling temperature on output diversity.
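The abstract describes the mechanism only qualitatively. As a rough illustration of the idea, the toy Python sketch below treats the query as a point mass whose inverse-distance potential well scores vocabulary tokens, then samples with a temperature-controlled softmax. This is a minimal sketch under stated assumptions, not the paper's actual formalism: the function names, the mass parameter, and the 1/distance potential are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: random unit embeddings for a tiny vocabulary and a query.
vocab = ["paris", "london", "rome", "banana", "quantum"]
dim = 8
token_emb = rng.normal(size=(len(vocab), dim))
token_emb /= np.linalg.norm(token_emb, axis=1, keepdims=True)
query_emb = rng.normal(size=dim)
query_emb /= np.linalg.norm(query_emb)

def gravity_logits(query_emb, token_emb, mass=5.0, eps=1e-3):
    """Score tokens by a toy 'gravitational potential' well.

    Assumed potential V = -mass / distance: tokens semantically close
    to the query sit deeper in the well and get higher scores (-V).
    """
    dist = np.linalg.norm(token_emb - query_emb, axis=1)
    return mass / (dist + eps)  # well depth, i.e. the negative of V

def sample_token(logits, temperature=1.0):
    """Temperature-controlled softmax sampling over the well."""
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(p), p=p), p

logits = gravity_logits(query_emb, token_emb)
for T in (0.2, 1.0, 5.0):
    idx, p = sample_token(logits, temperature=T)
    print(f"T={T}: p={np.round(p, 3)} -> {vocab[idx]}")
```

At low temperature the sampler collapses onto the deepest well (the token nearest the query); at high temperature the distribution flattens and output diversity grows, which matches the qualitative role the abstract assigns to sampling temperature.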
Related papers
- Universal New Physics Latent Space [0.0]
We develop a machine learning method for mapping data originating from both Standard Model processes and various theories beyond the Standard Model into a unified representation (latent) space. We apply our method to three examples of new physics at the LHC of increasing complexity, showing that models can be clustered according to their LHC phenomenology.
arXiv Detail & Related papers (2024-07-29T18:00:00Z)
- Diffusion-HMC: Parameter Inference with Diffusion-model-driven Hamiltonian Monte Carlo [2.048226951354646]
This work uses a single diffusion generative model to address the interlinked objectives of generating predictions for observed astrophysical fields and constraining physical models from observations. We leverage the approximate likelihood of the diffusion generative model to derive tight constraints on cosmology by using the Hamiltonian Monte Carlo method to sample the posterior on cosmological parameters for a given test image.
arXiv Detail & Related papers (2024-05-08T17:59:03Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly become one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs), which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We demonstrate the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Materials Informatics Transformer: A Language Model for Interpretable Materials Properties Prediction [6.349503549199403]
We introduce our model Materials Informatics Transformer (MatInFormer) for material property prediction.
Specifically, we introduce a novel approach that involves learning the grammar of crystallography through the tokenization of pertinent space group information.
arXiv Detail & Related papers (2023-08-30T18:34:55Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Disentangling Shape and Pose for Object-Centric Deep Active Inference Models [4.298360054690217]
We consider the problem of 3D object representation, and focus on different instances of the ShapeNet dataset.
We propose a model that factorizes object shape, pose and category, while still learning a representation for each factor using a deep neural network.
arXiv Detail & Related papers (2022-09-16T12:53:49Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
- Learning to discover: expressive Gaussian mixture models for multi-dimensional simulation and parameter inference in the physical sciences [0.0]
We show that density models describing multiple observables may be created using an auto-regressive Gaussian mixture model.
The model is designed to capture how observable spectra are deformed by hypothesis variations.
It may be used as a statistical model for scientific discovery in interpreting experimental observations.
arXiv Detail & Related papers (2021-08-25T21:27:29Z)
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often more important to understand the properties of a model and to identify which parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.