Lines of Thought in Large Language Models
- URL: http://arxiv.org/abs/2410.01545v2
- Date: Mon, 28 Oct 2024 20:20:26 GMT
- Title: Lines of Thought in Large Language Models
- Authors: Raphaël Sarfati, Toni J. B. Liu, Nicolas Boullé, Christopher J. Earls
- Abstract summary: Large Language Models achieve next-token prediction by transporting a vectorized piece of text across an accompanying embedding space.
We aim to characterize the statistical properties of ensembles of these 'lines of thought.'
We find it remarkable that the vast complexity of such large models can be reduced to a much simpler form, and we reflect on implications.
- Score: 3.281128493853064
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models achieve next-token prediction by transporting a vectorized piece of text (prompt) across an accompanying embedding space under the action of successive transformer layers. The resulting high-dimensional trajectories realize different contextualization, or 'thinking', steps, and fully determine the output probability distribution. We aim to characterize the statistical properties of ensembles of these 'lines of thought.' We observe that independent trajectories cluster along a low-dimensional, non-Euclidean manifold, and that their path can be well approximated by a stochastic equation with few parameters extracted from data. We find it remarkable that the vast complexity of such large models can be reduced to a much simpler form, and we reflect on implications.
Related papers
- Model-free Estimation of Latent Structure via Multiscale Nonparametric Maximum Likelihood [13.175343048302697]
We propose a model-free approach for estimating such latent structures whenever they are present, without assuming they exist a priori.
As an application, we design a clustering algorithm based on the proposed procedure and demonstrate its effectiveness in capturing a wide range of latent structures.
arXiv Detail & Related papers (2024-10-29T17:11:33Z)
- On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution.
In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
arXiv Detail & Related papers (2024-05-18T15:59:41Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Mixed Gaussian Flow for Diverse Trajectory Prediction [78.00204650749453]
We propose a flow-based model to transform a mixed Gaussian prior into the future trajectory manifold.
The model shows a better capacity for generating diverse trajectory patterns.
We also demonstrate that it can generate diverse, controllable, and out-of-distribution trajectories.
arXiv Detail & Related papers (2024-02-19T15:48:55Z)
- EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting [26.38308951284839]
We present EigenTrajectory ($\mathbb{ET}$), a trajectory prediction approach that uses a novel trajectory descriptor to form a compact space.
EigenTrajectory can significantly improve both the prediction accuracy and reliability of existing trajectory forecasting models on public benchmarks.
arXiv Detail & Related papers (2023-07-18T14:52:08Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
- Outlier Detection for Trajectories via Flow-embeddings [2.66418345185993]
We propose a method to detect outliers in empirically observed trajectories on a discretized manifold modeled by a simplicial complex.
Our approach is similar to spectral embeddings such as diffusion maps and Laplacian eigenmaps, which construct embeddings from the eigenvectors of the graph Laplacian associated with small eigenvalues.
We show how this technique can single out trajectories that behave (topologically) differently from typical trajectories, and illustrate the performance of our approach with both synthetic and empirical data.
arXiv Detail & Related papers (2021-11-25T19:58:48Z)
- Unsupervised Sentence-embeddings by Manifold Approximation and Projection [3.04585143845864]
We propose a novel technique to generate sentence-embeddings in an unsupervised fashion by projecting the sentences onto a fixed-dimensional manifold.
We test our approach, which we term EMAP or Embeddings by Manifold Approximation and Projection, on six publicly available text-classification datasets of varying size and complexity.
arXiv Detail & Related papers (2021-02-07T13:27:58Z)
- Probabilistic Circuits for Variational Inference in Discrete Graphical Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z)
- Haar Wavelet based Block Autoregressive Flows for Trajectories [129.37479472754083]
Prediction of trajectories such as that of pedestrians is crucial to the performance of autonomous agents.
We introduce a novel Haar wavelet based block autoregressive model leveraging split couplings.
We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets.
arXiv Detail & Related papers (2020-09-21T13:57:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.