Meaning Representations from Trajectories in Autoregressive Models
- URL: http://arxiv.org/abs/2310.18348v3
- Date: Wed, 29 Nov 2023 05:32:24 GMT
- Title: Meaning Representations from Trajectories in Autoregressive Models
- Authors: Tian Yu Liu, Matthew Trager, Alessandro Achille, Pramuditha Perera,
Luca Zancato, Stefano Soatto
- Abstract summary: We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text.
This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model.
We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to extract meaning representations from autoregressive language
models by considering the distribution of all possible trajectories extending
an input text. This strategy is prompt-free, does not require fine-tuning, and
is applicable to any pre-trained autoregressive model. Moreover, unlike
vector-based representations, distribution-based representations can also model
asymmetric relations (e.g., direction of logical entailment, hypernym/hyponym
relations) by using algebraic operations between likelihood functions. These
ideas are grounded in distributional perspectives on semantics and are
connected to standard constructions in automata theory, but to our knowledge
they have not been applied to modern language models. We empirically show that
the representations obtained from large models align well with human
annotations, outperform other zero-shot and prompt-free methods on semantic
similarity tasks, and can be used to solve more complex entailment and
containment tasks that standard embeddings cannot handle. Finally, we extend
our method to represent data from different modalities (e.g., image and text)
using multimodal autoregressive models. Our code is available at:
https://github.com/tianyu139/meaning-as-trajectories
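As a rough sketch of the trajectory-based comparison described above (not the repository's implementation), one can sample continuations of each input with an off-the-shelf causal LM and compare the inputs by how well each one scores the other's trajectories. The model choice, trajectory count, and aggregation rule below are illustrative assumptions:

```python
# Hedged sketch: represent a text by the distribution of its continuations
# ("trajectories") and compare two texts by cross-scoring sampled trajectories.
# GPT-2, num_traj, and the min-aggregation are illustrative choices only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sample_trajectories(prompt, num_traj=8, max_new_tokens=20):
    """Sample continuations extending the prompt."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, do_sample=True, num_return_sequences=num_traj,
                             max_new_tokens=max_new_tokens, pad_token_id=tok.eos_token_id)
    prompt_len = inputs.input_ids.shape[1]
    return [tok.decode(seq[prompt_len:], skip_special_tokens=True) for seq in out]

def log_likelihood(context, continuation):
    """log p(continuation | context); re-tokenizing the concatenation is a
    simplification that can shift token boundaries slightly."""
    ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
    full = tok(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logp = torch.log_softmax(model(full).logits[0, :-1], dim=-1)
    cont = full[0, ctx_len:]                        # continuation token ids
    rows = torch.arange(ctx_len - 1, full.shape[1] - 1)
    return logp[rows, cont].sum().item()

def similarity(a, b, num_traj=8):
    """Symmetric illustrative score: how plausibly each text continues into
    trajectories sampled from both texts."""
    traj = sample_trajectories(a, num_traj) + sample_trajectories(b, num_traj)
    return sum(min(log_likelihood(a, t), log_likelihood(b, t)) for t in traj) / len(traj)
```

Because the representation is a likelihood function rather than a single vector, asymmetric checks are also possible, e.g. scoring only how well one input explains the other's trajectories to probe entailment direction.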
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
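A minimal sketch of the masking-plus-Markov-chain idea follows, using a generic pretrained masked LM; the single-position Gibbs resampling and step count are illustrative stand-ins for the paper's training and decoding procedures:

```python
# Hedged sketch: draw samples from a masked LM by repeatedly remasking one
# position and resampling it from the model's conditional (a Gibbs-style chain).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def gibbs_sample(seed_text, steps=50):
    ids = tok(seed_text, return_tensors="pt").input_ids
    for _ in range(steps):
        pos = torch.randint(1, ids.shape[1] - 1, (1,)).item()  # keep [CLS]/[SEP]
        masked = ids.clone()
        masked[0, pos] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(masked).logits[0, pos]
        ids[0, pos] = torch.multinomial(torch.softmax(logits, -1), 1)
    return tok.decode(ids[0], skip_special_tokens=True)

print(gibbs_sample("the movie was surprisingly good and a lot of fun ."))
```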
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Towards a Fully Interpretable and More Scalable RSA Model for Metaphor Understanding [0.8437187555622164]
The Rational Speech Act (RSA) model provides a flexible framework to model pragmatic reasoning in computational terms.
Here, we introduce a new RSA framework for metaphor understanding that addresses the limitations of earlier implementations by providing an explicit formula.
The model was tested against 24 metaphors, not limited to the conventional *John-is-a-shark* type.
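For context, the standard RSA recursion that such frameworks extend fits in a few lines; the toy lexicon, uniform prior, and rationality parameter below are illustrative, not the paper's metaphor-specific formula:

```python
# Standard RSA recursion on a toy scalar-implicature lexicon (illustrative).
import numpy as np

# Rows: utterances ("some", "all", "null"); columns: states
# (none, some-but-not-all, all). 1 = literally true.
lexicon = np.array([[0., 1., 1.],
                    [0., 0., 1.],
                    [1., 1., 1.]])
prior = np.ones(3) / 3
alpha = 1.0  # speaker rationality

L0 = lexicon * prior
L0 /= L0.sum(axis=1, keepdims=True)        # literal listener P(state | utterance)
S1 = np.exp(alpha * np.log(L0.T + 1e-12))
S1 /= S1.sum(axis=1, keepdims=True)        # pragmatic speaker P(utterance | state)
L1 = S1.T * prior
L1 /= L1.sum(axis=1, keepdims=True)        # pragmatic listener P(state | utterance)
print(L1[0])  # "some" now favors the some-but-not-all state (implicature)
```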
arXiv Detail & Related papers (2024-04-03T18:09:33Z)
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the computation graph of our model resembles the transformer, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small to medium-sized datasets.
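A toy sketch of that resemblance, assuming a mean-field-style update over per-word label beliefs in a pairwise dependency model (shapes and potentials are illustrative, not the paper's construction):

```python
# Illustrative only: one mean-field-style update in a pairwise model, where the
# compatibility matrix plays the role of softmax attention between words.
import numpy as np

def mean_field_step(Q, W):
    """Q: (n_words, n_labels) beliefs; W: (n_labels, n_labels) pair potentials."""
    scores = Q @ W @ Q.T                       # attention-like word-pair scores
    np.fill_diagonal(scores, -np.inf)          # no self-dependency
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)          # row-normalized, like attention weights
    msg = A @ Q                                # aggregate neighbors' beliefs
    Q_new = np.exp(msg @ W)
    return Q_new / Q_new.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
Q = np.full((5, 3), 1 / 3) + 0.01 * rng.random((5, 3))  # 5 words, 3 labels
for _ in range(10):
    Q = mean_field_step(Q, np.eye(3))
```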
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Discovering interpretable elastoplasticity models via the neural polynomial method enabled symbolic regressions [0.0]
Conventional neural network elastoplasticity models are often perceived as lacking interpretability.
This paper introduces a two-step machine learning approach that returns mathematical models interpretable by human experts.
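A minimal stand-in for that two-step recipe, shown below, substitutes ordinary sparse regression over candidate polynomial terms for the paper's neural polynomial method; the hidden law and thresholds are made up:

```python
# Illustrative symbolic-regression-lite: fit a relation, then print a
# human-readable polynomial recovered by sparse (Lasso) regression.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (200, 1))
y = 2.0 * x[:, 0] ** 2 - 0.5 * x[:, 0] + rng.normal(0, 0.01, 200)  # hidden law

feats = PolynomialFeatures(degree=3, include_bias=True)
X = feats.fit_transform(x)
fit = Lasso(alpha=1e-3).fit(X, y)
terms = [f"{c:+.2f}*{name}"
         for c, name in zip(fit.coef_, feats.get_feature_names_out(["x"]))
         if abs(c) > 1e-2]
print(" ".join(terms))  # e.g. "-0.49*x +1.98*x^2"
```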
arXiv Detail & Related papers (2023-07-24T22:22:32Z)
- Combining Discrete Choice Models and Neural Networks through Embeddings: Formulation, Interpretability and Performance [10.57079240576682]
This study proposes a novel approach that combines theory-driven and data-driven choice models using Artificial Neural Networks (ANNs).
In particular, we use continuous vector representations, called embeddings, for encoding categorical or discrete explanatory variables.
Our models deliver state-of-the-art predictive performance, outperforming existing ANN-based models while drastically reducing the number of required network parameters.
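A hedged sketch of the embedding idea: a learned dense vector replaces one-hot dummies for a categorical variable inside a simple logit-style utility model (all dimensions and layers below are illustrative, not the paper's architecture):

```python
# Illustrative: embed a categorical explanatory variable and combine it with
# numeric features to produce per-alternative utilities for a choice model.
import torch
import torch.nn as nn

class EmbeddingChoiceModel(nn.Module):
    def __init__(self, n_categories, emb_dim, n_numeric, n_alternatives):
        super().__init__()
        self.emb = nn.Embedding(n_categories, emb_dim)  # replaces one-hot dummies
        self.utility = nn.Linear(emb_dim + n_numeric, n_alternatives)

    def forward(self, cat_ids, numeric):
        x = torch.cat([self.emb(cat_ids), numeric], dim=-1)
        return self.utility(x)  # softmax over utilities gives choice probabilities

model = EmbeddingChoiceModel(n_categories=12, emb_dim=4, n_numeric=3, n_alternatives=3)
probs = torch.softmax(model(torch.tensor([5, 2]), torch.randn(2, 3)), dim=-1)
```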
arXiv Detail & Related papers (2021-09-24T15:55:31Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
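To illustrate the flavor of the output, a learned one-dimensional response can be distilled into a short, human-readable lookup; the fixed-knot interpolation below is a simple stand-in for the paper's curve-fitting algorithm:

```python
# Illustrative distillation of a 1-D response into piecewise-linear "code".
import numpy as np

def distill_piecewise_linear(f, lo, hi, n_knots=6):
    xs = np.linspace(lo, hi, n_knots)
    ys = f(xs)
    # Emit the human-readable artifact: a knot table np.interp can evaluate.
    print(f"def scorer(x): return np.interp(x, {xs.round(3).tolist()}, {ys.round(3).tolist()})")
    return lambda x: np.interp(x, xs, ys)

g = distill_piecewise_linear(np.tanh, -3, 3)
grid = np.linspace(-3, 3, 100)
print("max abs error:", np.max(np.abs(g(grid) - np.tanh(grid))))
```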
arXiv Detail & Related papers (2021-01-21T01:46:36Z)
- Autoencoding Pixies: Amortised Variational Inference with Graph Convolutions for Functional Distributional Semantics [12.640283469603355]
The Pixie Autoencoder augments the generative model of Functional Distributional Semantics with a graph-convolutional neural network to perform amortised variational inference.
arXiv Detail & Related papers (2020-05-06T17:46:40Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
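A toy illustration of such representations: each entity is a vector of posterior probabilities over fine-grained types, which is readable directly and still supports out-of-the-box similarity (the types and probabilities below are made up):

```python
# Illustrative type-probability vectors for two entities; not the paper's model.
import numpy as np

types = ["person", "musician", "politician", "city", "company"]
entity_a = np.array([0.97, 0.91, 0.03, 0.01, 0.02])  # e.g. a famous singer
entity_b = np.array([0.98, 0.05, 0.88, 0.01, 0.03])  # e.g. a head of state

for name, vec in [("A", entity_a), ("B", entity_b)]:
    print(name, [t for t, p in zip(types, vec) if p > 0.5])  # human-readable view

cos = entity_a @ entity_b / (np.linalg.norm(entity_a) * np.linalg.norm(entity_b))
print("cosine similarity:", round(float(cos), 3))  # usable out of the box
```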
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
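One generic way to realize a compact discrete latent space is a Gumbel-softmax bottleneck, sketched below; this is a common construction for keeping the latent informative, not necessarily the paper's exact mechanism:

```python
# Illustrative discrete latent bottleneck via Gumbel-softmax quantization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteBottleneck(nn.Module):
    def __init__(self, in_dim, n_codes, code_dim):
        super().__init__()
        self.to_logits = nn.Linear(in_dim, n_codes)
        self.codebook = nn.Embedding(n_codes, code_dim)

    def forward(self, h, tau=1.0):
        logits = self.to_logits(h)
        one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)  # differentiable pick
        return one_hot @ self.codebook.weight, logits           # compact code + logits

bottleneck = DiscreteBottleneck(in_dim=32, n_codes=16, code_dim=8)
z, logits = bottleneck(torch.randn(4, 32))
print(z.shape)  # torch.Size([4, 8])
```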
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.