Meaning Representations from Trajectories in Autoregressive Models
- URL: http://arxiv.org/abs/2310.18348v3
- Date: Wed, 29 Nov 2023 05:32:24 GMT
- Title: Meaning Representations from Trajectories in Autoregressive Models
- Authors: Tian Yu Liu, Matthew Trager, Alessandro Achille, Pramuditha Perera,
Luca Zancato, Stefano Soatto
- Abstract summary: We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text.
This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model.
We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to extract meaning representations from autoregressive language
models by considering the distribution of all possible trajectories extending
an input text. This strategy is prompt-free, does not require fine-tuning, and
is applicable to any pre-trained autoregressive model. Moreover, unlike
vector-based representations, distribution-based representations can also model
asymmetric relations (e.g., direction of logical entailment, hypernym/hyponym
relations) by using algebraic operations between likelihood functions. These
ideas are grounded in distributional perspectives on semantics and are
connected to standard constructions in automata theory, but to our knowledge
they have not been applied to modern language models. We empirically show that
the representations obtained from large models align well with human
annotations, outperform other zero-shot and prompt-free methods on semantic
similarity tasks, and can be used to solve more complex entailment and
containment tasks that standard embeddings cannot handle. Finally, we extend
our method to represent data from different modalities (e.g., image and text)
using multimodal autoregressive models. Our code is available at:
https://github.com/tianyu139/meaning-as-trajectories
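As a rough sketch of the trajectory-based comparison described above (not the repository's implementation), one can sample continuations of each input with an off-the-shelf causal LM and compare the inputs by how well each one scores the other's trajectories. The model choice, trajectory count, and aggregation rule below are illustrative assumptions:

```python
# Hedged sketch: represent a text by the distribution of its continuations
# ("trajectories") and compare two texts by cross-scoring sampled trajectories.
# GPT-2, num_traj, and the min-aggregation are illustrative choices only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sample_trajectories(prompt, num_traj=8, max_new_tokens=20):
    """Sample continuations extending the prompt."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, do_sample=True, num_return_sequences=num_traj,
                             max_new_tokens=max_new_tokens, pad_token_id=tok.eos_token_id)
    prompt_len = inputs.input_ids.shape[1]
    return [tok.decode(seq[prompt_len:], skip_special_tokens=True) for seq in out]

def log_likelihood(context, continuation):
    """log p(continuation | context); re-tokenizing the concatenation is a
    simplification that can shift token boundaries slightly."""
    ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
    full = tok(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logp = torch.log_softmax(model(full).logits[0, :-1], dim=-1)
    cont = full[0, ctx_len:]                        # continuation token ids
    rows = torch.arange(ctx_len - 1, full.shape[1] - 1)
    return logp[rows, cont].sum().item()

def similarity(a, b, num_traj=8):
    """Symmetric illustrative score: how plausibly each text continues into
    trajectories sampled from both texts."""
    traj = sample_trajectories(a, num_traj) + sample_trajectories(b, num_traj)
    return sum(min(log_likelihood(a, t), log_likelihood(b, t)) for t in traj) / len(traj)
```

Because the representation is a likelihood function rather than a single vector, asymmetric checks are also possible, e.g. scoring only how well one input explains the other's trajectories to probe entailment direction.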
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
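A minimal sketch of the masking-plus-Markov-chain idea follows, using a generic pretrained masked LM; the single-position Gibbs resampling and step count are illustrative stand-ins for the paper's training and decoding procedures:

```python
# Hedged sketch: draw samples from a masked LM by repeatedly remasking one
# position and resampling it from the model's conditional (a Gibbs-style chain).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def gibbs_sample(seed_text, steps=50):
    ids = tok(seed_text, return_tensors="pt").input_ids
    for _ in range(steps):
        pos = torch.randint(1, ids.shape[1] - 1, (1,)).item()  # keep [CLS]/[SEP]
        masked = ids.clone()
        masked[0, pos] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(masked).logits[0, pos]
        ids[0, pos] = torch.multinomial(torch.softmax(logits, -1), 1)
    return tok.decode(ids[0], skip_special_tokens=True)

print(gibbs_sample("the movie was surprisingly good and a lot of fun ."))
```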
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Towards a Fully Interpretable and More Scalable RSA Model for Metaphor Understanding [0.8437187555622164]
The Rational Speech Act (RSA) model provides a flexible framework to model pragmatic reasoning in computational terms.
Here, we introduce a new RSA framework for metaphor understanding that addresses the limitations of earlier implementations by providing an explicit formula.
The model was tested against 24 metaphors, not limited to the conventional *John-is-a-shark* type.
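For context, the standard RSA recursion that such frameworks extend fits in a few lines; the toy lexicon, uniform prior, and rationality parameter below are illustrative, not the paper's metaphor-specific formula:

```python
# Standard RSA recursion on a toy scalar-implicature lexicon (illustrative).
import numpy as np

# Rows: utterances ("some", "all", "null"); columns: states
# (none, some-but-not-all, all). 1 = literally true.
lexicon = np.array([[0., 1., 1.],
                    [0., 0., 1.],
                    [1., 1., 1.]])
prior = np.ones(3) / 3
alpha = 1.0  # speaker rationality

L0 = lexicon * prior
L0 /= L0.sum(axis=1, keepdims=True)        # literal listener P(state | utterance)
S1 = np.exp(alpha * np.log(L0.T + 1e-12))
S1 /= S1.sum(axis=1, keepdims=True)        # pragmatic speaker P(utterance | state)
L1 = S1.T * prior
L1 /= L1.sum(axis=1, keepdims=True)        # pragmatic listener P(state | utterance)
print(L1[0])  # "some" now favors the some-but-not-all state (implicature)
```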
arXiv Detail & Related papers (2024-04-03T18:09:33Z)
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the computation graph of our model resembles the transformer, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small to medium-sized datasets.
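A toy sketch of that resemblance, assuming a mean-field-style update over per-word label beliefs in a pairwise dependency model (shapes and potentials are illustrative, not the paper's construction):

```python
# Illustrative only: one mean-field-style update in a pairwise model, where the
# compatibility matrix plays the role of softmax attention between words.
import numpy as np

def mean_field_step(Q, W):
    """Q: (n_words, n_labels) beliefs; W: (n_labels, n_labels) pair potentials."""
    scores = Q @ W @ Q.T                       # attention-like word-pair scores
    np.fill_diagonal(scores, -np.inf)          # no self-dependency
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)          # row-normalized, like attention weights
    msg = A @ Q                                # aggregate neighbors' beliefs
    Q_new = np.exp(msg @ W)
    return Q_new / Q_new.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
Q = np.full((5, 3), 1 / 3) + 0.01 * rng.random((5, 3))  # 5 words, 3 labels
for _ in range(10):
    Q = mean_field_step(Q, np.eye(3))
```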
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Discovering interpretable elastoplasticity models via the neural polynomial method enabled symbolic regressions [0.0]
Conventional neural network elastoplasticity models are often perceived as lacking interpretability.
This paper introduces a two-step machine learning approach that returns mathematical models interpretable by human experts.
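A minimal stand-in for that two-step recipe, shown below, substitutes ordinary sparse regression over candidate polynomial terms for the paper's neural polynomial method; the hidden law and thresholds are made up:

```python
# Illustrative symbolic-regression-lite: fit a relation, then print a
# human-readable polynomial recovered by sparse (Lasso) regression.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (200, 1))
y = 2.0 * x[:, 0] ** 2 - 0.5 * x[:, 0] + rng.normal(0, 0.01, 200)  # hidden law

feats = PolynomialFeatures(degree=3, include_bias=True)
X = feats.fit_transform(x)
fit = Lasso(alpha=1e-3).fit(X, y)
terms = [f"{c:+.2f}*{name}"
         for c, name in zip(fit.coef_, feats.get_feature_names_out(["x"]))
         if abs(c) > 1e-2]
print(" ".join(terms))  # e.g. "-0.49*x +1.98*x^2"
```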
arXiv Detail & Related papers (2023-07-24T22:22:32Z)
- Combining Discrete Choice Models and Neural Networks through Embeddings: Formulation, Interpretability and Performance [10.57079240576682]
This study proposes a novel approach that combines theory-driven and data-driven choice models using Artificial Neural Networks (ANNs).
In particular, we use continuous vector representations, called embeddings, for encoding categorical or discrete explanatory variables.
Our models deliver state-of-the-art predictive performance, outperforming existing ANN-based models while drastically reducing the number of required network parameters.
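A hedged sketch of the embedding idea: a learned dense vector replaces one-hot dummies for a categorical variable inside a simple logit-style utility model (all dimensions and layers below are illustrative, not the paper's architecture):

```python
# Illustrative: embed a categorical explanatory variable and combine it with
# numeric features to produce per-alternative utilities for a choice model.
import torch
import torch.nn as nn

class EmbeddingChoiceModel(nn.Module):
    def __init__(self, n_categories, emb_dim, n_numeric, n_alternatives):
        super().__init__()
        self.emb = nn.Embedding(n_categories, emb_dim)  # replaces one-hot dummies
        self.utility = nn.Linear(emb_dim + n_numeric, n_alternatives)

    def forward(self, cat_ids, numeric):
        x = torch.cat([self.emb(cat_ids), numeric], dim=-1)
        return self.utility(x)  # softmax over utilities gives choice probabilities

model = EmbeddingChoiceModel(n_categories=12, emb_dim=4, n_numeric=3, n_alternatives=3)
probs = torch.softmax(model(torch.tensor([5, 2]), torch.randn(2, 3)), dim=-1)
```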
arXiv Detail & Related papers (2021-09-24T15:55:31Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
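To illustrate the flavor of the output, a learned one-dimensional response can be distilled into a short, human-readable lookup; the fixed-knot interpolation below is a simple stand-in for the paper's curve-fitting algorithm:

```python
# Illustrative distillation of a 1-D response into piecewise-linear "code".
import numpy as np

def distill_piecewise_linear(f, lo, hi, n_knots=6):
    xs = np.linspace(lo, hi, n_knots)
    ys = f(xs)
    # Emit the human-readable artifact: a knot table np.interp can evaluate.
    print(f"def scorer(x): return np.interp(x, {xs.round(3).tolist()}, {ys.round(3).tolist()})")
    return lambda x: np.interp(x, xs, ys)

g = distill_piecewise_linear(np.tanh, -3, 3)
grid = np.linspace(-3, 3, 100)
print("max abs error:", np.max(np.abs(g(grid) - np.tanh(grid))))
```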
arXiv Detail & Related papers (2021-01-21T01:46:36Z)
- Autoencoding Pixies: Amortised Variational Inference with Graph Convolutions for Functional Distributional Semantics [12.640283469603355]
The Pixie Autoencoder augments the generative model of Functional Distributional Semantics with a graph-convolutional neural network to perform amortised variational inference.
arXiv Detail & Related papers (2020-05-06T17:46:40Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
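A toy illustration of such representations: each entity is a vector of posterior probabilities over fine-grained types, which is readable directly and still supports out-of-the-box similarity (the types and probabilities below are made up):

```python
# Illustrative type-probability vectors for two entities; not the paper's model.
import numpy as np

types = ["person", "musician", "politician", "city", "company"]
entity_a = np.array([0.97, 0.91, 0.03, 0.01, 0.02])  # e.g. a famous singer
entity_b = np.array([0.98, 0.05, 0.88, 0.01, 0.03])  # e.g. a head of state

for name, vec in [("A", entity_a), ("B", entity_b)]:
    print(name, [t for t, p in zip(types, vec) if p > 0.5])  # human-readable view

cos = entity_a @ entity_b / (np.linalg.norm(entity_a) * np.linalg.norm(entity_b))
print("cosine similarity:", round(float(cos), 3))  # usable out of the box
```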
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
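One generic way to realize a compact discrete latent space is a Gumbel-softmax bottleneck, sketched below; this is a common construction for keeping the latent informative, not necessarily the paper's exact mechanism:

```python
# Illustrative discrete latent bottleneck via Gumbel-softmax quantization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteBottleneck(nn.Module):
    def __init__(self, in_dim, n_codes, code_dim):
        super().__init__()
        self.to_logits = nn.Linear(in_dim, n_codes)
        self.codebook = nn.Embedding(n_codes, code_dim)

    def forward(self, h, tau=1.0):
        logits = self.to_logits(h)
        one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)  # differentiable pick
        return one_hot @ self.codebook.weight, logits           # compact code + logits

bottleneck = DiscreteBottleneck(in_dim=32, n_codes=16, code_dim=8)
z, logits = bottleneck(torch.randn(4, 32))
print(z.shape)  # torch.Size([4, 8])
```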
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.