Probabilistic Transformer: A Probabilistic Dependency Model for
Contextual Word Representation
- URL: http://arxiv.org/abs/2311.15211v1
- Date: Sun, 26 Nov 2023 06:56:02 GMT
- Title: Probabilistic Transformer: A Probabilistic Dependency Model for
Contextual Word Representation
- Authors: Haoyi Wu, Kewei Tu
- Abstract summary: We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small to medium-sized datasets.
- Score: 52.270712965271656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Syntactic structures used to play a vital role in natural language processing
(NLP), but since the deep learning revolution, NLP has been gradually dominated
by neural models that do not consider syntactic structures in their design. One
vastly successful class of neural models is transformers. When used as an
encoder, a transformer produces contextual representations of words in the input
sentence. In this work, we propose a new model of contextual word
representation, not from a neural perspective, but from a purely syntactic and
probabilistic perspective. Specifically, we design a conditional random field
that models discrete latent representations of all words in a sentence as well
as dependency arcs between them; and we use mean field variational inference
for approximate inference. Strikingly, we find that the computation graph of
our model resembles transformers, with correspondences between dependencies and
self-attention and between distributions over latent representations and
contextual embeddings of words. Experiments show that our model performs
competitively with transformers on small to medium-sized datasets. We hope that
our work could help bridge the gap between traditional syntactic and
probabilistic approaches and cutting-edge neural approaches to NLP, and inspire
more linguistically-principled neural approaches in the future.
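To make the correspondence described in the abstract concrete, here is a minimal sketch of mean-field updates for a CRF over discrete word labels and dependency arcs. It is our own illustration, not the authors' implementation: it assumes a single set of discrete labels, random toy potentials, and a deliberately simplified update (only dependent-to-head messages, no multiple channels); the names `mean_field`, `unary`, and `W` are ours.

```python
# Simplified sketch: mean-field variational inference for a CRF with
# (a) a discrete latent label per word and (b) a latent head (dependency arc)
# per word, showing how the updates resemble self-attention.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field(unary, W, n_iters=3):
    """unary: (n, d) scores over each word's discrete label.
    W: (d, d) compatibility between a dependent's label and its head's label.
    Returns q: (n, d) approximate posteriors over labels.
    """
    n, d = unary.shape
    q = softmax(unary)                      # initial label posteriors
    for _ in range(n_iters):
        # Arc posteriors: expected pairwise score q_i^T W q_j between word i
        # and candidate head j -- these behave like attention logits.
        logits = q @ W @ q.T                # (n, n)
        np.fill_diagonal(logits, -np.inf)   # a word cannot be its own head
        a = softmax(logits, axis=-1)        # (n, n) rows sum to 1, like attention weights
        # Label posteriors: unary score plus arc-weighted messages from likely
        # heads -- an attention-weighted aggregation. (Messages from dependents
        # back to heads are omitted here for simplicity.)
        q = softmax(unary + a @ (q @ W.T))
    return q

rng = np.random.default_rng(0)
q = mean_field(rng.normal(size=(5, 8)), rng.normal(size=(8, 8)))
print(q.shape)  # (5, 8): one distribution over latent labels per word
```

The row-normalized arc posteriors `a` play the role of self-attention weights, and the per-word label posteriors `q` play the role of contextual embeddings, which is the correspondence the abstract points out; a fuller version would presumably use learned potentials and several parallel copies of the latent labels, analogous to attention heads.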
Related papers
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limited transparency of Transformer-based similarity models by leveraging improved explanations.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Learning Semantic Textual Similarity via Topic-informed Discrete Latent Variables [17.57873577962635]
We develop a topic-informed discrete latent variable model for semantic textual similarity.
Our model learns a shared latent space for sentence-pair representation via vector quantization (a toy sketch of the quantization step appears after this list).
We show that our model surpasses several strong neural baselines on semantic textual similarity tasks.
arXiv Detail & Related papers (2022-11-07T15:09:58Z)
- Syntactic Inductive Biases for Deep Learning Methods [8.758273291015474]
We propose two families of inductive biases, one for constituency structure and another for dependency structure.
The constituency inductive bias encourages deep learning models to use different units (or neurons) to separately process long-term and short-term information.
The dependency inductive bias encourages models to find the latent relations between entities in the input sequence.
arXiv Detail & Related papers (2022-06-08T11:18:39Z) - Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
Specifically, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the dependency modeling probability distributions of previous tokens with self-attention (a toy sketch of this mixture appears after this list).
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- Implicit Representations of Meaning in Neural Language Models [31.71898809435222]
We identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse.
Our results indicate that prediction in pretrained neural language models is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state.
arXiv Detail & Related papers (2021-06-01T19:23:20Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism that generates phrase representations from the corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
When paired with a strong auto-regressive decoder, VAEs tend to ignore their latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
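For the topic-informed discrete latent variable entry above, the following is a toy sketch of the vector-quantization step it relies on. The codebook size, encodings, and names (`vector_quantize`) are our own illustrative assumptions, not the paper's implementation.

```python
# Toy vector quantization: map continuous sentence encodings to their nearest
# entries in a shared discrete codebook.
import numpy as np

def vector_quantize(z, codebook):
    """z: (n, d) continuous encodings; codebook: (k, d) code vectors.
    Returns the quantized vectors and their discrete code indices."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, k) squared distances
    idx = d2.argmin(axis=1)                                     # discrete latent codes
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 32))            # 16 discrete codes ("topics"), 32-dim each
sent_a = rng.normal(size=(1, 32))               # stand-ins for two sentence encodings
sent_b = rng.normal(size=(1, 32))
_, ia = vector_quantize(sent_a, codebook)
_, ib = vector_quantize(sent_b, codebook)
# Both sentences are quantized into the same codebook, i.e. a shared discrete
# latent space; matching codes is one crude signal of similarity.
print(ia, ib)
```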
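For the Dependency-based Mixture Language Models entry above, here is a toy sketch of mixing per-token next-word distributions with self-attention weights. The shapes and names are illustrative assumptions, not the paper's actual formulation.

```python
# Toy sketch: next-token probability as an attention-weighted mixture of
# dependency-based next-token distributions, one per previous token.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_next_token(dep_logits, attn_scores):
    """dep_logits: (n_prev, vocab) dependency-based scores, one row per previous token.
    attn_scores: (n_prev,) self-attention scores of the current position over previous tokens.
    Returns a single next-token distribution of shape (vocab,)."""
    weights = softmax(attn_scores)          # how much each previous token contributes
    dists = softmax(dep_logits, axis=-1)    # per-token dependency distributions
    return weights @ dists                  # convex mixture over the vocabulary

rng = np.random.default_rng(0)
p_next = mixture_next_token(rng.normal(size=(4, 100)), rng.normal(size=(4,)))
print(p_next.shape, p_next.sum())           # (100,) and ~1.0
```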