Syntax-guided Neural Module Distillation to Probe Compositionality in
Sentence Embeddings
- URL: http://arxiv.org/abs/2301.08998v1
- Date: Sat, 21 Jan 2023 19:42:02 GMT
- Title: Syntax-guided Neural Module Distillation to Probe Compositionality in
Sentence Embeddings
- Authors: Rohan Pandey
- Abstract summary: Given a sentence, we construct a neural module net based on its syntax parse and train it end-to-end to approximate the sentence's embedding.
We find differences in the distillability of various sentence embedding models that broadly correlate with their performance.
We also present preliminary evidence that much syntax-guided composition in sentence embedding models is linear.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Past work probing compositionality in sentence embedding models faces issues
determining the causal impact of implicit syntax representations. Given a
sentence, we construct a neural module net based on its syntax parse and train
it end-to-end to approximate the sentence's embedding generated by a
transformer model. The distillability of a transformer to a Syntactic NeurAl
Module Net (SynNaMoN) then captures whether syntax is a strong causal model of
its compositional ability. Furthermore, we address questions about the geometry
of semantic composition by specifying individual SynNaMoN modules' internal
architecture & linearity. We find differences in the distillability of various
sentence embedding models that broadly correlate with their performance, but
observe that distillability doesn't considerably vary by model size. We also
present preliminary evidence that much syntax-guided composition in sentence
embedding models is linear, and that non-linearities may serve primarily to
handle non-compositional phrases.
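To make the distillation setup concrete, here is a minimal sketch (in PyTorch) of training a parse-guided module network to approximate a fixed sentence embedding. The binary parse representation, the single shared composition module, and the linear-versus-MLP toggle are illustrative assumptions, not the exact SynNaMoN architecture or training recipe.

```python
# Minimal sketch: distill a sentence embedding into a parse-guided module
# network. The parse format, module granularity, and composition modules are
# assumptions for illustration, not the exact SynNaMoN design.
import torch
import torch.nn as nn

class ParseGuidedComposer(nn.Module):
    def __init__(self, dim: int, linear: bool = True):
        super().__init__()
        # One composition module applied at every internal parse node.
        # linear=True probes the hypothesis that composition is linear;
        # linear=False adds a non-linearity for comparison.
        if linear:
            self.compose = nn.Linear(2 * dim, dim)
        else:
            self.compose = nn.Sequential(
                nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
            )

    def forward(self, node, word_vecs):
        # `node` is a nested tuple such as (("the", "cat"), "slept");
        # string leaves index into pre-computed word vectors.
        if isinstance(node, str):
            return word_vecs[node]
        left = self.forward(node[0], word_vecs)
        right = self.forward(node[1], word_vecs)
        return self.compose(torch.cat([left, right], dim=-1))

# Toy run: approximate a (random) stand-in for a transformer sentence embedding.
dim = 16
parse = (("the", "cat"), "slept")
word_vecs = {w: torch.randn(dim) for w in ("the", "cat", "slept")}
target = torch.randn(dim)

model = ParseGuidedComposer(dim, linear=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(parse, word_vecs), target)
    loss.backward()
    opt.step()
print(f"final distillation loss: {loss.item():.4f}")
```

In the paper's setting, the target would be the embedding produced by a sentence encoder, and how low this loss can be driven (the distillability) is read as evidence for how well syntax explains the encoder's compositional behavior.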
Related papers
- Comateformer: Combined Attention Transformer for Semantic Sentence Matching [11.746010399185437]
We propose Comateformer, a novel semantic sentence matching model based on a Combined Attention Network built on the Transformer architecture.
In Comateformer, we design a transformer-based quasi-attention mechanism with compositional properties.
Our approach builds on the intuition of modeling both similarity and dissimilarity (negative affinity) when calculating dual affinity scores.
arXiv Detail & Related papers (2024-12-10T06:18:07Z)
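The Comateformer entry above mentions dual affinity scores built from similarity and dissimilarity (negative affinity). The sketch below is one hedged reading of that idea rather than the paper's actual quasi-attention mechanism: token-level similarity and dissimilarity maps are computed under separate, hypothetical projections and combined with a learned trade-off so that scores can turn negative.

```python
# Illustrative sketch only: one way to combine similarity and dissimilarity
# ("negative affinity") into dual affinity scores between two sentences.
# The actual Comateformer quasi-attention mechanism may differ.
import torch
import torch.nn as nn

class DualAffinity(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.sim_proj = nn.Linear(dim, dim)     # projection for similarity scores
        self.dissim_proj = nn.Linear(dim, dim)  # projection for dissimilarity scores
        self.alpha = nn.Parameter(torch.tensor(0.5))  # trade-off between the two

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a: (len_a, dim) token states of sentence A; b: (len_b, dim) of sentence B.
        sim = torch.cosine_similarity(
            self.sim_proj(a).unsqueeze(1), self.sim_proj(b).unsqueeze(0), dim=-1)
        dissim = torch.cosine_similarity(
            self.dissim_proj(a).unsqueeze(1), self.dissim_proj(b).unsqueeze(0), dim=-1)
        # Quasi-attention: subtracting the dissimilarity term lets token-pair
        # scores go negative instead of being forced through a softmax alone.
        return sim - self.alpha * dissim

scores = DualAffinity(dim=32)(torch.randn(5, 32), torch.randn(7, 32))
print(scores.shape)  # torch.Size([5, 7])
```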
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small to medium-sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Meaning Representations from Trajectories in Autoregressive Models [106.63181745054571]
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text.
This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model.
We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle.
arXiv Detail & Related papers (2023-10-23T04:35:58Z)
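As a rough illustration of the trajectory idea in the entry above: represent a text by continuations sampled from an autoregressive model, and compare two texts by how likely each one's continuations are under the other as a prefix. The model choice (gpt2), sample counts, and the symmetric log-likelihood score below are assumptions for illustration, not the paper's estimator.

```python
# Rough sketch of trajectory-based text similarity with an off-the-shelf
# autoregressive LM. The scoring rule and hyperparameters are assumptions
# for illustration, not the paper's exact method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sample_trajectories(text, n=8, max_new_tokens=20):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm.generate(ids, do_sample=True, num_return_sequences=n,
                          max_new_tokens=max_new_tokens,
                          pad_token_id=tok.eos_token_id)
    return [seq[ids.shape[1]:] for seq in out]  # continuation token ids only

def log_prob_of_continuation(prefix, continuation_ids):
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    full = torch.cat([prefix_ids, continuation_ids.unsqueeze(0)], dim=1)
    with torch.no_grad():
        logits = lm(full).logits
    logp = torch.log_softmax(logits[0, :-1], dim=-1)
    # Sum log-probs of the continuation tokens given everything before them.
    start = prefix_ids.shape[1] - 1
    targets = full[0, prefix_ids.shape[1]:]
    return logp[start:start + len(targets)].gather(1, targets.unsqueeze(1)).sum().item()

def trajectory_similarity(a, b):
    # Score A's continuations under B as a prefix and vice versa (symmetrized).
    score = 0.0
    for t in sample_trajectories(a):
        score += log_prob_of_continuation(b, t)
    for t in sample_trajectories(b):
        score += log_prob_of_continuation(a, t)
    return score / 16

print(trajectory_similarity("A cat sat on the mat.", "A kitten rested on the rug."))
```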
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Learning Disentangled Representations for Natural Language Definitions [0.0]
We argue that recurrent syntactic and semantic regularities in textual data can be used to provide the models with both structural biases and generative factors.
We leverage the semantic structures present in a representative and semantically dense category of sentence types, definitional sentences, for training a Variational Autoencoder to learn disentangled representations.
arXiv Detail & Related papers (2022-09-22T14:31:55Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Syntactic Inductive Biases for Deep Learning Methods [8.758273291015474]
We propose two families of inductive biases, one for constituency structure and another one for dependency structure.
The constituency inductive bias encourages deep learning models to use different units (or neurons) to separately process long-term and short-term information.
The dependency inductive bias encourages models to find the latent relations between entities in the input sequence.
arXiv Detail & Related papers (2022-06-08T11:18:39Z)
- Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale [31.293175512404172]
We introduce Transformer Grammars -- a class of Transformer language models that combine the expressive power, scalability, and strong performance of Transformers with syntactic inductive biases.
We find that Transformer Grammars outperform various strong baselines on multiple syntax-sensitive language modeling evaluation metrics.
arXiv Detail & Related papers (2022-03-01T17:22:31Z)
- Compositionality as Lexical Symmetry [42.37422271002712]
In tasks like semantic parsing, instruction following, and question answering, standard deep networks fail to generalize compositionally from small datasets.
We present a domain-general and model-agnostic formulation of compositionality as a constraint on symmetries of data distributions rather than models.
We describe a procedure called LEXSYM that discovers these transformations automatically, then applies them to training data for ordinary neural sequence models.
arXiv Detail & Related papers (2022-01-30T21:44:46Z)
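A minimal sketch of the data-symmetry idea in the Compositionality as Lexical Symmetry entry above: if two lexical items play interchangeable roles, swapping them consistently on both sides of a training pair should yield another valid pair. The hand-written swap table and SCAN-style examples are assumptions for illustration; the paper's LEXSYM procedure discovers such transformations automatically.

```python
# Toy illustration of treating compositionality as a lexical symmetry:
# apply a consistent lexical swap to both sides of (input, output) pairs
# to augment training data. The swap table here is hand-written; LEXSYM
# discovers such transformations automatically.
from typing import List, Tuple

def lexical_swap(pairs: List[Tuple[str, str]],
                 swap: Tuple[Tuple[str, str], Tuple[str, str]]):
    """Swap a pair of interchangeable (input_word, output_symbol) entries
    consistently across both sides of each example."""
    (in_a, out_a), (in_b, out_b) = swap
    table_in = {in_a: in_b, in_b: in_a}
    table_out = {out_a: out_b, out_b: out_a}
    augmented = []
    for src, tgt in pairs:
        new_src = " ".join(table_in.get(w, w) for w in src.split())
        new_tgt = " ".join(table_out.get(w, w) for w in tgt.split())
        if (new_src, new_tgt) not in pairs:
            augmented.append((new_src, new_tgt))
    return augmented

# SCAN-style toy data: "jump"/"walk" behave identically, so swapping them
# in both the command and the action sequence preserves validity.
train = [("jump twice", "JUMP JUMP"), ("walk left", "LTURN WALK")]
print(lexical_swap(train, (("jump", "JUMP"), ("walk", "WALK"))))
# -> [('walk twice', 'WALK WALK'), ('jump left', 'LTURN JUMP')]
```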
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- A Framework for Measuring Compositional Inductive Bias [0.30458514384586405]
We present a framework for measuring the compositional inductive bias of models in emergent communications.
We devise corrupted compositional grammars that probe for limitations in the compositional inductive bias of frequently used models.
We propose a hierarchical model which might show an inductive bias towards relocatable atomic groups of tokens, thus potentially encouraging the emergence of words.
arXiv Detail & Related papers (2021-03-06T19:25:37Z)
- Multi-Step Inference for Reasoning Over Paragraphs [95.91527524872832]
Complex reasoning over text requires understanding and chaining together free-form predicates and logical connectives.
We present a compositional model reminiscent of neural module networks that can perform chained logical reasoning.
arXiv Detail & Related papers (2020-04-06T21:12:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.