Syntactic Inductive Biases for Deep Learning Methods
- URL: http://arxiv.org/abs/2206.04806v1
- Date: Wed, 8 Jun 2022 11:18:39 GMT
- Title: Syntactic Inductive Biases for Deep Learning Methods
- Authors: Yikang Shen
- Abstract summary: We propose two families of inductive biases, one for constituency structure and another for dependency structure.
The constituency inductive bias encourages deep learning models to use different units (or neurons) to separately process long-term and short-term information.
The dependency inductive bias encourages models to find the latent relations between entities in the input sequence.
- Score: 8.758273291015474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this thesis, we try to build a connection between the two schools, syntax-oriented linguistics and deep learning, by
introducing syntactic inductive biases for deep learning models. We propose two
families of inductive biases, one for constituency structure and another
for dependency structure. The constituency inductive bias encourages deep
learning models to use different units (or neurons) to separately process
long-term and short-term information. This separation provides a way for deep
learning models to build latent hierarchical representations from
sequential inputs, such that a higher-level representation is composed of, and
can be decomposed into, a series of lower-level representations. For example,
without knowing the ground-truth structure, our proposed model learns to
process logical expressions by composing representations of variables and
operators into representations of expressions according to their syntactic structure. On
the other hand, the dependency inductive bias encourages models to find the
latent relations between entities in the input sequence. For natural language,
the latent relations are usually modeled as a directed dependency graph, where
a word has exactly one parent node and zero or more child nodes. After
applying this constraint to a Transformer-like model, we find the model is
capable of inducing directed graphs that are close to human expert annotations,
and it also outperforms the standard Transformer model on a range of tasks. We
believe that these experimental results demonstrate an interesting alternative
for the future development of deep learning models.
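As a concrete illustration of the constituency inductive bias, below is a minimal sketch of an ordered-neurons-style recurrent update in the spirit of the author's ON-LSTM work, where a cumulative softmax ("cumax") softly splits the hidden units into slowly updated (long-term) and frequently updated (short-term) slices. The module and gate names here are our own, and the full ON-LSTM additionally refines the overlap region with standard LSTM gates; this is a sketch under those assumptions, not the thesis's exact formulation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cumax(x, dim=-1):
    """Cumulative softmax: a soft, monotonically increasing 0-to-1 step vector."""
    return torch.cumsum(F.softmax(x, dim=dim), dim=dim)

class OrderedNeuronsCell(nn.Module):
    """One recurrent step whose master gates order neurons by update frequency."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # A single projection yields the two master gates and a candidate state.
        self.proj = nn.Linear(input_size + hidden_size, 3 * hidden_size)

    def forward(self, x, h):
        f_logits, i_logits, cand = self.proj(torch.cat([x, h], dim=-1)).chunk(3, dim=-1)
        master_forget = cumax(f_logits)        # rises 0 -> 1: high units keep the past
        master_input = 1.0 - cumax(i_logits)   # falls 1 -> 0: low units take new input
        # High-ranked (slow) units mostly copy h, preserving long-term context;
        # low-ranked (fast) units are mostly overwritten with short-term content.
        # (The full ON-LSTM further mixes the overlap region with LSTM gates.)
        return master_forget * h + master_input * torch.tanh(cand)

cell = OrderedNeuronsCell(input_size=16, hidden_size=32)
h = torch.zeros(1, 32)
for x in torch.randn(5, 1, 16):  # a toy length-5 sequence
    h = cell(x, h)
```
Because the split point of the cumax gates moves per step, a large "forget" region corresponds to closing a high-level constituent, which is how the hierarchy emerges without supervision.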
Related papers
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, derived not from a neural perspective but from a purely syntactic and probabilistic perspective.
We find that the computation graph of our model resembles that of Transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with Transformers on small- to medium-sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Meaning Representations from Trajectories in Autoregressive Models [106.63181745054571]
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text.
This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model.
We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle.
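As a rough sketch of this trajectory-based strategy, the snippet below represents a text by sampled continuations and compares two texts by how well each explains the other's continuations. The scoring scheme and helper names are our assumptions, and gpt2 serves only as a stand-in autoregressive model, not the one used in the paper.
```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sample_trajectories(prompt, n=8, max_new_tokens=20):
    """Sample n continuations ("trajectories") extending the prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, do_sample=True, num_return_sequences=n,
                         max_new_tokens=max_new_tokens,
                         pad_token_id=tok.eos_token_id)
    return [seq[ids.shape[1]:] for seq in out]  # keep only the new tokens

@torch.no_grad()
def avg_log_likelihood(prompt, continuation_ids):
    """Mean log-probability of a continuation given the prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    full = torch.cat([ids, continuation_ids.unsqueeze(0)], dim=1)
    logp = torch.log_softmax(model(full).logits[0, :-1], dim=-1)
    targets = full[0, 1:]
    start = ids.shape[1] - 1  # positions that predict continuation tokens
    return logp[start:].gather(-1, targets[start:].unsqueeze(-1)).mean().item()

def trajectory_similarity(a, b, n=8):
    """Symmetrized score: how well does each text explain the other's trajectories?"""
    score_ab = sum(avg_log_likelihood(b, t) for t in sample_trajectories(a, n)) / n
    score_ba = sum(avg_log_likelihood(a, t) for t in sample_trajectories(b, n)) / n
    return 0.5 * (score_ab + score_ba)
```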
arXiv Detail & Related papers (2023-10-23T04:35:58Z)
- Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
- Latent Traversals in Generative Models as Potential Flows [113.4232528843775]
We propose to model latent structures with a learned dynamic potential landscape.
Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations.
Our method achieves latent trajectories that are both qualitatively and quantitatively more disentangled than those of state-of-the-art baselines.
arXiv Detail & Related papers (2023-04-25T15:53:45Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, offers partial interpretability, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Learning Disentangled Representations for Natural Language Definitions [0.0]
We argue that recurrent syntactic and semantic regularities in textual data can be used to provide the models with both structural biases and generative factors.
We leverage the semantic structures present in a representative and semantically dense category of sentence types, definitional sentences, for training a Variational Autoencoder to learn disentangled representations.
arXiv Detail & Related papers (2022-09-22T14:31:55Z)
- Amortised Inference in Structured Generative Models with Explaining Away [16.92791301062903]
We extend the output of amortised variational inference to incorporate structured factors over multiple variables.
We show that appropriately parameterised factors can be combined efficiently with variational message passing in elaborate graphical structures.
We then fit the structured model to high-dimensional neural spiking time-series from the hippocampus of freely moving rodents.
arXiv Detail & Related papers (2022-09-12T12:52:15Z)
- A Generative Approach for Mitigating Structural Biases in Natural Language Inference [24.44419010439227]
In this work, we reformulate the NLI task as a generative task, where a model is conditioned on the biased subset of the input and the label.
We show that this approach is highly robust to large amounts of bias.
We find, however, that generative models are difficult to train and generally perform worse than discriminative baselines.
arXiv Detail & Related papers (2021-08-31T17:59:45Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling [45.96663013609177]
We introduce a novel model, StructFormer, that can induce dependency and constituency structure at the same time.
We integrate the induced dependency relations into the transformer, in a differentiable manner, through a novel dependency-constrained self-attention mechanism.
Experimental results show that our model can achieve strong results on unsupervised constituency parsing, unsupervised dependency parsing, and masked language modeling.
arXiv Detail & Related papers (2020-12-01T21:54:51Z)
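To make the dependency-constrained self-attention idea concrete, below is a minimal sketch of a single attention head whose weights are modulated by an induced soft parent distribution: each row sums to one and excludes self-loops, approximating the "exactly one parent, zero or more children" constraint from the abstract. This is a simplification of StructFormer's mechanism; the class and parameter names are our assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DependencyConstrainedAttention(nn.Module):
    """A single attention head reweighted by an induced soft parent distribution."""

    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.parent_q = nn.Linear(d_model, d_model)
        self.parent_k = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        scale = x.shape[-1] ** 0.5
        # Soft parent distribution: rows sum to 1 with self-loops masked out,
        # so each token softly selects a single parent.
        scores = self.parent_q(x) @ self.parent_k(x).transpose(-2, -1) / scale
        eye = torch.eye(x.shape[1], dtype=torch.bool, device=x.device)
        parent_prob = F.softmax(scores.masked_fill(eye, float("-inf")), dim=-1)
        # Standard content attention, reweighted by the parent distribution so
        # information flows mainly along induced dependency edges.
        attn = F.softmax(self.q(x) @ self.k(x).transpose(-2, -1) / scale, dim=-1)
        attn = attn * parent_prob
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return attn @ self.v(x)

layer = DependencyConstrainedAttention(d_model=64)
out = layer(torch.randn(2, 10, 64))  # -> (2, 10, 64)
```
Because every operation is differentiable, the parent scores can be trained end-to-end from a masked language modeling loss alone, which is what allows the induced graphs to emerge without parse supervision.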