Dependency Induction Through the Lens of Visual Perception
- URL: http://arxiv.org/abs/2109.09790v1
- Date: Mon, 20 Sep 2021 18:40:37 GMT
- Title: Dependency Induction Through the Lens of Visual Perception
- Authors: Ruisi Su, Shruti Rijhwani, Hao Zhu, Junxian He, Xinyu Wang, Yonatan
Bisk, Graham Neubig
- Abstract summary: We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
- Score: 81.91502968815746
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Most previous work on grammar induction focuses on learning phrasal or
dependency structure purely from text. However, because the signal provided by
text alone is limited, recently introduced visually grounded syntax models make
use of multimodal information leading to improved performance in constituency
grammar induction. However, as compared to dependency grammars, constituency
grammars do not provide a straightforward way to incorporate visual information
without enforcing language-specific heuristics. In this paper, we propose an
unsupervised grammar induction model that leverages word concreteness and a
structural vision-based heuristic to jointly learn constituency-structure and
dependency-structure grammars. Our experiments find that concreteness is a
strong indicator for learning dependency grammars, improving the direct
attachment score (DAS) by over 50% as compared to state-of-the-art models
trained on pure text. Next, we propose an extension of our model that leverages
both word concreteness and visual semantic role labels in constituency and
dependency parsing. Our experiments show that the proposed extension
outperforms the current state-of-the-art visually grounded models in
constituency parsing even with a smaller grammar size.
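For reference, an attachment score like the DAS reported above measures the fraction of tokens whose predicted syntactic head matches the gold head. A minimal, hypothetical sketch of that computation (the paper's exact evaluation protocol may differ):

```python
def attachment_score(gold_heads, pred_heads):
    """Fraction of tokens whose predicted head matches the gold head.

    Both inputs are lists of head indices, one per token
    (0 conventionally denotes the root). A score of 1.0 means
    every dependency arc was recovered.
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("sequences must align token-for-token")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# Example: a 4-token sentence where 3 of 4 predicted heads are correct.
gold = [2, 0, 2, 2]
pred = [2, 0, 2, 1]
print(attachment_score(gold, pred))  # 0.75
```

This counts only head attachment, not arc labels, which is why a grammar induced without supervision can still be scored against a gold treebank.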
Related papers
- Leveraging Grammar Induction for Language Understanding and Generation [7.459693992079273]
We introduce an unsupervised grammar induction method for language understanding and generation.
We construct a grammar to induce constituency structures and dependency relations, which is simultaneously trained on downstream tasks.
We evaluate and apply our method to multiple machine translation and natural language understanding tasks.
arXiv Detail & Related papers (2024-10-07T09:57:59Z)
- Grammar Induction from Visual, Speech and Text [91.98797120799227]
This work introduces a novel visual-audio-text grammar induction task (VAT-GI).
Inspired by the fact that language grammar exists beyond the text, we argue that text need not be the predominant modality in grammar induction.
We propose a visual-audio-text inside-outside autoencoder (VaTiora) framework, which leverages rich modality-specific and complementary features for effective grammar parsing.
arXiv Detail & Related papers (2024-10-01T02:24:18Z)
- Improve Discourse Dependency Parsing with Contextualized Representations [28.916249926065273]
We propose to take advantage of transformers to encode contextualized representations of discourse units at different levels.
Motivated by the observation of writing patterns commonly shared across articles, we propose a novel method that treats discourse relation identification as a sequence labelling task.
arXiv Detail & Related papers (2022-05-04T14:35:38Z)
- Imposing Relation Structure in Language-Model Embeddings Using Contrastive Learning [30.00047118880045]
We propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure.
The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task.
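The contrastive framework itself is not detailed in this summary; as a generic illustration only, a standard InfoNCE-style contrastive objective over embedding similarities can be sketched as follows (the function name, temperature value, and input layout are assumptions, not this paper's API):

```python
import math

def info_nce(sim_matrix, temperature=0.1):
    """InfoNCE loss over a batch of pairwise similarity scores.

    sim_matrix[i][j] is the similarity between anchor i and
    candidate j; the diagonal holds the positive pairs. Returns
    the mean cross-entropy of selecting the positive in each row.
    """
    losses = []
    for i, row in enumerate(sim_matrix):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max to stabilize the softmax
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])  # -log softmax at the positive
    return sum(losses) / len(losses)

# A well-separated batch: each positive pair is far more similar
# than the negatives, so the loss is close to zero.
loss = info_nce([[1.0, 0.0], [0.0, 1.0]])
```

Training embeddings to minimize such a loss pulls related sentence pairs together and pushes unrelated ones apart, which is the general mechanism behind relation-aware contrastive objectives.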
arXiv Detail & Related papers (2021-09-02T10:58:27Z)
- VLGrammar: Grounded Grammar Induction of Vision and Language [86.88273769411428]
We study grounded grammar induction of vision and language in a joint learning framework.
We present VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce the language grammar and the image grammar simultaneously.
arXiv Detail & Related papers (2021-03-24T04:05:08Z)
- StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling [45.96663013609177]
We introduce a novel model, StructFormer, that can induce dependency and constituency structure at the same time.
We integrate the induced dependency relations into the transformer, in a differentiable manner, through a novel dependency-constrained self-attention mechanism.
Experimental results show that our model can achieve strong results on unsupervised constituency parsing, unsupervised dependency parsing, and masked language modeling.
arXiv Detail & Related papers (2020-12-01T21:54:51Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by a large margin.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.