StructFormer: Joint Unsupervised Induction of Dependency and
Constituency Structure from Masked Language Modeling
- URL: http://arxiv.org/abs/2012.00857v2
- Date: Tue, 15 Dec 2020 20:55:53 GMT
- Title: StructFormer: Joint Unsupervised Induction of Dependency and
Constituency Structure from Masked Language Modeling
- Authors: Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron
Courville
- Abstract summary: We introduce a novel model, StructFormer, that can induce dependency and constituency structure at the same time.
We integrate the induced dependency relations into the transformer, in a differentiable manner, through a novel dependency-constrained self-attention mechanism.
Experimental results show that our model can achieve strong results on unsupervised constituency parsing, unsupervised dependency parsing, and masked language modeling.
- Score: 45.96663013609177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There are two major classes of natural language grammars -- the dependency
grammar that models one-to-one correspondences between words and the
constituency grammar that models the assembly of one or several corresponded
words. While previous unsupervised parsing methods mostly focus on only
inducing one class of grammars, we introduce a novel model, StructFormer, that
can induce dependency and constituency structure at the same time. To achieve
this, we propose a new parsing framework that can jointly generate a
constituency tree and dependency graph. Then we integrate the induced
dependency relations into the transformer, in a differentiable manner, through
a novel dependency-constrained self-attention mechanism. Experimental results
show that our model can achieve strong results on unsupervised constituency
parsing, unsupervised dependency parsing, and masked language modeling at the
same time.
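The abstract's key architectural idea is that a soft, fully differentiable dependency structure gates where each token may attend. The sketch below is a minimal illustration of that idea only: it assumes a parser network (not shown) has already produced pairwise head scores, and it softly masks standard scaled dot-product attention with the resulting parent/child probabilities. StructFormer's actual formulation (derived from predicted syntactic distances and heights, with several relations mixed per head) differs in its details; the function and variable names here are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dependency_constrained_attention(Q, K, V, head_logits):
    """Minimal sketch: scale standard self-attention by a soft dependency mask.

    Q, K, V     : [seq, d] query/key/value matrices for one head.
    head_logits : [seq, seq] unnormalized scores where head_logits[i, j] is the
                  model's belief that token j is the syntactic head of token i
                  (assumed to come from some parser network, not shown here).
    """
    d = Q.shape[-1]
    parent_prob = softmax(head_logits, axis=-1)      # soft P(j is head of i); rows sum to 1
    child_prob = parent_prob.T                       # soft "i is head of j" relation
    dep_mask = parent_prob + child_prob              # allow attention along either arc direction
    scores = Q @ K.T / np.sqrt(d)                    # ordinary scaled dot-product scores
    weights = softmax(scores, axis=-1) * dep_mask    # softly restrict attention to dependency arcs
    weights = weights / (weights.sum(axis=-1, keepdims=True) + 1e-9)  # renormalize rows
    return weights @ V
```

Because the mask is a product of softmax outputs rather than a hard tree, the masked-language-modeling loss can backpropagate through it into the parser network, which is what the abstract means by integrating the induced dependencies "in a differentiable manner".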
Related papers
- Linguistic Structure Induction from Language Models [1.8130068086063336]
This thesis focuses on producing constituency and dependency structures from Language Models (LMs) in an unsupervised setting.
I present a detailed study on StructFormer (SF), which retrofits a transformer architecture with an encoder network to produce constituency and dependency structures.
I present six experiments to analyze and address this field's challenges.
arXiv Detail & Related papers (2024-03-11T16:54:49Z)
- Generic Dependency Modeling for Multi-Party Conversation [32.25605889407403]
We present an approach to encoding the dependencies in the form of relative dependency encoding (ReDE).
We show how to implement it in Transformers by modifying the computation of self-attention; a rough illustrative sketch follows this entry.
arXiv Detail & Related papers (2023-02-21T13:58:19Z)
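The entry above only states that ReDE is implemented by modifying self-attention. As a purely illustrative sketch (not the paper's actual formulation), one common way to inject such relational information is an additive bias on the attention logits indexed by a bucketed token-to-token dependency distance, in the spirit of relative position encodings; every name below (attention_with_relative_dependency_bias, dep_dist, bias_table) is a hypothetical placeholder.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_relative_dependency_bias(Q, K, V, dep_dist, bias_table, max_dist=4):
    """Hypothetical sketch: add a learned bias, indexed by a clipped token-to-token
    dependency distance, to the self-attention logits.

    Q, K, V    : [seq, d] per-head projections.
    dep_dist   : [seq, seq] integer distances between tokens in some dependency graph.
    bias_table : [max_dist + 1] learned scalar biases, one per distance bucket.
    """
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)             # standard scaled dot-product scores
    buckets = np.clip(dep_dist, 0, max_dist)  # collapse long dependency paths into one bucket
    logits = logits + bias_table[buckets]     # dependency-aware additive bias
    return softmax(logits, axis=-1) @ V
```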
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)
- Language Model Cascades [72.18809575261498]
Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities.
Cases with control flow and dynamic structure require techniques from probabilistic programming.
We formalize several existing techniques from this perspective, including scratchpads / chain of thought, verifiers, STaR, selection-inference, and tool use.
arXiv Detail & Related papers (2022-07-21T07:35:18Z)
- Syntactic Inductive Biases for Deep Learning Methods [8.758273291015474]
We propose two families of inductive biases, one for constituency structure and one for dependency structure.
The constituency inductive bias encourages deep learning models to use different units (or neurons) to separately process long-term and short-term information.
The dependency inductive bias encourages models to find the latent relations between entities in the input sequence.
arXiv Detail & Related papers (2022-06-08T11:18:39Z)
- Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z)
- Learning compositional structures for semantic graph parsing [81.41592892863979]
We show how AM dependency parsing can be trained directly on a neural latent-variable model.
Our model picks up on several linguistic phenomena on its own and achieves comparable accuracy to supervised training.
arXiv Detail & Related papers (2021-06-08T14:20:07Z)
- Syntactic Nuclei in Dependency Parsing -- A Multilingual Exploration [8.25332300240617]
We show how the concept of nucleus can be defined in the framework of Universal Dependencies.
Experiments on 12 languages show that nucleus composition gives small but significant improvements in parsing accuracy.
arXiv Detail & Related papers (2021-01-28T12:22:30Z)
- Second-Order Unsupervised Neural Dependency Parsing [52.331561380948564]
Most unsupervised dependency parsers are based on first-order probabilistic generative models that only consider local parent-child information.
Inspired by second-order supervised dependency parsing, we propose a second-order extension of unsupervised neural dependency models that incorporate grandparent-child or sibling information.
Our joint model achieves a 10% improvement over the previous state-of-the-art on the full WSJ test set.
arXiv Detail & Related papers (2020-10-28T03:01:33Z)