Compositional Generalization in Dependency Parsing
- URL: http://arxiv.org/abs/2110.06843v1
- Date: Wed, 13 Oct 2021 16:32:24 GMT
- Title: Compositional Generalization in Dependency Parsing
- Authors: Emily Goodwin, Siva Reddy, Timothy J. O'Donnell, Dzmitry Bahdanau
- Abstract summary: Dependency parsing, however, lacks a compositional generalization benchmark.
We find that increasing compound divergence degrades dependency parsing performance, although not as dramatically as semantic parsing performance.
We identify a number of syntactic structures that drive the dependency parser's lower performance on the most challenging splits.
- Score: 15.953482168182003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compositionality, or the ability to combine familiar units like words into
novel phrases and sentences, has been the focus of intense interest in
artificial intelligence in recent years. To test compositional generalization
in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase
Queries (CFQ). This dataset maximizes the similarity between the test and train
distributions over primitive units, like words, while maximizing the compound
divergence: the dissimilarity between test and train distributions over larger
structures, like phrases. Dependency parsing, however, lacks a compositional
generalization benchmark. In this work, we introduce a gold-standard set of
dependency parses for CFQ, and use this to analyze the behavior of a
state-of-the-art dependency parser (Qi et al., 2020) on the CFQ dataset. We
find that increasing compound divergence degrades dependency parsing
performance, although not as dramatically as semantic parsing performance.
Additionally, we find the performance of the dependency parser does not
uniformly degrade relative to compound divergence, and the parser performs
differently on different splits with the same compound divergence. We explore a
number of hypotheses for what causes the non-uniform degradation in dependency
parsing performance, and identify a number of syntactic structures that drive
the dependency parser's lower performance on the most challenging splits.
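Concretely, compound divergence can be computed from train/test frequency distributions via a weighted Chernoff coefficient, as in Keysers et al. (2020). Below is a minimal sketch using word unigrams as stand-in atoms and word bigrams as stand-in compounds; CFQ's actual atoms and compounds are derived from rule applications rather than raw n-grams, so the toy sentences and distributions here are illustrative only.

```python
from collections import Counter

def divergence(p_counts, q_counts, alpha):
    """1 - Chernoff coefficient between two normalized frequency distributions."""
    p_tot, q_tot = sum(p_counts.values()), sum(q_counts.values())
    coeff = sum((p_counts.get(k, 0) / p_tot) ** alpha *
                (q_counts.get(k, 0) / q_tot) ** (1 - alpha)
                for k in set(p_counts) | set(q_counts))
    return 1.0 - coeff

def atoms(sents):  # word unigrams as stand-in atoms
    return Counter(w for s in sents for w in s.split())

def compounds(sents):  # word bigrams as stand-in compounds
    return Counter(b for s in sents for b in zip(s.split(), s.split()[1:]))

train = ["what film did the director edit", "what film did the actor produce"]
test = ["what actor did the director influence"]

# Keysers et al. use alpha = 0.5 for atom divergence and alpha = 0.1
# for compound divergence.
print("atom divergence:", divergence(atoms(train), atoms(test), 0.5))
print("compound divergence:", divergence(compounds(train), compounds(test), 0.1))
```

A CFQ-style split keeps atom divergence near zero while pushing compound divergence as high as possible, so the test set contains familiar words in unfamiliar combinations.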
Related papers
- Empirical Analysis for Unsupervised Universal Dependency Parse Tree Aggregation [9.075353955444518]
Dependency parsing is an essential task in NLP, and the quality of dependency parses is crucial for many downstream tasks.
In various NLP tasks, aggregation methods are used as a post-processing step and have been shown to combat the issue of varying quality.
We compare different unsupervised post-processing aggregation methods to identify the most suitable dependency tree structure aggregation method.
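As a toy illustration of the post-processing setting, the sketch below aggregates head predictions from several hypothetical parsers by per-token majority vote. This naive aggregator ignores tree well-formedness, which is exactly the kind of issue the methods compared in the paper handle more carefully.

```python
from collections import Counter

def aggregate_heads(predictions):
    """Token-wise majority vote over head indices from multiple parsers.

    predictions: list of head sequences, one per parser; heads[i] is the
    0-based index of token i's head (-1 for the root). Naive: the result
    is not guaranteed to be a well-formed tree.
    """
    n = len(predictions[0])
    return [Counter(p[i] for p in predictions).most_common(1)[0][0]
            for i in range(n)]

# Three hypothetical parsers disagree on token 2's head.
parsers = [
    [1, -1, 1, 2],
    [1, -1, 1, 2],
    [1, -1, 3, 2],
]
print(aggregate_heads(parsers))  # [1, -1, 1, 2]
```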
arXiv Detail & Related papers (2024-03-28T07:27:10Z)
- Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP parser as well as four newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
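One way to inspect the tails of the sentence-length distribution is to bucket a syntactic measure by sentence length. The sketch below does this for mean dependency distance, one common such measure; it is illustrative only and not the paper's metrics pipeline.

```python
from collections import defaultdict

def mean_dependency_distance(heads):
    """Mean absolute distance between tokens and their heads (root excluded).

    heads[i] is the 0-based index of token i's head, or -1 for the root.
    """
    dists = [abs(i - h) for i, h in enumerate(heads) if h >= 0]
    return sum(dists) / len(dists)

def measure_by_length_bucket(parsed_corpus, bucket_size=10):
    """Average the measure within sentence-length buckets, so short and long
    sentences (the tails of the length distribution) can be compared."""
    buckets = defaultdict(list)
    for heads in parsed_corpus:
        buckets[len(heads) // bucket_size].append(mean_dependency_distance(heads))
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

# Two toy parsed sentences: a 3-token and a 4-token one.
print(measure_by_length_bucket([[1, -1, 1], [1, -1, 1, 2]], bucket_size=5))
```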
arXiv Detail & Related papers (2024-02-18T11:46:16Z)
- CIParsing: Unifying Causality Properties into Multiple Human Parsing [82.32620538918812]
Existing methods of multiple human parsing (MHP) apply statistical models to acquire underlying associations between images and labeled body parts.
We present a causality inspired parsing paradigm termed CIParsing, which follows fundamental causal principles involving two causal properties for human parsing.
The CIParsing is designed in a plug-and-play fashion and can be integrated into any existing MHP models.
arXiv Detail & Related papers (2023-08-23T15:56:26Z)
- Hexatagging: Projective Dependency Parsing as Tagging [63.5392760743851]
We introduce a novel dependency parser, the hexatagger, that constructs dependency trees by tagging the words in a sentence with elements from a finite set of possible tags.
Our approach is fully parallelizable at training time, i.e., the structure-building actions needed to build a dependency parse can be predicted in parallel.
We achieve state-of-the-art performance of 96.4 LAS and 97.4 UAS on the Penn Treebank test set.
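For reference, UAS and LAS are the standard attachment scores: UAS is the fraction of tokens whose predicted head is correct, and LAS additionally requires the correct dependency label. A minimal sketch (not the paper's evaluation code):

```python
def attachment_scores(gold, pred):
    """UAS/LAS for one sentence.

    gold, pred: lists of (head_index, relation_label) pairs, one per token.
    UAS counts tokens whose head is correct; LAS additionally requires
    the correct relation label.
    """
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las

gold = [(1, "nsubj"), (-1, "root"), (1, "obj")]
pred = [(1, "nsubj"), (-1, "root"), (1, "iobj")]
print(attachment_scores(gold, pred))  # (1.0, 0.666...)
```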
arXiv Detail & Related papers (2023-06-08T18:02:07Z)
- Structural Ambiguity and its Disambiguation in Language Model Based Parsers: the Case of Dutch Clause Relativization [2.9950872478176627]
We study how the presence of a prior sentence can resolve relative clause ambiguities.
Results show that a neurosymbolic parser based on proof nets is more open to data bias correction than an approach based on universal dependencies.
arXiv Detail & Related papers (2023-05-24T09:04:18Z)
- Syntactic Substitutability as Unsupervised Dependency Syntax [31.488677474152794]
We model a more general property implicit in the definition of dependency relations: syntactic substitutability.
This property captures the fact that words at either end of a dependency can be substituted with words from the same category.
We show that increasing the number of substitutions used improves parsing accuracy on natural data.
arXiv Detail & Related papers (2022-11-29T09:01:37Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the proposed neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
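A rough sketch of the mask-and-predict idea follows, assuming a BERT-style masked LM from HuggingFace transformers. The model choice, the word-level alignment via difflib, and KL divergence as the distance between predicted distributions are illustrative assumptions, not necessarily the paper's exact NDD formulation.

```python
import difflib
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Hypothetical model choice for the sketch; the paper's setup may differ.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def mask_distribution(words, i):
    """MLM distribution over the vocabulary at position i, with word i masked.

    Assumes the masked word maps to a single [MASK] token in this toy input.
    """
    text = " ".join(words[:i] + [tok.mask_token] + words[i + 1:])
    inputs = tok(text, return_tensors="pt")
    pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, pos]
    return torch.softmax(logits, dim=-1)

def ndd_like_distance(a, b):
    """Sum KL divergences between predicted distributions at aligned words
    of the longest common (word) sequence of the two texts."""
    wa, wb = a.split(), b.split()
    total = 0.0
    for blk in difflib.SequenceMatcher(a=wa, b=wb).get_matching_blocks():
        for k in range(blk.size):
            p = mask_distribution(wa, blk.a + k)
            q = mask_distribution(wb, blk.b + k)
            total += torch.sum(p * (p / q).log()).item()
    return total

print(ndd_like_distance("the cat sat on the mat", "the cat slept on the mat"))
```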
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- A Conditional Splitting Framework for Efficient Constituency Parsing [14.548146390081778]
We introduce a generic seq2seq parsing framework that casts constituency parsing problems (syntactic and discourse parsing) into a series of conditional splitting decisions.
Our parsing model estimates the conditional probability distribution of possible splitting points in a given text span and supports efficient top-down decoding.
For discourse analysis we show that in our formulation, discourse segmentation can be framed as a special case of parsing.
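In miniature, the top-down decoding works as in the sketch below; the stand-in scorer replaces the model's learned conditional distribution over split points, so this shows only the control flow, not the paper's architecture.

```python
def split_top_down(span, score_split):
    """Build a binary span tree by recursive top-down splitting.

    span: (i, j) half-open interval over the sentence.
    score_split: maps (i, j, k) to a score for splitting (i, j) into
    (i, k) and (k, j); stands in for the model's learned conditional
    distribution over split points.
    """
    i, j = span
    if j - i <= 1:
        return span  # single-token span: leaf
    k = max(range(i + 1, j), key=lambda s: score_split(i, j, s))
    return (split_top_down((i, k), score_split),
            split_top_down((k, j), score_split))

# Toy scorer that prefers balanced splits, just to make the sketch run.
print(split_top_down((0, 5), lambda i, j, k: -abs((k - i) - (j - k))))
```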
arXiv Detail & Related papers (2021-06-30T00:36:34Z)
- Linguistic dependencies and statistical dependence [76.89273585568084]
We use pretrained language models to estimate probabilities of words in context.
We find that maximum-CPMI trees correspond to linguistic dependencies more often than trees extracted from non-contextual PMI estimates.
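A sketch of the extraction step, assuming a precomputed matrix of pairwise CPMI scores; Prim's algorithm for an undirected maximum spanning tree stands in for the paper's exact procedure, and the scores below are made up for illustration.

```python
def max_spanning_tree(scores):
    """Prim's algorithm for the maximum spanning tree of a dense score matrix.

    scores[i][j]: symmetric CPMI-style affinity between words i and j.
    Returns a list of undirected edges (i, j).
    """
    n = len(scores)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        i, j = max(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: scores[e[0]][e[1]])
        in_tree.add(j)
        edges.append((i, j))
    return edges

# Toy 4-word sentence with hypothetical pairwise CPMI scores.
cpmi = [[0.0, 2.0, 0.1, 0.3],
        [2.0, 0.0, 1.5, 0.2],
        [0.1, 1.5, 0.0, 1.8],
        [0.3, 0.2, 1.8, 0.0]]
print(max_spanning_tree(cpmi))  # [(0, 1), (1, 2), (2, 3)]
```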
arXiv Detail & Related papers (2021-04-18T02:43:37Z)
- Span-based Semantic Parsing for Compositional Generalization [53.24255235340056]
SpanBasedSP predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input.
On GeoQuery, SCAN and CLOSURE, SpanBasedSP performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance compared to baselines on splits that require compositional generalization.
arXiv Detail & Related papers (2020-09-13T16:42:18Z)
- A Survey of Syntactic-Semantic Parsing Based on Constituent and Dependency Structures [14.714725860010724]
We focus on two of the most popular formalizations of parsing: constituent parsing and dependency parsing.
This article briefly reviews representative models of constituent parsing and dependency parsing, as well as dependency parsing with rich semantics.
arXiv Detail & Related papers (2020-06-19T10:21:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.