What's Hard in English RST Parsing? Predictive Models for Error Analysis
- URL: http://arxiv.org/abs/2309.04940v1
- Date: Sun, 10 Sep 2023 06:10:03 GMT
- Title: What's Hard in English RST Parsing? Predictive Models for Error Analysis
- Authors: Yang Janet Liu and Tatsuya Aoyama and Amir Zeldes
- Abstract summary: In this paper, we examine and model some of the factors associated with parsing difficulties in Rhetorical Structure Theory.
Our results show that as in shallow discourse parsing, the explicit/implicit distinction plays a role, but that long-distance dependencies are the main challenge.
Our final model is able to predict where errors will occur with an accuracy of 76.3% for the bottom-up parser and 76.6% for the top-down parser.
- Score: 16.927386793787463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent advances in Natural Language Processing (NLP), hierarchical
discourse parsing in the framework of Rhetorical Structure Theory remains
challenging, and our understanding of the reasons for this is as yet limited.
In this paper, we examine and model some of the factors associated with parsing
difficulties in previous work: the existence of implicit discourse relations,
challenges in identifying long-distance relations, out-of-vocabulary items, and
more. In order to assess the relative importance of these variables, we also
release two annotated English test-sets with explicit correct and distracting
discourse markers associated with gold standard RST relations. Our results show
that as in shallow discourse parsing, the explicit/implicit distinction plays a
role, but that long-distance dependencies are the main challenge, while lack of
lexical overlap is less of a problem, at least for in-domain parsing. Our final
model is able to predict where errors will occur with an accuracy of 76.3% for
the bottom-up parser and 76.6% for the top-down parser.
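The abstract describes a model that predicts, relation by relation, whether a parser will err. As a rough illustration of that setup (not the paper's actual feature set or data), the sketch below trains a logistic-regression error predictor on the kinds of relation-level features the abstract names: attachment distance, explicit/implicit marking, and lexical overlap. All feature names and toy values are hypothetical assumptions.

```python
# Minimal, hypothetical sketch of a parse-error predictor in the spirit of the
# paper: a binary classifier over gold relations, where the label marks whether
# the parser got the relation wrong. Feature names and values are illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# One dict per gold RST relation (toy data, not from the paper's test sets).
relations = [
    {"unit_distance": 1,  "explicit_marker": True,  "lexical_overlap": 0.4},
    {"unit_distance": 12, "explicit_marker": False, "lexical_overlap": 0.1},
    {"unit_distance": 2,  "explicit_marker": True,  "lexical_overlap": 0.3},
    {"unit_distance": 20, "explicit_marker": False, "lexical_overlap": 0.0},
]
will_err = [0, 1, 0, 1]  # toy pattern: long-distance, implicit relations fail

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(relations)  # booleans become 0/1 numeric features

clf = LogisticRegression().fit(X, will_err)

# Inspect which features drive predicted difficulty (toy coefficients only;
# on real parses the paper reports ~76% accuracy for this kind of prediction).
print(dict(zip(vec.get_feature_names_out(), clf.coef_[0])))
```

On the paper's findings, a predictor like this would lean most heavily on the distance feature, with the explicit/implicit distinction contributing and lexical overlap mattering least, at least in-domain.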
Related papers
- Urdu Dependency Parsing and Treebank Development: A Syntactic and Morphological Perspective [0.0]
We use dependency parsing to analyze news articles in Urdu.
We achieve a best labeled accuracy (LA) of 70% and an unlabeled attachment score (UAS) of 84%.
arXiv Detail & Related papers (2024-06-13T19:30:32Z) - Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z) - Structural Ambiguity and its Disambiguation in Language Model Based
Parsers: the Case of Dutch Clause Relativization [2.9950872478176627]
We study how the presence of a prior sentence can resolve relative clause ambiguities.
Results show that a neurosymbolic approach, based on proof nets, is more open to data bias correction than an approach based on universal dependencies.
arXiv Detail & Related papers (2023-05-24T09:04:18Z) - Topic-driven Distant Supervision Framework for Macro-level Discourse
Parsing [72.14449502499535]
The task of analyzing the internal rhetorical structure of texts is a challenging problem in natural language processing.
Despite the recent advances in neural models, the lack of large-scale, high-quality corpora for training remains a major obstacle.
Recent studies have attempted to overcome this limitation by using distant supervision.
arXiv Detail & Related papers (2023-05-23T07:13:51Z) - ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal,
Causal, and Discourse Relations [52.26802326949116]
We quantitatively evaluate the performance of ChatGPT, an interactive large language model, on inter-sentential relations.
ChatGPT exhibits exceptional proficiency in detecting and reasoning about causal relations.
It is capable of identifying the majority of discourse relations with existing explicit discourse connectives, but implicit discourse relations remain a formidable challenge.
arXiv Detail & Related papers (2023-04-28T13:14:36Z) - Let's be explicit about that: Distant supervision for implicit discourse
relation classification via connective prediction [0.0]
In implicit discourse relation classification, we want to predict the relation between adjacent sentences in the absence of any overt discourse connectives.
We sidestep the lack of data through explicitation of implicit relations, reducing the task to two sub-problems: language modeling and explicit discourse relation classification (a minimal sketch of this connective-prediction idea follows the list below).
Our experimental results show that this method can even marginally outperform the state-of-the-art, in spite of being much simpler than alternative models of comparable performance.
arXiv Detail & Related papers (2021-06-06T17:57:32Z) - Prosodic segmentation for parsing spoken dialogue [29.68201160277817]
Parsing spoken dialogue poses unique difficulties, including disfluencies and unmarked boundaries.
Previous work has shown that prosody can help with parsing disfluent speech.
We show that prosody can effectively replace gold standard sentence unit (SU) boundaries.
arXiv Detail & Related papers (2021-05-26T16:30:16Z) - High-order Semantic Role Labeling [86.29371274587146]
This paper introduces a high-order graph structure for the neural semantic role labeling model.
It enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs.
Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to strong-performing SRL models.
arXiv Detail & Related papers (2020-10-09T15:33:54Z) - Pareto Probing: Trading Off Accuracy for Complexity [87.09294772742737]
We argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance.
Our experiments with dependency parsing reveal a wide gap in syntactic knowledge between contextual and non-contextual representations.
arXiv Detail & Related papers (2020-10-05T17:27:31Z) - A Survey of Unsupervised Dependency Parsing [62.16714720135358]
Unsupervised dependency parsing aims to learn a dependency parser from sentences that have no annotation of their correct parse trees.
Despite its difficulty, unsupervised parsing is an interesting research direction because of its capability of utilizing almost unlimited unannotated text data.
arXiv Detail & Related papers (2020-10-04T10:51:22Z) - A Survey of Syntactic-Semantic Parsing Based on Constituent and
Dependency Structures [14.714725860010724]
We focus on two of the most popular formalizations of parsing: constituent parsing and dependency parsing.
This article briefly reviews the representative models of constituent parsing and dependency parsing, and also dependency parsing with rich semantics.
arXiv Detail & Related papers (2020-06-19T10:21:17Z)
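The distant-supervision entry above reduces implicit discourse relation classification to connective prediction plus explicit classification. As a minimal sketch of that first step, the snippet below lets a masked language model propose a connective between two adjacent sentences; the model choice, example sentences, and connective-to-relation mapping are illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sketch of "explicitation": insert a masked connective between two
# sentences, let a masked LM fill it in, then map the predicted connective to
# a relation label. The mapping below is a hand-written toy, not from the paper.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")

s1 = "The company missed its earnings target."
s2 = "Its stock fell sharply."
predictions = fill(f"{s1} <mask> {s2}", top_k=5)

connective_to_relation = {"because": "Cause", "so": "Result", "but": "Contrast"}
for p in predictions:
    token = p["token_str"].strip().lower()  # RoBERTa tokens carry a leading space
    if token in connective_to_relation:
        print(token, "->", connective_to_relation[token])
        break
```

Once an explicit connective has been inserted this way, any off-the-shelf explicit relation classifier can supply the final label, which is what makes the reduction attractive despite its simplicity.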
This list is automatically generated from the titles and abstracts of the papers on this site.