RST Parsing from Scratch
- URL: http://arxiv.org/abs/2105.10861v1
- Date: Sun, 23 May 2021 06:19:38 GMT
- Title: RST Parsing from Scratch
- Authors: Thanh-Tung Nguyen, Xuan-Phi Nguyen, Shafiq Joty, Xiaoli Li
- Abstract summary: We introduce a novel end-to-end formulation of document-level discourse parsing in the Rhetorical Structure Theory (RST) framework.
Our framework facilitates discourse parsing from scratch without requiring discourse segmentation as a prerequisite.
Our unified parsing model adopts a beam search to decode the best tree structure by searching through a space of high-scoring trees.
- Score: 14.548146390081778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel top-down end-to-end formulation of document-level
discourse parsing in the Rhetorical Structure Theory (RST) framework. In this
formulation, we consider discourse parsing as a sequence of splitting decisions
at token boundaries and use a seq2seq network to model the splitting decisions.
Our framework facilitates discourse parsing from scratch without requiring
discourse segmentation as a prerequisite; rather, it yields segmentation as
part of the parsing process. Our unified parsing model adopts a beam search to
decode the best tree structure by searching through a space of high-scoring
trees. With extensive experiments on the standard English RST discourse
treebank, we demonstrate that our parser outperforms existing methods by a good
margin in both end-to-end parsing and parsing with gold segmentation. More
importantly, it does so without using any handcrafted features, making it
faster and easily adaptable to new languages and domains.
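The abstract's central idea, parsing as a sequence of splitting decisions at token boundaries with segmentation falling out as a by-product, can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a seq2seq network to score splits and a beam search to decode, whereas the sketch below decodes greedily and substitutes a hypothetical punctuation-based scorer (`punct_score`). The names `parse`, `edus`, `punct_score`, and the `threshold` parameter are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# A tree is either a leaf span (start, end) or a pair (left_tree, right_tree).
Tree = tuple

def parse(tokens: List[str],
          split_score: Callable[[List[str], int, int, int], float],
          threshold: float = 0.0) -> Tree:
    """Greedy top-down parsing: recursively split the span tokens[i:j]
    at its best-scoring token boundary until no split beats the threshold."""
    def build(i: int, j: int) -> Tree:
        best_k, best = None, threshold
        for k in range(i + 1, j):          # candidate boundaries inside the span
            score = split_score(tokens, i, k, j)
            if score > best:
                best_k, best = k, score
        if best_k is None:                 # no acceptable split: span is a leaf (an EDU)
            return (i, j)
        return (build(i, best_k), build(best_k, j))
    return build(0, len(tokens))

def edus(tree: Tree) -> List[Tuple[int, int]]:
    """Read the segmentation off the tree: its leaves, left to right."""
    if isinstance(tree[0], int):
        return [tree]
    return edus(tree[0]) + edus(tree[1])

def punct_score(tokens: List[str], i: int, k: int, j: int) -> float:
    """Hypothetical stand-in for a learned model: favour splits right after
    sentence-final punctuation, then after commas, and reject everything else."""
    return {".": 2.0, ",": 1.0}.get(tokens[k - 1], -1.0)

tokens = "Although it rained , we went out . We had fun .".split()
tree = parse(tokens, punct_score)
print(tree)        # nested split structure over token offsets
print(edus(tree))  # segmentation obtained as a by-product of parsing
```

On this toy input the first split falls after the sentence-final period and the second after the comma, so the leaves give three segments without a separate segmentation step. Replacing the greedy choice of `best_k` with a beam over partial trees would bring the sketch closer to the decoding described in the abstract.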
Related papers
- Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech [8.550564152063522]
We report on a set of experiments aiming at assessing the performance of two parsing paradigms on speech parsing.
We perform this evaluation on a large treebank of spoken French, featuring realistic spontaneous conversations.
Our findings show that (i) the graph-based approach obtains better results across the board, and (ii) parsing directly from speech outperforms a pipeline approach, despite having 30% fewer parameters.
arXiv Detail & Related papers (2024-06-18T13:46:10Z)
- From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions [63.11097464396147]
We introduce a novel benchmark YTSeg focusing on spoken content that is inherently more unstructured and both topically and structurally diverse.
We also introduce an efficient hierarchical segmentation model MiniSeg, that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z)
- RST-style Discourse Parsing Guided by Document-level Content Structures [27.28989421841165]
Existing RST parsing pipelines construct rhetorical structures without the knowledge of document-level content structures.
We propose a novel pipeline for RST-DP that incorporates structure-aware news content sentence representations.
arXiv Detail & Related papers (2023-09-08T05:50:27Z)
- Structured Dialogue Discourse Parsing [79.37200787463917]
Dialogue discourse parsing aims to uncover the internal structure of a multi-participant conversation.
We propose a principled method that improves upon previous work from two perspectives: encoding and decoding.
Experiments show that our method achieves a new state of the art, surpassing the previous model by 2.3 on STAC and 1.5 on Molweni.
arXiv Detail & Related papers (2023-06-26T22:51:01Z)
- Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences [67.37544997614646]
We present the first study on unsupervised spoken constituency parsing.
The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees.
We show that accurate segmentation alone may be sufficient to parse spoken sentences accurately.
arXiv Detail & Related papers (2023-03-15T17:57:22Z)
- DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing [24.986030179701405]
We propose a document-level multilingual RST discourse parsing framework, which conducts EDU segmentation and discourse tree parsing jointly.
Our model achieves state-of-the-art performance on document-level multilingual RST parsing in all sub-tasks.
arXiv Detail & Related papers (2021-10-09T09:15:56Z)
- Sparse Fuzzy Attention for Structured Sentiment Analysis [48.69930912510414]
We propose a sparse and fuzzy attention scorer with pooling layers which improves performance and sets the new state-of-the-art on structured sentiment analysis.
We further explore the parsing modeling on structured sentiment analysis with second-order parsing and introduce a novel sparse second-order edge building procedure that leads to significant improvement in parsing performance.
arXiv Detail & Related papers (2021-09-14T14:37:56Z)
- A Conditional Splitting Framework for Efficient Constituency Parsing [14.548146390081778]
We introduce a generic seq2seq parsing framework that casts constituency parsing problems (syntactic and discourse parsing) into a series of conditional splitting decisions.
Our parsing model estimates the conditional probability distribution of possible splitting points in a given text span and supports efficient top-down decoding.
For discourse analysis we show that in our formulation, discourse segmentation can be framed as a special case of parsing.
arXiv Detail & Related papers (2021-06-30T00:36:34Z)
- Span-based Semantic Parsing for Compositional Generalization [53.24255235340056]
SpanBasedSP predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input.
On GeoQuery, SCAN and CLOSURE, SpanBasedSP performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance compared to baselines on splits that require compositional generalization.
arXiv Detail & Related papers (2020-09-13T16:42:18Z)
- A Simple Global Neural Discourse Parser [61.728994693410954]
We propose a simple chart-based neural discourse parser that does not require any manually-crafted features and is based on learned span representations only.
We empirically demonstrate that our model achieves the best performance among global parsers, and comparable performance to state-of-the-art greedy parsers.
arXiv Detail & Related papers (2020-09-02T19:28:40Z)