RST-style Discourse Parsing Guided by Document-level Content Structures
- URL: http://arxiv.org/abs/2309.04141v1
- Date: Fri, 8 Sep 2023 05:50:27 GMT
- Title: RST-style Discourse Parsing Guided by Document-level Content Structures
- Authors: Ming Li, Ruihong Huang
- Abstract summary: Existing RST parsing pipelines construct rhetorical structures without the knowledge of document-level content structures.
We propose a novel pipeline for RST-DP that incorporates structure-aware news content sentence representations.
- Score: 27.28989421841165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rhetorical Structure Theory based Discourse Parsing (RST-DP) explores how
clauses, sentences, and large text spans compose a whole discourse and presents
the rhetorical structure as a hierarchical tree. Existing RST parsing pipelines
construct rhetorical structures without the knowledge of document-level content
structures, which causes relatively low performance when predicting the
discourse relations for large text spans. Recognizing the value of high-level
content-related information in facilitating discourse relation recognition, we
propose a novel pipeline for RST-DP that incorporates structure-aware news
content sentence representations derived from the task of News Discourse
Profiling. By incorporating only a few additional layers, this enhanced
pipeline exhibits promising performance across various RST parsing metrics.
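The "few additional layers" can be pictured as a small fusion step. The sketch below is illustrative only, not the authors' implementation: it assumes each sentence has a base embedding plus a structure-aware News Discourse Profiling embedding, and the function name `fuse_sentence` and all dimensions are hypothetical.

```python
def fuse_sentence(sent_vec, profile_vec, weights, bias):
    """Concatenate a base sentence embedding with a structure-aware
    (News Discourse Profiling) embedding, then apply one linear layer
    with a ReLU -- a stand-in for a 'few additional layers'."""
    combined = sent_vec + profile_vec  # list concatenation = vector concat
    return [max(0.0, sum(w * x for w, x in zip(row, combined)) + b)
            for row, b in zip(weights, bias)]

# Toy dimensions: 4-dim sentence vector, 2-dim profiling vector, 4-dim output.
sent = [0.5, -1.0, 0.25, 2.0]
profile = [1.0, -0.5]
weights = [[0.1] * 6, [0.2] * 6, [-0.1] * 6, [0.0] * 6]
bias = [0.0, 0.1, 0.0, -0.2]
fused = fuse_sentence(sent, profile, weights, bias)
```

The fused vector would then feed the downstream relation classifier in place of the plain sentence embedding.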
Related papers
- From Text Segmentation to Smart Chaptering: A Novel Benchmark for
Structuring Video Transcriptions [63.11097464396147]
We introduce a novel benchmark, YTSeg, focusing on spoken content, which is inherently more unstructured and both topically and structurally diverse.
We also introduce MiniSeg, an efficient hierarchical segmentation model that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z) - Document Structure in Long Document Transformers [64.76981299465885]
Long documents often exhibit structure with hierarchically organized elements of different functions, such as section headers and paragraphs.
Despite the omnipresence of document structure, its role in natural language processing (NLP) remains opaque.
Do long-document Transformer models acquire an internal representation of document structure during pre-training?
How can structural information be communicated to a model after pre-training, and how does it influence downstream performance?
arXiv Detail & Related papers (2024-01-31T08:28:06Z) - Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark [44.06803331843307]
Paragraph-level topic structure can capture and convey the overall context of a document from a higher level.
The lack of large-scale, high-quality Chinese paragraph-level topic structure corpora has restrained research and applications.
We propose a hierarchical paragraph-level topic structure representation with three layers to guide the corpus construction.
We employ a two-stage man-machine collaborative annotation method to construct the largest Chinese paragraph-level Topic Structure corpus.
arXiv Detail & Related papers (2023-05-24T06:43:23Z) - Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue:
An Empirical Study [51.079100495163736]
This paper systematically inspects ChatGPT's performance in two discourse analysis tasks: topic segmentation and discourse parsing.
ChatGPT demonstrates proficiency in identifying topic structures in general-domain conversations yet struggles considerably in specific-domain conversations.
Our deeper investigation indicates that ChatGPT can give more reasonable topic structures than human annotations but only linearly parses the hierarchical rhetorical structures.
arXiv Detail & Related papers (2023-05-15T07:14:41Z) - DMRST: A Joint Framework for Document-Level Multilingual RST Discourse
Segmentation and Parsing [24.986030179701405]
We propose a document-level multilingual RST discourse parsing framework, which conducts EDU segmentation and discourse tree parsing jointly.
Our model achieves state-of-the-art performance on document-level multilingual RST parsing in all sub-tasks.
arXiv Detail & Related papers (2021-10-09T09:15:56Z) - RST Parsing from Scratch [14.548146390081778]
We introduce a novel end-to-end formulation of document-level discourse parsing in the Rhetorical Structure Theory (RST) framework.
Our framework facilitates discourse parsing from scratch without requiring discourse segmentation as a prerequisite.
Our unified parsing model adopts a beam search to decode the best tree structure by searching through a space of high-scoring trees.
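The beam-search decoding idea can be illustrated with a toy sketch (not the authors' model). Here `score_split` stands in for a learned scorer over candidate split points, and each beam state tracks the spans still waiting to be split.

```python
import heapq

def beam_parse(n, score_split, beam_width=3):
    """Toy beam search over binary tree structures for a span of n EDUs.
    A state is (total_score, pending_spans, chosen_splits); we expand the
    first pending span at every split point and keep the top-k states.
    score_split(i, k, j) is a hypothetical model score for splitting (i, j) at k."""
    beam = [(0.0, [(0, n)], [])]
    while any(pending for _, pending, _ in beam):
        candidates = []
        for total, pending, splits in beam:
            if not pending:                 # finished state: carry it forward
                candidates.append((total, pending, splits))
                continue
            (i, j), rest = pending[0], pending[1:]
            if j - i == 1:                  # single EDU: nothing to split
                candidates.append((total, rest, splits))
                continue
            for k in range(i + 1, j):
                candidates.append((total + score_split(i, k, j),
                                   [(i, k), (k, j)] + rest,
                                   splits + [(i, k, j)]))
        beam = heapq.nlargest(beam_width, candidates, key=lambda s: s[0])
    return max(beam, key=lambda s: s[0])
```

With a scorer that prefers balanced splits, the highest-scoring tree over four EDUs splits the span in the middle first.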
arXiv Detail & Related papers (2021-05-23T06:19:38Z) - Substructure Substitution: Structured Data Augmentation for NLP [55.69800855705232]
SUB2 generates new examples by substituting substructures with others that carry the same label.
For more general tasks, we present variations of SUB2 based on constituency parse trees.
For most cases, training with the augmented dataset by SUB2 achieves better performance than training with the original training set.
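The substitution idea can be sketched for generic span-labeled data; the function name and data layout below are hypothetical and simplified from the paper's tree-based variants.

```python
import random

def sub2_augment(examples, rng=None):
    """Minimal SUB2-style augmentation sketch: replace one labeled span in
    each example with a span carrying the same label drawn from the corpus.
    Each example is (tokens, spans) with spans = [(start, end, label), ...]."""
    rng = rng or random.Random(0)
    pool = {}                               # index all spans by label
    for tokens, spans in examples:
        for s, e, lab in spans:
            pool.setdefault(lab, []).append(tokens[s:e])
    augmented = []
    for tokens, spans in examples:
        s, e, lab = rng.choice(spans)       # pick a span to replace
        donor = rng.choice(pool[lab])       # pick a same-label substitute
        augmented.append((tokens[:s] + donor + tokens[e:], lab))
    return augmented

examples = [(["the", "cat", "sat"], [(0, 2, "NP")]),
            (["a", "dog", "ran"], [(0, 2, "NP")])]
out = sub2_augment(examples)
```

Because the donor shares the original span's label, the augmented example stays label-consistent by construction.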
arXiv Detail & Related papers (2021-01-02T09:54:24Z) - An End-to-End Document-Level Neural Discourse Parser Exploiting
Multi-Granularity Representations [24.986030179701405]
We exploit robust representations derived from multiple levels of granularity across syntax and semantics.
We incorporate such representations in an end-to-end encoder-decoder neural architecture for more resourceful discourse processing.
arXiv Detail & Related papers (2020-12-21T08:01:04Z) - Syntactic representation learning for neural network based TTS with
syntactic parse tree traversal [49.05471750563229]
We propose a syntactic representation learning method based on syntactic parse tree to automatically utilize the syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived from the synthesized speeches.
arXiv Detail & Related papers (2020-12-13T05:52:07Z) - Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical
Supervision from Extractive Summaries [46.183289748907804]
We propose SOE, a pipelined system that summarizes, outlines, and elaborates for long text generation.
SOE produces long texts with significantly better quality, along with faster convergence speed.
arXiv Detail & Related papers (2020-10-14T13:22:20Z) - A Top-Down Neural Architecture towards Text-Level Parsing of Discourse
Rhetorical Structure [27.927104697483934]
We propose a top-down neural architecture toward text-level DRS parsing.
We cast discourse parsing as a split point ranking task, where a split point is classified to different levels according to its rank.
In this way, we can determine the complete DRS as a hierarchical tree structure via an encoder-decoder with an internal stack.
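A greedy version of split-point ranking with an explicit stack can be sketched as follows; `score_split` is again a hypothetical stand-in for the learned ranker, and this omits the paper's encoder-decoder entirely.

```python
def topdown_parse(n, score_split):
    """Toy top-down DRS construction: for each span, rank candidate split
    points and greedily take the best, pushing the two child spans onto a
    stack until only single EDUs remain."""
    tree, stack = [], [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i <= 1:                      # single EDU: a leaf
            continue
        k = max(range(i + 1, j), key=lambda k: score_split(i, k, j))
        tree.append((i, k, j))              # record the chosen split
        stack.append((i, k))
        stack.append((k, j))
    return tree
```

A scorer that favors balanced splits yields the expected near-balanced binary tree over four EDUs.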
arXiv Detail & Related papers (2020-05-06T09:27:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.