Predicting Above-Sentence Discourse Structure using Distant Supervision
from Topic Segmentation
- URL: http://arxiv.org/abs/2112.06196v1
- Date: Sun, 12 Dec 2021 10:16:45 GMT
- Title: Predicting Above-Sentence Discourse Structure using Distant Supervision
from Topic Segmentation
- Authors: Patrick Huber, Linzi Xing and Giuseppe Carenini
- Abstract summary: RST-style discourse parsing plays a vital role in many NLP tasks.
Despite its importance, one of the most prevalent limitations in modern-day discourse parsing is the lack of large-scale datasets.
- Score: 8.688675709130289
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: RST-style discourse parsing plays a vital role in many NLP tasks, revealing
the underlying semantic/pragmatic structure of potentially complex and diverse
documents. Despite its importance, one of the most prevalent limitations in
modern-day discourse parsing is the lack of large-scale datasets. To overcome
the data sparsity issue, distantly supervised approaches from tasks like
sentiment analysis and summarization have been recently proposed. Here, we
extend this line of research by exploiting distant supervision from topic
segmentation, which can arguably provide a strong and oftentimes complementary
signal for high-level discourse structures. Experiments on two human-annotated
discourse treebanks confirm that our proposal generates accurate tree
structures at the sentence and paragraph level, consistently outperforming previous
distantly supervised models on the sentence-to-document task and occasionally
reaching even higher scores on the sentence-to-paragraph level.
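To make the high-level idea more concrete, below is a minimal sketch of how topic-segmentation output could be turned into an above-sentence tree skeleton: sentences inside a topic segment form one subtree, and the segment subtrees are then composed into a document-level tree. The hard-coded segment boundaries and the right-branching composition are illustrative assumptions of this sketch, not the paper's actual approach.

```python
# Minimal sketch (not the paper's model): build an above-sentence tree
# skeleton from topic-segmentation output. Segment boundaries and the
# right-branching composition here are illustrative assumptions.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A node in a binary discourse-tree skeleton over sentence indices."""
    start: int                      # first sentence index in the span
    end: int                        # last sentence index (inclusive)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def right_branching(start: int, end: int) -> Node:
    """Compose sentences start..end into a right-branching subtree."""
    if start == end:
        return Node(start, end)
    return Node(start, end,
                left=Node(start, start),
                right=right_branching(start + 1, end))


def tree_from_segments(segments: List[List[int]]) -> Node:
    """One subtree per topic segment; the segment subtrees are then
    combined (again right-branching) into a single document-level tree."""
    subtrees = [right_branching(seg[0], seg[-1]) for seg in segments]
    root = subtrees[-1]
    for sub in reversed(subtrees[:-1]):
        root = Node(sub.start, root.end, left=sub, right=root)
    return root


if __name__ == "__main__":
    # Ten sentences split by a (hypothetical) topic segmenter into three segments.
    segments = [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
    tree = tree_from_segments(segments)
    print(tree.start, tree.end)     # spans the whole document: 0 9
```

The key point the sketch illustrates is that segment boundaries already constrain the high-level tree: sentences from different topic segments can only attach to each other near the top of the tree, which is the kind of above-sentence signal the paper exploits as distant supervision.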
Related papers
- Topic-driven Distant Supervision Framework for Macro-level Discourse
Parsing [72.14449502499535]
The task of analyzing the internal rhetorical structure of texts is a challenging problem in natural language processing.
Despite the recent advances in neural models, the lack of large-scale, high-quality corpora for training remains a major obstacle.
Recent studies have attempted to overcome this limitation by using distant supervision.
arXiv Detail & Related papers (2023-05-23T07:13:51Z)
- Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study [51.079100495163736]
This paper systematically inspects ChatGPT's performance in two discourse analysis tasks: topic segmentation and discourse parsing.
ChatGPT demonstrates proficiency in identifying topic structures in general-domain conversations yet struggles considerably in specific-domain conversations.
Our deeper investigation indicates that ChatGPT can give more reasonable topic structures than human annotations but only linearly parses the hierarchical rhetorical structures.
arXiv Detail & Related papers (2023-05-15T07:14:41Z)
- Large Discourse Treebanks from Scalable Distant Supervision [30.615883375573432]
We propose a framework to generate "silver-standard" discourse trees from distant supervision on the auxiliary task of sentiment analysis.
"Silver-standard" discourse trees are trained on larger, more diverse and domain-independent datasets.
arXiv Detail & Related papers (2022-10-18T03:33:43Z)
- Learning to Selectively Learn for Weakly-supervised Paraphrase Generation [81.65399115750054]
We propose a novel approach to generate high-quality paraphrases with weak supervision data.
Specifically, we tackle the weakly-supervised paraphrase generation problem by obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo paraphrase expansion.
We demonstrate that our approach achieves significant improvements over existing unsupervised approaches, and is even comparable in performance with supervised state-of-the-arts.
arXiv Detail & Related papers (2021-09-25T23:31:13Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence [59.51720326054546]
We propose a long text generation model, which can represent the prefix sentences at sentence level and discourse level in the decoding process.
Our model can generate more coherent texts than state-of-the-art baselines.
arXiv Detail & Related papers (2021-05-19T07:29:08Z)
- An End-to-End Document-Level Neural Discourse Parser Exploiting Multi-Granularity Representations [24.986030179701405]
We exploit robust representations derived from multiple levels of granularity across syntax and semantics.
We incorporate such representations in an end-to-end encoder-decoder neural architecture for more resourceful discourse processing.
arXiv Detail & Related papers (2020-12-21T08:01:04Z)
- Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, the goal is to decide whether there exist any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z)
- Unleashing the Power of Neural Discourse Parsers -- A Context and Structure Aware Approach Using Large Scale Pretraining [26.517219486173598]
RST-based discourse parsing is an important NLP task with numerous downstream applications, such as summarization, machine translation and opinion mining.
In this paper, we demonstrate a simple yet highly accurate discourse parser that incorporates recent contextual language models.
Our approach establishes the new state-of-the-art (SOTA) performance for predicting structure and nuclearity on two key RST datasets, RST-DT and Instr-DT.
arXiv Detail & Related papers (2020-11-06T06:11:26Z)
- MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable Distant Sentiment Supervision [30.615883375573432]
We present a novel methodology to automatically generate discourse treebanks using distant supervision from sentiment-annotated datasets.
Our approach generates trees incorporating structure and nuclearity for documents of arbitrary length by relying on an efficient beam-search strategy (see the sketch after this list).
Experiments indicate that a discourse parser trained on our MEGA-DT treebank delivers promising inter-domain performance gains.
arXiv Detail & Related papers (2020-11-05T18:22:38Z)
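As a companion to the distant-supervision entries above, and in particular the beam-search strategy mentioned for MEGA-DT, here is a simplified, illustrative beam-search tree builder: adjacent spans are merged bottom-up while a beam of partial trees is kept, ranked by an auxiliary score. The toy scoring function and the names are assumptions of this sketch, standing in for the learned (e.g. sentiment-derived) signal those papers use; it is not their actual algorithm.

```python
# Illustrative beam-search tree builder (a sketch, not the papers' method):
# adjacent spans are merged bottom-up, keeping a beam of partial trees
# ranked by an auxiliary score that stands in for a learned signal.

from typing import Callable, List, Tuple

Tree = Tuple  # leaf: (i,)  |  internal node: (left_subtree, right_subtree)


def span_of(tree: Tree) -> Tuple[int, int]:
    """Return the (first, last) leaf index covered by a (sub)tree."""
    if len(tree) == 1:              # leaf
        return tree[0], tree[0]
    return span_of(tree[0])[0], span_of(tree[1])[1]


def beam_build(n_leaves: int,
               score_merge: Callable[[Tree, Tree], float],
               beam_size: int = 4) -> Tree:
    """Return the highest-scoring binary tree over n_leaves adjacent units."""
    # A beam state is (accumulated_score, subtrees_left_to_right).
    beam: List[Tuple[float, List[Tree]]] = [(0.0, [(i,) for i in range(n_leaves)])]
    for _ in range(n_leaves - 1):
        candidates = []
        for total, spans in beam:
            for i in range(len(spans) - 1):
                merged = (spans[i], spans[i + 1])
                new_spans = spans[:i] + [merged] + spans[i + 2:]
                candidates.append((total + score_merge(spans[i], spans[i + 1]),
                                   new_spans))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return beam[0][1][0]


if __name__ == "__main__":
    # Toy scorer: prefer merging short spans first (roughly balanced trees).
    def toy_score(left: Tree, right: Tree) -> float:
        l0, l1 = span_of(left)
        r0, r1 = span_of(right)
        return -((l1 - l0 + 1) + (r1 - r0 + 1))

    print(beam_build(5, toy_score))
```

With beam_size=1 this reduces to greedy bottom-up agglomeration; a wider beam trades extra computation for better-scoring trees, which is the role beam search plays in scaling tree generation to documents of arbitrary length.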
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.