MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable
Distant Sentiment Supervision
- URL: http://arxiv.org/abs/2011.03017v1
- Date: Thu, 5 Nov 2020 18:22:38 GMT
- Title: MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable
Distant Sentiment Supervision
- Authors: Patrick Huber and Giuseppe Carenini
- Abstract summary: We present a novel methodology to automatically generate discourse treebanks using distant supervision from sentiment-annotated datasets.
Our approach generates trees incorporating structure and nuclearity for documents of arbitrary length by relying on an efficient beam-search strategy.
Experiments indicate that a discourse parser trained on our MEGA-DT treebank delivers promising inter-domain performance gains.
- Score: 30.615883375573432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lack of large and diverse discourse treebanks hinders the application of
data-driven approaches, such as deep-learning, to RST-style discourse parsing.
In this work, we present a novel scalable methodology to automatically generate
discourse treebanks using distant supervision from sentiment-annotated
datasets, creating and publishing MEGA-DT, a new large-scale
discourse-annotated corpus. Our approach generates discourse trees
incorporating structure and nuclearity for documents of arbitrary length by
relying on an efficient heuristic beam-search strategy, extended with a
stochastic component. Experiments on multiple datasets indicate that a
discourse parser trained on our MEGA-DT treebank delivers promising
inter-domain performance gains when compared to parsers trained on
human-annotated discourse corpora.
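The approach described above (building a binary discourse tree over elementary discourse units with a beam search guided by sentiment signals, extended with a stochastic component) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the `merge_score` heuristic, the mean-sentiment propagation, and all names and parameters are invented assumptions, and nuclearity assignment is omitted.

```python
import random

def merge_score(left_sent, right_sent):
    # Toy heuristic: prefer merging adjacent spans with similar sentiment.
    # Stands in for the paper's sentiment-derived guidance (assumption).
    return -abs(left_sent - right_sent)

def beam_search_tree(edu_sentiments, beam_size=4, sample_k=2, seed=0):
    """Build one binary tree over the EDUs by repeatedly merging adjacent
    spans, keeping `beam_size` partial parses; a stochastic component
    samples the next beam from a pool of top-scoring candidates."""
    rng = random.Random(seed)
    # A beam state: (cumulative score, forest of (tree, sentiment) pairs).
    # Leaves are EDU indices; internal nodes are (left, right) tuples.
    beam = [(0.0, tuple((i, s) for i, s in enumerate(edu_sentiments)))]
    while len(beam[0][1]) > 1:
        candidates = []
        for score, forest in beam:
            for j in range(len(forest) - 1):
                (lt, ls), (rt, rs) = forest[j], forest[j + 1]
                merged = ((lt, rt), (ls + rs) / 2.0)  # mean sentiment (assumption)
                new_forest = forest[:j] + (merged,) + forest[j + 2:]
                candidates.append((score + merge_score(ls, rs), new_forest))
        candidates.sort(key=lambda c: c[0], reverse=True)
        # Stochastic component: sample the beam from the top candidates
        # instead of taking a pure arg-max.
        pool = candidates[: sample_k * beam_size]
        beam = rng.sample(pool, min(beam_size, len(pool)))
        beam.sort(key=lambda c: c[0], reverse=True)
    return beam[0][1][0][0]  # tree of the best surviving state

tree = beam_search_tree([0.9, 0.8, -0.5, -0.6])
```

Because only adjacent spans ever merge, the beam holds partial parses over contiguous EDU sequences, which is what keeps construction tractable for documents of arbitrary length.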
Related papers
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
- Topic-driven Distant Supervision Framework for Macro-level Discourse Parsing [72.14449502499535]
The task of analyzing the internal rhetorical structure of texts is a challenging problem in natural language processing.
Despite the recent advances in neural models, the lack of large-scale, high-quality corpora for training remains a major obstacle.
Recent studies have attempted to overcome this limitation by using distant supervision.
arXiv Detail & Related papers (2023-05-23T07:13:51Z)
- LongFNT: Long-form Speech Recognition with Factorized Neural Transducer [64.75547712366784]
We propose the LongFNT-Text architecture, which fuses the sentence-level long-form features directly with the output of the vocabulary predictor.
The effectiveness of our LongFNT approach is validated on the LibriSpeech and GigaSpeech corpora with 19% and 12% relative word error rate (WER) reductions, respectively.
arXiv Detail & Related papers (2022-11-17T08:48:27Z)
- Large Discourse Treebanks from Scalable Distant Supervision [30.615883375573432]
We propose a framework to generate "silver-standard" discourse trees from distant supervision on the auxiliary task of sentiment analysis.
"Silver-standard" discourse trees allow training on larger, more diverse and domain-independent datasets.
arXiv Detail & Related papers (2022-10-18T03:33:43Z)
- Predicting Above-Sentence Discourse Structure using Distant Supervision from Topic Segmentation [8.688675709130289]
RST-style discourse parsing plays a vital role in many NLP tasks.
Despite its importance, one of the most prevalent limitations in modern-day discourse parsing is the lack of large-scale datasets.
arXiv Detail & Related papers (2021-12-12T10:16:45Z)
- Unsupervised Learning of Discourse Structures using a Tree Autoencoder [8.005512864082126]
We propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective.
The proposed approach can be applied to any tree objective, such as syntactic parsing, discourse parsing and others.
In this paper, we infer general tree structures of natural text in multiple domains, showing promising results on a diverse set of tasks.
arXiv Detail & Related papers (2020-12-17T08:40:34Z)
- Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
- Unleashing the Power of Neural Discourse Parsers -- A Context and Structure Aware Approach Using Large Scale Pretraining [26.517219486173598]
RST-based discourse parsing is an important NLP task with numerous downstream applications, such as summarization, machine translation and opinion mining.
In this paper, we demonstrate a simple yet highly accurate discourse parser, incorporating recent contextual language models.
Our parser establishes new state-of-the-art (SOTA) performance for predicting structure and nuclearity on two key RST datasets, RST-DT and Instr-DT.
arXiv Detail & Related papers (2020-11-06T06:11:26Z)
- From Sentiment Annotations to Sentiment Prediction through Discourse Augmentation [30.615883375573432]
We propose a novel framework to exploit task-related discourse for the task of sentiment analysis.
More specifically, we combine the large-scale, sentiment-dependent MEGA-DT treebank with a novel neural architecture for sentiment prediction.
Experiments show that our framework using sentiment-related discourse augmentations for sentiment prediction enhances the overall performance for long documents.
arXiv Detail & Related papers (2020-11-05T18:28:13Z)
- Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds [71.36164750147827]
Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors.
End-to-end neural diarization (EEND) directly predicts diarization labels using a neural network.
We propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.
arXiv Detail & Related papers (2020-10-26T06:33:02Z)
- A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining [52.11221075687124]
We propose a novel abstractive summary network that adapts to the meeting scenario.
We design a hierarchical structure to accommodate long meeting transcripts and a role vector to depict the difference among speakers.
Our model outperforms previous approaches in both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-04-04T21:00:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.