Large Discourse Treebanks from Scalable Distant Supervision
- URL: http://arxiv.org/abs/2212.06038v1
- Date: Tue, 18 Oct 2022 03:33:43 GMT
- Title: Large Discourse Treebanks from Scalable Distant Supervision
- Authors: Patrick Huber and Giuseppe Carenini
- Abstract summary: We propose a framework to generate "silver-standard" discourse trees from distant supervision on the auxiliary task of sentiment analysis.
These "silver-standard" trees allow discourse parsers to be trained on larger, more diverse and domain-independent datasets.
- Score: 30.615883375573432
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Discourse parsing is an essential upstream task in Natural Language
Processing with strong implications for many real-world applications. Despite
its widely recognized role, most recent discourse parsers (and consequently
downstream tasks) still rely on small-scale human-annotated discourse
treebanks, trying to infer general-purpose discourse structures from very
limited data in a few narrow domains. To overcome this dire situation and allow
discourse parsers to be trained on larger, more diverse and domain-independent
datasets, we propose a framework to generate "silver-standard" discourse trees
from distant supervision on the auxiliary task of sentiment analysis.
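The abstract describes deriving discourse trees from a sentiment-analysis signal. The snippet below is a minimal sketch of that idea, not the paper's actual method: it assumes a sentiment scorer (stood in for here by a toy lexicon) and greedily merges adjacent EDU spans bottom-up, whereas the paper uses a learned model and beam search. The `Node`, `sentiment`, and `build_tree` names, the merge criterion, and the nuclearity rule are all hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    span: Tuple[int, int]                 # inclusive range of EDU indices covered
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    nuclearity: str = ""                  # "NS" or "SN" on internal nodes

def sentiment(text: str) -> float:
    # Stand-in scorer: the framework assumes a model trained on
    # document-level sentiment labels; a toy lexicon is used here.
    lexicon = {"good": 1.0, "great": 1.0, "bad": -1.0, "dull": -1.0}
    words = text.lower().split()
    return sum(lexicon.get(w, 0.0) for w in words) / max(len(words), 1)

def build_tree(edus):
    """Greedy bottom-up agglomeration of adjacent EDU spans.

    The papers above use beam search over candidate merges; a beam of
    size 1 (pure greedy) keeps this sketch short."""
    nodes = [Node(span=(i, i)) for i in range(len(edus))]
    text_of = lambda n: " ".join(edus[n.span[0]:n.span[1] + 1])
    while len(nodes) > 1:
        # Merge the adjacent pair whose combined span carries the
        # strongest sentiment signal (a hypothetical scoring proxy).
        i = max(range(len(nodes) - 1),
                key=lambda j: abs(sentiment(text_of(nodes[j]) + " "
                                            + text_of(nodes[j + 1]))))
        l, r = nodes[i], nodes[i + 1]
        # Mark the more sentiment-bearing child as nucleus.
        nuc = "NS" if abs(sentiment(text_of(l))) >= abs(sentiment(text_of(r))) else "SN"
        nodes[i:i + 2] = [Node(span=(l.span[0], r.span[1]),
                               left=l, right=r, nuclearity=nuc)]
    return nodes[0]

edus = ["the plot was great", "but the pacing felt dull", "overall worth watching"]
tree = build_tree(edus)
print(tree.span, tree.nuclearity)  # → (0, 2) NS
```

The output is a binary tree over the whole document whose internal nodes carry structure and nuclearity, which is the shape of annotation the silver-standard treebanks provide.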
Related papers
- Topic-driven Distant Supervision Framework for Macro-level Discourse
Parsing [72.14449502499535]
The task of analyzing the internal rhetorical structure of texts is a challenging problem in natural language processing.
Despite the recent advances in neural models, the lack of large-scale, high-quality corpora for training remains a major obstacle.
Recent studies have attempted to overcome this limitation by using distant supervision.
arXiv Detail & Related papers (2023-05-23T07:13:51Z) - On Robustness of Prompt-based Semantic Parsing with Large Pre-trained
Language Model: An Empirical Study on Codex [48.588772371355816]
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.
Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples.
arXiv Detail & Related papers (2023-01-30T13:21:00Z) - Unsupervised Inference of Data-Driven Discourse Structures using a Tree
Auto-Encoder [30.615883375573432]
We propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective.
The proposed approach can be applied to any tree-structured objective, such as syntactic parsing, discourse parsing and others.
arXiv Detail & Related papers (2022-10-18T03:28:39Z) - Unsupervised Learning of Hierarchical Conversation Structure [50.29889385593043]
Goal-oriented conversations often have meaningful sub-dialogue structure, but it can be highly domain-dependent.
This work introduces an unsupervised approach to learning hierarchical conversation structure, including turn and sub-dialogue segment labels.
The decoded structure is shown to be useful in enhancing neural models of language for three conversation-level understanding tasks.
arXiv Detail & Related papers (2022-05-24T17:52:34Z) - Predicting Above-Sentence Discourse Structure using Distant Supervision
from Topic Segmentation [8.688675709130289]
RST-style discourse parsing plays a vital role in many NLP tasks.
Despite its importance, one of the most prevailing limitations in modern-day discourse parsing is the lack of large-scale datasets.
arXiv Detail & Related papers (2021-12-12T10:16:45Z) - Neural Abstructions: Abstractions that Support Construction for Grounded
Language Learning [69.1137074774244]
Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding.
We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model.
We show that with this method a user population is able to build a semantic parser for an open-ended house-modification task in Minecraft.
arXiv Detail & Related papers (2021-07-20T07:01:15Z) - Randomized Deep Structured Prediction for Discourse-Level Processing [45.725437752821655]
Expressive text encoders have been at the center of NLP models in recent work.
We show that we can efficiently leverage deep structured prediction and expressive neural encoders for a set of tasks involving complicated argumentative structures.
arXiv Detail & Related papers (2021-01-25T21:49:32Z) - Unsupervised Learning of Discourse Structures using a Tree Autoencoder [8.005512864082126]
We propose a new strategy to generate tree structures in a task-agnostic, unsupervised fashion by extending a latent tree induction framework with an auto-encoding objective.
The proposed approach can be applied to any tree objective, such as syntactic parsing, discourse parsing and others.
In this paper we are inferring general tree structures of natural text in multiple domains, showing promising results on a diverse set of tasks.
arXiv Detail & Related papers (2020-12-17T08:40:34Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable
Distant Sentiment Supervision [30.615883375573432]
We present a novel methodology to automatically generate discourse treebanks using distant supervision from sentiment-annotated datasets.
Our approach generates trees incorporating structure and nuclearity for documents of arbitrary length by relying on an efficient beam-search strategy.
Experiments indicate that a discourse parser trained on our MEGA-DT treebank delivers promising inter-domain performance gains.
arXiv Detail & Related papers (2020-11-05T18:22:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.