DiscSense: Automated Semantic Analysis of Discourse Markers
- URL: http://arxiv.org/abs/2006.01603v1
- Date: Tue, 2 Jun 2020 13:39:53 GMT
- Title: DiscSense: Automated Semantic Analysis of Discourse Markers
- Authors: Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller
- Abstract summary: We study the link between discourse markers and the semantic relations annotated in classification datasets.
By using an automatic prediction method over existing semantically annotated datasets, we provide a bottom-up characterization of discourse markers in English.
- Score: 9.272765183222967
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Discourse markers ("by contrast", "happily", etc.) are words or
phrases that are used to signal semantic and/or pragmatic relationships between
clauses or sentences. Recent work has fruitfully explored the prediction of
discourse markers between sentence pairs in order to learn accurate sentence
representations that are useful in various classification tasks. In this work,
we take another perspective: using a model trained to predict discourse markers
between sentence pairs, we predict plausible markers between sentence pairs
with a known semantic relation (provided by existing classification datasets).
These predictions allow us to study the link between discourse markers and the
semantic relations annotated in classification datasets. Handcrafted mappings
have been proposed between markers and discourse relations on a limited set of
markers and a limited set of categories, but there exist hundreds of discourse
markers expressing a wide variety of relations, and there is no consensus on
the taxonomy of relations between competing discourse theories (which are
largely built in a top-down fashion). By using an automatic prediction method
over existing semantically annotated datasets, we provide a bottom-up
characterization of discourse markers in English. The resulting dataset, named
DiscSense, is publicly available.
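The bottom-up idea in the abstract (predict plausible markers for sentence pairs whose semantic relation is already labeled, then relate those predictions to the gold relations) can be illustrated with a small counting sketch. It is a minimal sketch under stated assumptions, not the paper's exact procedure: it assumes the marker predictions are already available, and the function `marker_class_associations`, the `min_support` cutoff, the lift score, and the toy markers and labels are all hypothetical illustrations rather than the DiscSense scoring or data format.

```python
from collections import Counter, defaultdict


def marker_class_associations(predicted_markers, labels, min_support=2):
    """Relate predicted discourse markers to gold relation labels.

    Hypothetical illustration, not the DiscSense method itself.
    predicted_markers: top marker predicted for each sentence pair
        (assumed output of some marker-prediction model, not shown here).
    labels: gold semantic-relation label for each pair, in the same order.
    min_support: assumed cutoff; markers predicted fewer times are skipped.

    Returns {marker: (top_class, precision, lift, support)}, where
    precision = P(class | marker) and lift = precision / P(class).
    """
    if len(predicted_markers) != len(labels):
        raise ValueError("predictions and labels must be aligned")

    n = len(labels)
    class_counts = Counter(labels)       # overall frequency of each gold class
    by_marker = defaultdict(Counter)     # marker -> counts of gold classes
    for marker, label in zip(predicted_markers, labels):
        by_marker[marker][label] += 1

    result = {}
    for marker, counts in by_marker.items():
        support = sum(counts.values())
        if support < min_support:
            continue                     # too rare to characterize reliably
        top_class, hits = counts.most_common(1)[0]
        precision = hits / support
        # Normalize by the class prior so majority classes are not favored.
        lift = precision / (class_counts[top_class] / n)
        result[marker] = (top_class, precision, lift, support)
    return result


# Toy data, made up for illustration only.
preds = ["but", "but", "because", "but", "because", "in other words"]
gold = ["contradiction", "contradiction", "entailment",
        "neutral", "entailment", "entailment"]

for marker, (cls, prec, lift, support) in marker_class_associations(preds, gold).items():
    print(f"{marker!r}: {cls} precision={prec:.2f} lift={lift:.2f} support={support}")
```

On the toy data, "but" is associated with contradiction (precision 2/3, lift 2.0, support 3), while "in other words" is dropped for falling below `min_support`. Reporting lift alongside precision is one possible design choice: precision alone tends to reward whichever class dominates the dataset.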
Related papers
- Multi-Label Classification for Implicit Discourse Relation Recognition [10.280148603465697]
We explore various multi-label classification frameworks to handle implicit discourse relation recognition.
We show that multi-label classification methods do not degrade performance for single-label prediction.
(arXiv, 2024-06-06T19:37:25Z)
- Distributed Marker Representation for Ambiguous Discourse Markers and Entangled Relations [50.31129784616845]
We learn a Distributed Marker Representation (DMR) by leveraging the virtually unlimited supply of discourse-marker data, modeled through latent discourse senses.
Our method also offers a valuable tool for understanding the complex ambiguity and entanglement among discourse markers and manually defined discourse relations.
(arXiv, 2023-06-19T00:49:51Z)
- Cross-Genre Argument Mining: Can Language Models Automatically Fill in Missing Discourse Markers? [17.610382230820395]
We propose to automatically augment a given text with discourse markers such that all relations are explicitly signaled.
Our analysis reveals that popular off-the-shelf language models fail on this task.
We demonstrate the impact of our approach on an Argument Mining downstream task, evaluated on different corpora.
(arXiv, 2023-06-07T10:19:50Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing interactions between syntax and semantics in LMs.
This suggests that LMs may serve as more useful tools for linguistic annotation, theory testing, and discovery.
(arXiv, 2023-05-29T16:24:01Z)
- Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning correlates robustly with gold labels.
(arXiv, 2023-05-22T17:58:04Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploiting the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
(arXiv, 2022-12-17T05:25:17Z)
- Active Learning and Multi-label Classification for Ellipsis and Coreference Detection in Conversational Question-Answering [5.984693203400407]
Ellipsis and coreference are commonly occurring linguistic phenomena.
We propose to use a multi-label classifier based on DistilBERT.
We show that these methods greatly enhance the performance of the classifier for detecting these phenomena on a manually labeled dataset.
(arXiv, 2022-07-07T08:14:54Z)
- R²-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
(arXiv, 2020-12-16T13:11:30Z)
- Discourse Parsing of Contentious, Non-Convergent Online Discussions [0.16311150636417257]
Inspired by the Bakhtinian theory of Dialogism, we propose a novel theoretical and computational framework.
We develop a novel discourse annotation schema which reflects a hierarchy of discursive strategies.
We share the first labeled dataset of contentious non-convergent online discussions.
(arXiv, 2020-12-08T17:36:39Z)
- Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
(arXiv, 2020-05-18T15:27:55Z)