The Discussion Tracker Corpus of Collaborative Argumentation
- URL: http://arxiv.org/abs/2005.11344v1
- Date: Fri, 22 May 2020 18:27:28 GMT
- Title: The Discussion Tracker Corpus of Collaborative Argumentation
- Authors: Christopher Olshefski, Luca Lugini, Ravneet Singh, Diane Litman, Amanda Godley
- Abstract summary: The Discussion Tracker corpus was collected in American high school English classes.
The corpus consists of 29 multi-party discussions of English literature transcribed from 985 minutes of audio.
- Score: 2.800857580710507
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although Natural Language Processing (NLP) research on argument mining has
advanced considerably in recent years, most studies draw on corpora of
asynchronous and written texts, often produced by individuals. Few published
corpora of synchronous, multi-party argumentation are available. The Discussion
Tracker corpus, collected in American high school English classes, is an
annotated dataset of transcripts of spoken, multi-party argumentation. The
corpus consists of 29 multi-party discussions of English literature transcribed
from 985 minutes of audio. The transcripts were annotated for three dimensions
of collaborative argumentation: argument moves (claims, evidence, and
explanations), specificity (low, medium, high) and collaboration (e.g.,
extensions of and disagreements about others' ideas). In addition to providing
descriptive statistics on the corpus, we provide performance benchmarks and
associated code for predicting each dimension separately, illustrate the use of
the multiple annotations in the corpus to improve performance via multi-task
learning, and finally discuss other ways the corpus might be used to further
NLP research.
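As a rough illustration of the three annotation dimensions named in the abstract, the sketch below models one annotated unit in Python. The class and field names (`AnnotatedUnit`, `speaker`, `text`, `label_counts`) are assumptions made for this example and are not the corpus's actual file format. The collaboration label is kept as a free-form string because the abstract lists only examples of that dimension, and the sample utterances are invented, not taken from the corpus.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ArgumentMove(Enum):
    """Argument moves listed in the abstract."""
    CLAIM = "claim"
    EVIDENCE = "evidence"
    EXPLANATION = "explanation"


class Specificity(Enum):
    """Specificity levels listed in the abstract."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class AnnotatedUnit:
    """One annotated stretch of student talk (hypothetical schema)."""
    speaker: str                          # assumed field, not from the source
    text: str                             # assumed field, not from the source
    move: ArgumentMove
    specificity: Specificity
    collaboration: Optional[str] = None   # e.g. "extension", "disagreement"


def label_counts(units):
    """Count labels per annotation dimension, a basic descriptive statistic."""
    return {
        "move": Counter(u.move.value for u in units),
        "specificity": Counter(u.specificity.value for u in units),
        "collaboration": Counter(
            u.collaboration for u in units if u.collaboration is not None
        ),
    }


# Invented sample data for illustration only.
example = [
    AnnotatedUnit("S1", "I think the narrator is unreliable.",
                  ArgumentMove.CLAIM, Specificity.MEDIUM),
    AnnotatedUnit("S2", "On the first page he contradicts himself.",
                  ArgumentMove.EVIDENCE, Specificity.HIGH, "extension"),
    AnnotatedUnit("S3", "I don't see it that way.",
                  ArgumentMove.CLAIM, Specificity.LOW, "disagreement"),
]

counts = label_counts(example)
```

Keeping one record per unit with all three labels attached is what makes joint use of the annotations straightforward, for instance when a single model is trained on several dimensions at once.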
Related papers
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features can not be fully predicted from text, suggesting that prosody carries information above and beyond the words.
(2023-11-28T21:15:24Z)
- FRACAS: A FRench Annotated Corpus of Attribution relations in newS [0.0]
We present a manually annotated corpus of 1676 newswire texts in French for quotation extraction and source attribution.
We first describe the composition of our corpus and the choices that were made in selecting the data.
We then detail our inter-annotator agreement between the 8 annotators who worked on manual labelling.
(2023-09-19T13:19:54Z)
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
(2023-05-29T11:54:50Z)
- PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
(2022-12-21T04:03:33Z)
- Models and Datasets for Cross-Lingual Summarisation [78.56238251185214]
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language.
The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German.
We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and articles' bodies from language aligned Wikipedia titles.
(2022-02-19T11:55:40Z)
- A Novel Corpus of Discourse Structure in Humans and Computers [55.74664144248097]
We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses.
The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2.
(2021-11-10T20:56:08Z)
- MIND - Mainstream and Independent News Documents Corpus [0.7347989843033033]
This paper characterizes MIND, a new Portuguese corpus comprised of different types of articles collected from online mainstream and alternative media sources.
The articles in the corpus are organized into five collections: facts, opinions, entertainment, satires, and conspiracy theories.
(2021-08-13T14:00:12Z)
- Generating Informative Conclusions for Argumentative Texts [32.3103908466811]
The purpose of an argumentative text is to support a certain conclusion.
An explicit conclusion makes for a good candidate summary of an argumentative text.
This is especially true if the conclusion is informative, emphasizing specific concepts from the text.
(2021-06-02T10:35:59Z)
- Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization [72.54873655114844]
Text summarization is one of the most challenging and interesting problems in NLP.
This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations.
Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment.
(2020-10-04T20:12:44Z)
- Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences [18.19093600136057]
We propose a framework for extracting divergence patterns for any language pair from a parallel corpus.
We show that our framework provides a detailed picture of cross-language divergences, generalizes previous approaches, and lends itself to full automation.
(2020-05-07T13:05:03Z)
- Know thy corpus! Robust methods for digital curation of Web corpora [0.0]
This paper proposes a novel framework for digital curation of Web corpora.
It provides robust estimation of their parameters, such as their composition and the lexicon.
(2020-03-13T17:21:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.