ArgSciChat: A Dataset for Argumentative Dialogues on Scientific Papers
- URL: http://arxiv.org/abs/2202.06690v1
- Date: Mon, 14 Feb 2022 13:27:19 GMT
- Title: ArgSciChat: A Dataset for Argumentative Dialogues on Scientific Papers
- Authors: Federico Ruggeri, Mohsen Mesgar, Iryna Gurevych
- Abstract summary: We introduce a novel framework to collect dialogues between scientists as domain experts on scientific papers.
Our framework lets scientists present their scientific papers as groundings for dialogues and participate in dialogues about papers whose titles interest them.
We use our framework to collect a novel argumentative dialogue dataset, ArgSciChat. It consists of 498 messages collected from 41 dialogues on 20 scientific papers.
- Score: 61.772582143035606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The applications of conversational agents for scientific disciplines (as
expert domains) are understudied due to the lack of dialogue data to train such
agents. While most data collection frameworks, such as Amazon Mechanical Turk,
foster data collection for generic domains by connecting crowd workers and task
designers, these frameworks are not well optimized for data collection in
expert domains. Scientists are rarely present in these frameworks due to their
limited time budget. Therefore, we introduce a novel framework to collect
dialogues between scientists as domain experts on scientific papers. Our
framework lets scientists present their scientific papers as groundings for
dialogues and participate in dialogues about papers whose titles interest them. We use our
framework to collect a novel argumentative dialogue dataset, ArgSciChat. It
consists of 498 messages collected from 41 dialogues on 20 scientific papers.
Alongside an extensive analysis of ArgSciChat, we evaluate a recent conversational
agent on our dataset. Experimental results show that this agent performs poorly
on ArgSciChat, motivating further research on argumentative scientific agents.
We release our framework and the dataset.
Related papers
- cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers [5.103692331918768]
This work introduces Conversational Papers (cPAPERS), a dataset of conversational question-answer pairs from reviews of academic papers.
We present a data collection strategy to collect these question-answer pairs from OpenReview and associate them with contextual information from source files.
(2024-06-12T16:46:12Z)
- SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation [20.994565065595232]
We present a new corpus to facilitate the automated generation of scientific news reports.
Our dataset comprises academic publications and their corresponding scientific news reports across nine disciplines.
We benchmark our dataset employing state-of-the-art text generation models.
(2024-03-26T14:54:48Z)
- FREDSum: A Dialogue Summarization Corpus for French Political Debates [26.76383031532945]
We present a dataset of French political debates for the purpose of enhancing resources for multi-lingual dialogue summarization.
Our dataset consists of manually transcribed and annotated political debates, covering a range of topics and perspectives.
(2023-12-08T05:42:04Z)
- Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues? [55.28340832822234]
Problem-solving human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections.
We introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues.
(2023-07-13T20:02:50Z)
- SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation [55.82577086422923]
We provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues.
We release a large-scale supervised dataset called SuperDialseg, containing 9,478 dialogues.
We also provide a benchmark including 18 models across five categories for the dialogue segmentation task.
(2023-05-15T06:08:01Z)
- NatCS: Eliciting Natural Customer Support Dialogues [5.398732055835996]
Existing task-oriented dialogue datasets are not representative of real customer support conversations.
We introduce NatCS, a multi-domain collection of spoken customer service conversations.
(2023-05-04T17:25:24Z)
- Manual-Guided Dialogue for Flexible Conversational Agents [84.46598430403886]
How to build and use dialogue data efficiently, and how to deploy models in different domains at scale can be critical issues in building a task-oriented dialogue system.
We propose a novel manual-guided dialogue scheme, where the agent learns the tasks from both dialogue and manuals.
Our proposed scheme reduces the dependence of dialogue models on fine-grained domain ontology, and makes them more flexible to adapt to various domains.
(2022-08-16T08:21:12Z)
- KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877]
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains.
We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
(2022-05-11T16:01:03Z)
- HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data [87.67278915655712]
We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables.
The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions.
(2022-04-28T00:52:16Z)
- Graph Based Network with Contextualized Representations of Turns in Dialogue [0.0]
Dialogue-based relation extraction (RE) aims to extract relation(s) between two arguments that appear in a dialogue.
We propose the TUrn COntext awaRE Graph Convolutional Network (TUCORE-GCN) modeled by paying attention to the way people understand dialogues.
(2021-09-09T03:09:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.