ORCHID: A Chinese Debate Corpus for Target-Independent Stance Detection and Argumentative Dialogue Summarization
- URL: http://arxiv.org/abs/2410.13667v1
- Date: Thu, 17 Oct 2024 15:28:27 GMT
- Title: ORCHID: A Chinese Debate Corpus for Target-Independent Stance Detection and Argumentative Dialogue Summarization
- Authors: Xiutian Zhao, Ke Wang, Wei Peng
- Abstract summary: We present ORCHID (Oral Chinese Debate), the first Chinese dataset for benchmarking target-independent stance detection and debate summarization.
Our dataset consists of 1,218 real-world debates that were conducted in Chinese on 476 unique topics, containing 2,436 stance-specific summaries and 14,133 fully annotated utterances.
The results show the challenging nature of the dataset and suggest the potential of incorporating stance detection into summarization for argumentative dialogue.
- Score: 6.723531714964794
- Abstract: Dialogue agents have been receiving increasing attention for years, and this trend has been further boosted by the recent progress of large language models (LLMs). Stance detection and dialogue summarization are two core tasks of dialogue agents in application scenarios that involve argumentative dialogues. However, research on these tasks is limited by the insufficiency of public datasets, especially for non-English languages. To address this language resource gap in Chinese, we present ORCHID (Oral Chinese Debate), the first Chinese dataset for benchmarking target-independent stance detection and debate summarization. Our dataset consists of 1,218 real-world debates that were conducted in Chinese on 476 unique topics, containing 2,436 stance-specific summaries and 14,133 fully annotated utterances. Besides providing a versatile testbed for future research, we also conduct an empirical study on the dataset and propose an integrated task. The results show the challenging nature of the dataset and suggest a potential of incorporating stance detection in summarization for argumentative dialogue.
Related papers
- CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization [7.234196390284036]
This article summarizes the research on Transformer-based abstractive summarization for English dialogues.
We cover the main challenges present in dialogue summarization (i.e., language, structure, comprehension, speaker, salience, and factuality).
We find that while some challenges, like language, have seen considerable progress, others, such as comprehension, factuality, and salience, remain difficult and hold significant research opportunities.
arXiv Detail & Related papers (2024-06-11T17:30:22Z)
- JMultiWOZ: A Large-Scale Japanese Multi-Domain Task-Oriented Dialogue Dataset [3.1311340484197814]
JMultiWOZ is the first Japanese language large-scale multi-domain task-oriented dialogue dataset.
We evaluated the dialogue state tracking and response generation capabilities of the state-of-the-art methods.
arXiv Detail & Related papers (2024-03-26T02:01:18Z)
- FREDSum: A Dialogue Summarization Corpus for French Political Debates [26.76383031532945]
We present a dataset of French political debates for the purpose of enhancing resources for multi-lingual dialogue summarization.
Our dataset consists of manually transcribed and annotated political debates, covering a range of topics and perspectives.
arXiv Detail & Related papers (2023-05-22T13:47:51Z)
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken task-oriented dialogue (TOD).
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-19T00:14:10Z)
- MD3: The Multi-Dialect Dataset of Dialogues [20.144004030947507]
We introduce a new dataset of conversational speech representing English from India, Nigeria, and the United States.
The dataset includes more than 20 hours of audio and more than 200,000 orthographically-transcribed tokens.
arXiv Detail & Related papers (2023-05-02T04:03:50Z)
- Topic Shift Detection in Chinese Dialogues: Corpus and Benchmark [10.378163772785204]
We propose a teacher-student framework based on hierarchical contrastive learning to predict the topic shift without the response.
The experimental results on our Chinese CNTD and English TIAGE show the effectiveness of our proposed model.
arXiv Detail & Related papers (2023-05-02T04:03:50Z)
- DiaASQ: A Benchmark of Conversational Aspect-based Sentiment Quadruple Analysis [84.80347062834517]
We introduce DiaASQ, aiming to detect the quadruple of target-aspect-opinion-sentiment in a dialogue.
We manually construct a large-scale high-quality DiaASQ dataset in both Chinese and English languages.
We develop a neural model to benchmark the task, which effectively performs end-to-end quadruple prediction.
arXiv Detail & Related papers (2022-11-10T17:18:20Z)
- RuArg-2022: Argument Mining Evaluation [69.87149207721035]
This paper is the organizers' report on the first competition of argumentation analysis systems dealing with Russian-language texts.
A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic was prepared.
The system that won first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture.
arXiv Detail & Related papers (2022-06-18T17:13:37Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding.
COD enables dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
- Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
arXiv Detail & Related papers (2020-10-26T21:34:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.