ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal,
Causal, and Discourse Relations
- URL: http://arxiv.org/abs/2304.14827v3
- Date: Fri, 26 Jan 2024 10:33:08 GMT
- Title: ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal,
Causal, and Discourse Relations
- Authors: Chunkit Chan, Jiayang Cheng, Weiqi Wang, Yuxin Jiang, Tianqing Fang,
Xin Liu, Yangqiu Song
- Abstract summary: We quantitatively evaluate the performance of ChatGPT, an interactive large language model, on inter-sentential relations.
ChatGPT exhibits exceptional proficiency in detecting and reasoning about causal relations.
It is capable of identifying the majority of discourse relations with existing explicit discourse connectives, but implicit discourse relations remain a formidable challenge.
- Score: 52.26802326949116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to quantitatively evaluate the performance of ChatGPT, an
interactive large language model, on inter-sentential relations such as
temporal relations, causal relations, and discourse relations. Given ChatGPT's
promising performance across various tasks, we proceed to carry out thorough
evaluations on the full test sets of 11 datasets, covering temporal relations, causal relations, and both PDTB 2.0-based and dialogue-based discourse relations. To
ensure the reliability of our findings, we employ three tailored prompt
templates for each task, including the zero-shot prompt template, zero-shot
prompt engineering (PE) template, and in-context learning (ICL) prompt
template, to establish the initial baseline scores for all popular
sentence-pair relation classification tasks for the first time. Through our
study, we discover that ChatGPT exhibits exceptional proficiency in detecting
and reasoning about causal relations, although it may not possess the same level of expertise in identifying the temporal order between two events. While it is
capable of identifying the majority of discourse relations with existing
explicit discourse connectives, implicit discourse relations remain a formidable challenge. Concurrently, ChatGPT demonstrates subpar performance on the dialogue discourse parsing task, which requires understanding the structure of a dialogue before its discourse relations can be identified.
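The evaluation protocol described above relies on three prompt templates per task: zero-shot, zero-shot prompt engineering (PE), and in-context learning (ICL). Below is a minimal Python sketch of what such templates could look like for a temporal relation query; the label set, role text, and demonstration examples are illustrative assumptions, not the paper's actual templates.

```python
# Sketch of the three prompt styles (zero-shot, zero-shot PE, ICL) for a
# sentence-pair relation classification query. Labels, wording, and the
# demonstration pair are hypothetical, not taken from the paper.

TEMPORAL_LABELS = ["BEFORE", "AFTER", "EQUAL", "VAGUE"]  # assumed label set


def zero_shot(s1: str, s2: str) -> str:
    """Plain zero-shot template: task question plus the input pair."""
    return (
        "What is the temporal relation between event 1 and event 2?\n"
        f"Event 1: {s1}\nEvent 2: {s2}\n"
        f"Answer with one of: {', '.join(TEMPORAL_LABELS)}."
    )


def zero_shot_pe(s1: str, s2: str) -> str:
    """Zero-shot prompt-engineering template: adds a role and label definitions."""
    return (
        "You are an expert annotator of temporal relations between events.\n"
        "BEFORE: event 1 happens before event 2. AFTER: event 1 happens after event 2.\n"
        "EQUAL: the events overlap in time. VAGUE: the order cannot be determined.\n"
        + zero_shot(s1, s2)
    )


def in_context(s1: str, s2: str, demos: list[tuple[str, str, str]]) -> str:
    """ICL template: prepends labeled demonstrations before the query."""
    demo_text = "".join(
        f"Event 1: {d1}\nEvent 2: {d2}\nRelation: {label}\n\n"
        for d1, d2, label in demos
    )
    return demo_text + zero_shot(s1, s2)


if __name__ == "__main__":
    demos = [("She locked the door.", "She left the house.", "BEFORE")]
    print(in_context("He submitted the paper.", "The deadline passed.", demos))
```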
Related papers
- Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue:
An Empirical Study [51.079100495163736]
This paper systematically inspects ChatGPT's performance in two discourse analysis tasks: topic segmentation and discourse parsing.
ChatGPT demonstrates proficiency in identifying topic structures in general-domain conversations yet struggles considerably in specific-domain conversations.
Our deeper investigation indicates that ChatGPT can give more reasonable topic structures than human annotations but only linearly parses the hierarchical rhetorical structures.
arXiv Detail & Related papers (2023-05-15T07:14:41Z)
- A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding [55.37338324658501]
Zero-shot dialogue understanding aims to enable dialogue systems to track the user's needs without any training data.
In this work, we investigate the understanding ability of ChatGPT for zero-shot dialogue understanding tasks.
arXiv Detail & Related papers (2023-04-09T15:28:36Z)
- Learning to Memorize Entailment and Discourse Relations for Persona-Consistent Dialogues [8.652711997920463]
Existing works have improved the performance of dialogue systems by intentionally learning interlocutor personas with sophisticated network structures.
This study proposes a method of learning to memorize entailment and discourse relations for persona-consistent dialogue tasks.
arXiv Detail & Related papers (2023-01-12T08:37:00Z)
- Multi-tasking Dialogue Comprehension with Discourse Parsing [43.352833140317486]
We propose the first multi-task model for jointly performing QA and discourse parsing (DP) on the multi-party dialogue MRC task.
Our results indicate that training with complementary tasks benefits not only the QA task but also the DP task itself.
arXiv Detail & Related papers (2021-10-07T08:51:49Z)
- "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations [87.95711406978157]
This work presents a new benchmark on spoken task-oriented conversations.
We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling.
Our data set enables speech-based benchmarking of task-oriented dialogue systems.
arXiv Detail & Related papers (2021-09-28T04:51:04Z)
- TIMEDIAL: Temporal Commonsense Reasoning in Dialog [43.24596551545824]
We present the first study to investigate pre-trained language models for their temporal reasoning capabilities in dialogs.
We formulate TIME-DIAL as a multiple-choice cloze task with over 1.1K carefully curated dialogs.
Empirical results demonstrate that even the best performing models struggle on this task compared to humans.
arXiv Detail & Related papers (2021-06-08T17:59:21Z)
- Structural Pre-training for Dialogue Comprehension [51.215629336320305]
We present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue exclusive features.
To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives.
Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z)
- DDRel: A New Dataset for Interpersonal Relation Classification in Dyadic Dialogues [11.531187569461489]
This paper proposes the task of relation classification of interlocutors based on their dialogues.
We crawled movie scripts from IMSDb, and annotated the relation labels for each session according to 13 pre-defined relationships.
The annotated dataset DDRel consists of 6,300 dyadic dialogue sessions between 694 pairs of speakers with 53,126 utterances in total.
arXiv Detail & Related papers (2020-12-04T12:30:31Z)
- Dialogue-Based Relation Extraction [53.2896545819799]
We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE.
We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks.
Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings.
arXiv Detail & Related papers (2020-04-17T03:51:57Z)