TIMEDIAL: Temporal Commonsense Reasoning in Dialog
- URL: http://arxiv.org/abs/2106.04571v1
- Date: Tue, 8 Jun 2021 17:59:21 GMT
- Title: TIMEDIAL: Temporal Commonsense Reasoning in Dialog
- Authors: Lianhui Qin, Aditya Gupta, Shyam Upadhyay, Luheng He, Yejin Choi and
Manaal Faruqui
- Abstract summary: We present the first study to investigate pre-trained language models for their temporal reasoning capabilities in dialogs.
We formulate TIMEDIAL as a multiple-choice cloze task with over 1.1K carefully curated dialogs.
Empirical results demonstrate that even the best performing models struggle on this task compared to humans.
- Score: 43.24596551545824
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Everyday conversations require understanding everyday events, which in turn,
requires understanding temporal commonsense concepts interwoven with those
events. Despite recent progress with massive pre-trained language models (LMs)
such as T5 and GPT-3, their capability of temporal reasoning in dialogs remains
largely under-explored. In this paper, we present the first study to
investigate pre-trained LMs for their temporal reasoning capabilities in
dialogs by introducing a new task and a crowd-sourced English challenge set,
TIMEDIAL. We formulate TIMEDIAL as a multiple-choice cloze task with over 1.1K
carefully curated dialogs. Empirical results demonstrate that even the best
performing models struggle on this task compared to humans, with 23 absolute
points of gap in accuracy. Furthermore, our analysis reveals that the models
fail to reason about dialog context correctly; instead, they rely on shallow
cues based on existing temporal patterns in context, motivating future research
for modeling temporal concepts in text and robust contextual reasoning about
them. The dataset is publicly available at:
https://github.com/google-research-datasets/timedial.
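To make the task formulation concrete, below is a minimal sketch of how a multiple-choice cloze instance of this kind could be scored with an off-the-shelf pretrained LM: each candidate temporal phrase fills the masked span, and the candidate yielding the most likely completed dialog is selected. The model choice, the <MASK> placeholder, and the example instance are illustrative assumptions, not the paper's exact data schema or evaluation protocol.

```python
# A minimal sketch (not the authors' evaluation script) of scoring a
# multiple-choice temporal cloze instance with an off-the-shelf causal LM:
# each candidate fills the masked span, the completed dialog is scored by
# average per-token LM loss, and the lowest-loss candidate is chosen.
# The <MASK> placeholder and the example below are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def pick_candidate(dialog_with_mask: str, candidates: list[str]) -> str:
    """Return the candidate whose filled-in dialog has the lowest average loss."""
    losses = []
    for cand in candidates:
        text = dialog_with_mask.replace("<MASK>", cand)
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, labels=inputs["input_ids"])
        losses.append(out.loss.item())  # mean cross-entropy over tokens
    return candidates[losses.index(min(losses))]

# Hypothetical instance in the spirit of the task: the masked span is a duration.
dialog = "A: How long have you lived in this apartment? B: About <MASK>, since 2015."
options = ["six years", "six minutes", "two days", "half an hour"]
print(pick_candidate(dialog, options))
```

Averaging the loss over tokens is one simple way to compare candidates of different lengths; other length-normalization choices are possible.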
Related papers
- Language Models Still Struggle to Zero-shot Reason about Time Series [11.764833497297493]
Time series are critical for decision-making in fields like finance and healthcare.
It remains unknown whether non-trivial forecasting implies that language models can reason about time series.
We generate a first-of-its-kind evaluation framework for time series reasoning.
arXiv Detail & Related papers (2024-04-17T21:27:33Z)
- Evaluating Very Long-Term Conversational Memory of LLM Agents [95.84027826745609]
We introduce a machine-human pipeline to generate high-quality, very long-term dialogues.
We equip each agent with the capability of sharing and reacting to images.
The generated conversations are verified and edited by human annotators for long-range consistency.
arXiv Detail & Related papers (2024-02-27T18:42:31Z)
- Mind the Gap Between Conversations for Improved Long-Term Dialogue Generation [21.109006148673846]
GapChat is a multi-session dialogue dataset in which the time between each session varies.
While the dataset is constructed in real-time, progress on events in speakers' lives is simulated in order to create realistic dialogues occurring across a long timespan.
We show that time-aware models perform better in metrics that judge the relevance of the chosen topics and the information gained from the conversation.
arXiv Detail & Related papers (2023-10-24T00:12:38Z)
- Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations [9.249662593315541]
We introduce a new 1M multi-session dialogue dataset, Conversation Chronicles, for implementing a long-term conversation setup.
We show that dialogue episodes in Conversation Chronicles reflect these diverse temporal and relational dynamics while maintaining coherent and consistent interactions.
We also propose a dialogue model, called ReBot, which consists of chronological summarization and dialogue generation modules.
arXiv Detail & Related papers (2023-10-20T11:06:21Z)
- An Overview Of Temporal Commonsense Reasoning and Acquisition [20.108317515225504]
Temporal commonsense reasoning refers to the ability to understand the typical temporal context of phrases, actions, and events.
Recent research on the performance of large language models suggests that they often take shortcuts in their reasoning and fall prey to simple linguistic traps.
arXiv Detail & Related papers (2023-07-28T01:30:15Z)
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z)
- ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations [52.26802326949116]
We quantitatively evaluate the performance of ChatGPT, an interactive large language model, on inter-sentential relations.
ChatGPT exhibits exceptional proficiency in detecting and reasoning about causal relations.
It is capable of identifying the majority of discourse relations with existing explicit discourse connectives, but the implicit discourse relation remains a formidable challenge.
arXiv Detail & Related papers (2023-04-28T13:14:36Z)
- Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z)
- OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue [40.62090743056549]
This paper presents an ontology-aware pretrained language model (OPAL) for end-to-end task-oriented dialogue (TOD).
Unlike chit-chat dialogue models, task-oriented dialogue models rely on at least two task-specific modules: a dialogue state tracker (DST) and a response generator (RG).
arXiv Detail & Related papers (2022-09-10T04:38:27Z)
- In-Context Learning for Few-Shot Dialogue State Tracking [55.91832381893181]
We propose an in-context (IC) learning framework for few-shot dialogue state tracking (DST).
A large pre-trained language model (LM) takes a test instance and a few annotated examples as input, and directly decodes the dialogue states without any parameter updates.
This makes the LM more flexible and scalable compared to prior few-shot DST work when adapting to new domains and scenarios.
arXiv Detail & Related papers (2022-03-16T11:58:24Z)
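The in-context DST recipe described in the entry above boils down to concatenating a few annotated exemplars with the test dialogue and letting a frozen LM decode the state as plain text. The sketch below illustrates that prompting pattern; the exemplars, slot names, and the small model used here are hypothetical placeholders, not the paper's actual setup.

```python
# Illustrative sketch of few-shot, in-context dialogue state tracking (IC-DST):
# a frozen pretrained LM is prompted with a few annotated (dialogue -> state)
# exemplars and decodes the state for a new dialogue with no parameter updates.
# Exemplars, slot names, and the model are hypothetical placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

exemplars = [
    ("User: I need a cheap hotel in the north.",
     "hotel-pricerange=cheap; hotel-area=north"),
    ("User: Book a table for two at an Italian place.",
     "restaurant-food=italian; restaurant-book_people=2"),
]
test_turn = "User: Find me an expensive restaurant in the centre."

# Build the prompt: annotated exemplars first, then the unlabeled test instance.
prompt = "".join(f"Dialogue: {d}\nState: {s}\n\n" for d, s in exemplars)
prompt += f"Dialogue: {test_turn}\nState:"

# The LM simply continues the prompt; the continuation is read off as the
# predicted dialogue state (greedy decoding, no fine-tuning involved).
output = generator(prompt, max_new_tokens=30, do_sample=False)[0]["generated_text"]
print(output[len(prompt):].strip())
```

A small model like GPT-2 will not track states reliably; the point of the sketch is only the prompt construction and the absence of parameter updates.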