ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain
Dialogue Systems
- URL: http://arxiv.org/abs/2305.07797v2
- Date: Fri, 3 Nov 2023 05:06:06 GMT
- Title: ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain
Dialogue Systems
- Authors: Sarik Ghazarian, Yijia Shao, Rujun Han, Aram Galstyan, Nanyun Peng
- Abstract summary: We propose ACCENT, an event commonsense evaluation metric empowered by commonsense knowledge bases (CSKBs).
Our experiments show that ACCENT is an efficient metric for event commonsense evaluation, which achieves higher correlations with human judgments than existing baselines.
- Score: 81.8658402934838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Commonsense reasoning is omnipresent in human communications and thus is an
important feature for open-domain dialogue systems. However, evaluating
commonsense in dialogue systems is still an open challenge. We take the first
step by focusing on event commonsense that considers events and their
relations, and is crucial in both dialogues and general commonsense reasoning.
We propose ACCENT, an event commonsense evaluation metric empowered by
commonsense knowledge bases (CSKBs). ACCENT first extracts event-relation
tuples from a dialogue, and then evaluates the response by scoring the tuples
in terms of their compatibility with the CSKB. To evaluate ACCENT, we construct
the first public event commonsense evaluation dataset for open-domain
dialogues. Our experiments show that ACCENT is an efficient metric for event
commonsense evaluation, which achieves higher correlations with human judgments
than existing baselines.
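The abstract describes a two-step pipeline: extract event-relation tuples from the dialogue, then score each tuple's compatibility with a commonsense knowledge base (CSKB). Below is a minimal structural sketch of that flow; both components are placeholders (the paper's actual extractor and compatibility scorer are not specified here), and the mean aggregation is an assumption.

```python
from typing import List, Tuple

# Hypothetical placeholders for ACCENT's two steps: tuple extraction and
# CSKB compatibility scoring. This mirrors only the pipeline structure
# stated in the abstract, not the paper's actual models.

EventTuple = Tuple[str, str, str]  # (head event, relation, tail event)

def extract_event_tuples(history: List[str], response: str) -> List[EventTuple]:
    """Step 1 (placeholder): extract event-relation tuples for the response."""
    raise NotImplementedError("plug in a trained tuple extractor")

def cskb_compatibility(event_tuple: EventTuple) -> float:
    """Step 2 (placeholder): score tuple compatibility with the CSKB, in [0, 1]."""
    raise NotImplementedError("plug in a CSKB-backed compatibility scorer")

def accent_score(history: List[str], response: str) -> float:
    """Aggregate tuple compatibilities into one response-level score (assumed: mean)."""
    tuples = extract_event_tuples(history, response)
    if not tuples:
        return 0.0  # assumption: no extractable tuples maps to the lowest score
    return sum(cskb_compatibility(t) for t in tuples) / len(tuples)
```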
Related papers
- CausalScore: An Automatic Reference-Free Metric for Assessing Response Relevance in Open-Domain Dialogue Systems [43.5428962271088]
We propose a novel metric, called CausalScore, which assesses the relevance of responses by measuring the causal strength between dialogue histories and responses.
Our experimental results demonstrate that CausalScore significantly surpasses existing state-of-the-art metrics by aligning better with human judgements.
arXiv Detail & Related papers (2024-06-25T06:08:16Z)
- Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z)
- DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment [38.26039323208791]
We release a large-scale dialogue quality assessment dataset (DiQAD) for automatically assessing open-domain dialogue quality.
Specifically, we establish assessment criteria based on dimensions that conform to human judgements of dialogue quality.
Based on these criteria, we annotate a large-scale collection of around 100,000 dialogues conducted between real users.
arXiv Detail & Related papers (2023-10-25T03:04:57Z)
- Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information [18.859159491548006]
We propose a novel learning-based automatic evaluation metric (CMN) for open-domain dialogues.
We employ Conditional Variational Autoencoders (CVAEs) with a Next Sentence Prediction (NSP) objective and use Mutual Information (MI) to model the semantic similarity of text in the latent space.
Experimental results on two open-domain dialogue datasets demonstrate the superiority of our method compared with a wide range of baselines.
arXiv Detail & Related papers (2023-05-26T14:21:54Z)
- FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows [63.116280145770006]
We propose the segment act, an extension of the dialog act from the utterance level to the segment level, and crowdsource a large-scale dataset for it.
To utilize segment act flows, i.e., sequences of segment acts, for evaluation, we develop the first consensus-based dialogue evaluation framework, FlowEval.
arXiv Detail & Related papers (2022-02-14T11:37:20Z)
- User Response and Sentiment Prediction for Automatic Dialogue Evaluation [69.11124655437902]
We propose to use the sentiment of the next user utterance for turn- or dialog-level evaluation.
Experiments show that our model outperforms existing automatic evaluation metrics on both written and spoken open-domain dialogue datasets.
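As a rough illustration of this idea, the sketch below scores a system turn by the sentiment of the user's follow-up utterance, using a generic pretrained sentiment classifier from Hugging Face Transformers; the model choice, the assumption that the follow-up is observed rather than predicted, and the sentiment-to-score mapping are all simplifications rather than the paper's actual setup.

```python
from transformers import pipeline

# Generic off-the-shelf sentiment classifier; the specific model is arbitrary here.
sentiment = pipeline("sentiment-analysis")

def turn_quality_from_followup(next_user_utterance: str) -> float:
    """Map the follow-up utterance's sentiment to a [0, 1] quality score
    (assumption: a positive follow-up signals a better system response)."""
    result = sentiment(next_user_utterance)[0]  # e.g. {'label': 'POSITIVE', 'score': 0.99}
    return result["score"] if result["label"] == "POSITIVE" else 1.0 - result["score"]

print(turn_quality_from_followup("That's really helpful, thanks!"))
```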
arXiv Detail & Related papers (2021-11-16T22:19:17Z)
- Commonsense-Focused Dialogues for Response Generation: An Empirical Study [39.49727190159279]
We present an empirical study of commonsense in dialogue response generation.
We first auto-extract commonsensical dialogues from existing dialogue datasets by leveraging ConceptNet.
We then collect a new dialogue dataset with 25K dialogues aimed at exhibiting social commonsense in an interactive setting.
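To make the ConceptNet-based extraction step concrete, here is a hedged sketch of one way to flag an utterance-response pair as commonsensical: check whether concepts mentioned in the two turns are linked by a ConceptNet triple. The tiny triple set, the substring concept matcher, and the single-hop criterion are all simplifying assumptions, not the paper's exact procedure.

```python
from typing import Set, Tuple

# Tiny illustrative ConceptNet-style triple store (head, relation, tail).
# A real implementation would query the full ConceptNet graph instead.
TRIPLES: Set[Tuple[str, str, str]] = {
    ("rain", "Causes", "wet ground"),
    ("umbrella", "UsedFor", "staying dry"),
}

def concepts(text: str) -> Set[str]:
    """Naive concept matcher: keep any known concept appearing as a substring."""
    vocab = {c for head, _, tail in TRIPLES for c in (head, tail)}
    return {c for c in vocab if c in text.lower()}

def is_commonsensical_pair(utterance: str, response: str) -> bool:
    """Assumption: the pair is commonsensical if a single ConceptNet triple
    links a concept in the utterance to a concept in the response."""
    heads, tails = concepts(utterance), concepts(response)
    return any(h in heads and t in tails for h, _, t in TRIPLES)

print(is_commonsensical_pair("It started to rain.", "Careful, there is wet ground everywhere."))
```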
arXiv Detail & Related papers (2021-09-14T04:32:09Z)
- Is this Dialogue Coherent? Learning from Dialogue Acts and Entities [82.44143808977209]
We create the Switchboard Coherence (SWBD-Coh) corpus, a dataset of human-human spoken dialogues annotated with turn coherence ratings.
Our statistical analysis of the corpus indicates how the perception of turn coherence is affected by patterns of entity distribution.
We find that models combining both DA and entity information yield the best performance for both response selection and turn coherence rating.
arXiv Detail & Related papers (2020-06-17T21:02:40Z)
- Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually by reasoning over dialogue turns with the help of back-end data.
Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
arXiv Detail & Related papers (2020-05-27T02:05:33Z)