VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
- URL: http://arxiv.org/abs/2309.07387v1
- Date: Thu, 14 Sep 2023 02:09:20 GMT
- Title: VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
- Authors: Yunshui Li, Binyuan Hui, Zhaochao Yin, Wanwei He, Run Luo, Yuxing
Long, Min Yang, Fei Huang, Yongbin Li
- Abstract summary: We propose VDialogUE, a Visually-grounded Dialogue benchmark for Unified Evaluation.
It defines five core multi-modal dialogue tasks and covers six datasets.
We also present a straightforward yet efficient baseline model, named VISIT (VISually-grounded dIalog Transformer), to promote the advancement of general multi-modal dialogue systems.
- Score: 70.64560638766018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visually-grounded dialog systems, which integrate multiple modes of
communication such as text and visual inputs, have become an increasingly
popular area of investigation. However, the absence of a standardized
evaluation framework poses a challenge in assessing the development of this
field. To this end, we propose \textbf{VDialogUE}, a \textbf{V}isually-grounded
\textbf{Dialog}ue benchmark for \textbf{U}nified \textbf{E}valuation. It
defines five core multi-modal dialogue tasks and covers six datasets.
Furthermore, in order to provide a comprehensive assessment of the model's
performance across all tasks, we developed a novel evaluation metric called
VDscore, which is based on the Analytic Hierarchy Process~(AHP) method.
Additionally, we present a straightforward yet efficient baseline model, named
\textbf{VISIT}~(\textbf{VIS}ually-grounded d\textbf{I}alog
\textbf{T}ransformer), to promote the advancement of general multi-modal
dialogue systems. It progressively builds its multi-modal foundation and
dialogue capability via a two-stage pre-training strategy.
We believe that the VDialogUE benchmark, along with the evaluation scripts
and our baseline models, will accelerate the development of visually-grounded
dialog systems and lead to the development of more sophisticated and effective
pre-trained models.
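The abstract states that VDscore aggregates per-task results with the Analytic Hierarchy Process (AHP) but does not give the formula here. As a rough illustration only, an AHP-style composite score can be sketched as follows; the three-task setup, the pairwise importance judgments, and the per-task scores below are invented for illustration and are not taken from the paper:

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights via the row geometric mean
    (exact when the pairwise comparison matrix is perfectly consistent)."""
    gm = [math.prod(row) ** (1 / len(row)) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]

# Illustrative reciprocal judgments over three dialogue tasks:
# entry [i][j] = how much more important task i is than task j.
pairwise = [
    [1.0,  2.0, 4.0],
    [0.5,  1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)  # roughly [0.571, 0.286, 0.143]

# Illustrative per-task scores, folded into one composite number.
task_scores = [0.72, 0.65, 0.80]
composite = sum(w * s for w, s in zip(weights, task_scores))
```

In full AHP the weights come from the principal eigenvector of the pairwise matrix; the row geometric mean used here is a standard approximation that coincides with it when the judgments are consistent.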
Related papers
- FCC: Fusing Conversation History and Candidate Provenance for Contextual
Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization [127.714919036388]
DIONYSUS is a pre-trained encoder-decoder model for summarizing dialogues in any new domain.
Our experiments show that DIONYSUS outperforms existing methods on six datasets.
arXiv Detail & Related papers (2022-12-20T06:21:21Z)
- Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator [37.590563896382456]
We propose an interactive evaluation framework for Task-Oriented Dialogue (TOD) systems.
We first build a goal-oriented user simulator based on pre-trained models and then use the user simulator to interact with the dialogue system to generate dialogues.
Experimental results show that RL-based TOD systems trained by our proposed user simulator can achieve nearly 98% inform and success rates.
arXiv Detail & Related papers (2022-10-26T07:41:32Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows [63.116280145770006]
We propose segment act, an extension of dialog act from utterance level to segment level, and crowdsource a large-scale dataset for it.
To utilize segment act flows, sequences of segment acts, for evaluation, we develop the first consensus-based dialogue evaluation framework, FlowEval.
arXiv Detail & Related papers (2022-02-14T11:37:20Z)
- GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection [36.77204909711832]
We propose a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora.
Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation.
Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems.
arXiv Detail & Related papers (2021-11-29T15:24:36Z)
- CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking [44.38388988238695]
A dialogue state tracker aims to accurately find a compact representation of the current dialogue status.
We employ a structured state representation and cast dialogue state tracking as a sequence generation problem.
Experiments demonstrate our tracker achieves encouraging joint goal accuracy for the five domains in MultiWOZ 2.0 and MultiWOZ 2.1 datasets.
arXiv Detail & Related papers (2020-09-22T10:27:18Z)
- Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation [114.48767388174218]
This paper presents an empirical analysis on different types of dialog systems composed of different modules in different settings.
Our results show that a pipeline dialog system trained using fine-grained supervision signals at different component levels often obtains better performance than the systems that use joint or end-to-end models trained on coarse-grained labels.
arXiv Detail & Related papers (2020-05-15T05:20:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.