MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal
Open-domain Conversation
- URL: http://arxiv.org/abs/2211.05719v1
- Date: Thu, 10 Nov 2022 17:37:04 GMT
- Authors: Jiazhan Feng, Qingfeng Sun, Can Xu, Pu Zhao, Yaming Yang, Chongyang
Tao, Dongyan Zhao, Qingwei Lin
- Abstract summary: We introduce the MMDialog dataset to better facilitate multi-modal conversation.
MMDialog is composed of a curated set of 1.08 million real-world dialogues with 1.53 million unique images across 4,184 topics.
To build engaging dialogue systems with this dataset, we propose and normalize two response-producing tasks.
- Score: 68.53133207668856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Responding with multi-modal content has been recognized as an essential
capability for an intelligent conversational agent. In this paper, we introduce
the MMDialog dataset to better facilitate multi-modal conversation. MMDialog is
composed of a curated set of 1.08 million real-world dialogues with 1.53
million unique images across 4,184 topics. MMDialog has two main and unique
advantages. First, it is the largest multi-modal conversation dataset by number
of dialogues, 8x larger than the previous largest. Second, it covers a massive
range of topics, generalizing to the open domain. To build engaging dialogue
systems with this dataset, we propose and normalize two response-producing
tasks based on retrieval and generative scenarios. In addition, we build two
baselines for the above tasks with state-of-the-art techniques and report their
experimental performance. We also propose a novel evaluation metric,
MM-Relevance, to measure multi-modal responses. Our dataset and scripts are
available at https://github.com/victorsungo/MMDialog.
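The abstract does not define MM-Relevance here; in the paper it is computed via visual-language matching with the pre-trained CLIP model. The sketch below is a minimal, hypothetical illustration of that idea only, assuming an off-the-shelf Hugging Face CLIP checkpoint and an F1-style aggregation of pairwise cosine similarities; the helper names (`encode_elements`, `mm_relevance`) and the max-pooling aggregation are illustrative choices, not the paper's exact formulation.

```python
# Hypothetical sketch of a CLIP-based multi-modal relevance score in the
# spirit of MM-Relevance. Checkpoint, helpers, and aggregation are
# illustrative assumptions, not the paper's exact definition.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def encode_elements(elements):
    """Embed a list of response elements (strings or PIL images) into
    CLIP's shared text-image space, L2-normalized for cosine similarity."""
    feats = []
    for el in elements:
        if isinstance(el, str):
            inputs = processor(text=[el], return_tensors="pt", truncation=True)
            emb = model.get_text_features(**inputs)
        else:  # assume a PIL.Image
            inputs = processor(images=el, return_tensors="pt")
            emb = model.get_image_features(**inputs)
        feats.append(torch.nn.functional.normalize(emb, dim=-1))
    return torch.cat(feats, dim=0)  # (num_elements, embed_dim)

@torch.no_grad()
def mm_relevance(generated, reference):
    """F1-style score: each generated element is matched to its most
    similar reference element (precision) and vice versa (recall)."""
    gen, ref = encode_elements(generated), encode_elements(reference)
    sim = gen @ ref.T                         # pairwise cosine similarities
    precision = sim.max(dim=1).values.mean()  # best match per generated element
    recall = sim.max(dim=0).values.mean()     # best match per reference element
    return (2 * precision * recall / (precision + recall)).item()

# Example: score a text+image response against a text-only ground truth.
# print(mm_relevance(["Sunset at the beach!", Image.open("beach.jpg")],
#                    ["What a beautiful beach."]))
```

Because CLIP places text and images in one embedding space, a single similarity matrix covers all modality pairings, which is what lets a metric like this compare a text reply against an image reply at all.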
Related papers
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294] (2024-06-12)
We introduce a novel face-to-face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
- Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue [50.279206765971125] (2023-02-28)
We explore three methods for interpreting multimodal inputs from conversational and situational contexts.
Our best method, scene-dialogue alignment, improves performance by 20% F1-score over the SIMMC 2.1 baselines.
- Dialog Inpainting: Turning Documents into Dialogs [12.131506050808207] (2022-05-18)
We produce two datasets totalling 19 million diverse information-seeking dialogs.
Human raters judge the answer adequacy and conversationality of WikiDialog to be as good as or better than existing manually-collected datasets.
- MSCTD: A Multimodal Sentiment Chat Translation Dataset [66.81525961469494] (2022-02-28)
We introduce a new task named Multimodal Chat Translation (MCT).
MCT aims to generate more accurate translations with the help of the associated dialogue history and visual context.
Our work can facilitate research on both multimodal chat translation and multimodal dialogue sentiment analysis.
- OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts [20.37658842432543] (2021-09-27)
We release OpenViDial 2.0, a larger-scale open-domain multi-modal dialogue dataset.
OpenViDial 2.0 contains a total of 5.6 million dialogue turns extracted from movies and TV series.
- Fusing task-oriented and open-domain dialogues in conversational agents [12.338220374261343] (2021-09-09)
The two dialogue modes can potentially be intertwined seamlessly in the same conversation, as is easily done by a friendly human assistant.
Our paper addresses the problem of fusing task-oriented dialogues (TODs) and open-domain dialogues (ODDs) in multi-turn dialogues.
It features inter-mode contextual dependency, i.e., the dialogue turns from the two modes depend on each other.
- Towards Conversational Recommendation over Multi-Type Dialogs [78.52354759386296] (2020-05-08)
We propose a new task of conversational recommendation over multi-type dialogs, where the bots can proactively and naturally lead a conversation from a non-recommendation dialog to a recommendation dialog.
To facilitate the study of this task, we create DuRecDial, a human-to-human Chinese dialog dataset (about 10k dialogs, 156k utterances).
In each dialog, the recommender proactively leads a multi-type dialog to approach recommendation targets and then makes multiple recommendations with rich interaction behavior.
- UniConv: A Unified Conversational Neural Architecture for Multi-domain Task-oriented Dialogues [101.96097419995556] (2020-04-29)
UniConv is a novel unified neural architecture for end-to-end conversational systems in task-oriented dialogues.
We conduct comprehensive experiments in dialogue state tracking, context-to-text, and end-to-end settings on the MultiWOZ 2.1 benchmark.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.