OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts
- URL: http://arxiv.org/abs/2109.12761v2
- Date: Tue, 28 Sep 2021 15:15:57 GMT
- Title: OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts
- Authors: Shuhe Wang, Yuxian Meng, Xiaoya Li, Xiaofei Sun, Rongbin Ouyang, Jiwei Li
- Abstract summary: We release OpenViDial 2.0, a larger-scale open-domain multi-modal dialogue dataset.
OpenViDial 2.0 contains a total of 5.6 million dialogue turns extracted from movies and TV series.
- Score: 20.37658842432543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In order to better simulate the real human conversation process, models need to generate dialogue utterances based not only on preceding textual contexts but also on visual contexts. However, as multi-modal dialogue learning develops, dataset scale is gradually becoming a bottleneck. In this report, we release OpenViDial 2.0, a larger-scale open-domain multi-modal dialogue dataset compared to the previous version, OpenViDial 1.0. OpenViDial 2.0 contains a total of 5.6 million dialogue turns extracted from movies and TV series from different sources, and each dialogue turn is paired with its corresponding visual context. We hope this large-scale dataset can help facilitate future research on open-domain multi-modal dialog generation, e.g., multi-modal pretraining for dialogue generation.
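Since every turn is paired with a visual context, a consumer of the dataset iterates over (utterance, image) pairs. Below is a minimal Python sketch of such a loader; the file names (`dialogue_turns.txt`, `frames/`) and the pair-by-line-index convention are illustrative assumptions, not the dataset's published layout.

```python
# Minimal sketch of iterating over turn/frame pairs. The layout below
# (one utterance per line, frames named by line index) is an assumption
# for illustration, not the official OpenViDial 2.0 format.
from pathlib import Path

from PIL import Image  # pip install pillow


def iter_turns(root):
    """Yield (utterance, image) pairs, one per dialogue turn."""
    root = Path(root)
    with open(root / "dialogue_turns.txt", encoding="utf-8") as f:
        for idx, line in enumerate(f):
            utterance = line.strip()
            frame = root / "frames" / f"{idx}.jpg"  # assumed naming scheme
            image = Image.open(frame) if frame.exists() else None
            yield utterance, image


for utterance, image in iter_turns("openvidial2"):
    print(utterance, "(with frame)" if image else "(no frame)")
    break  # inspect the first pair only
```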
Related papers
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z)
- DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI [92.29874802394167]
DialogStudio is the largest and most diverse collection of dialogue datasets.
Our collection encompasses data from open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues.
arXiv Detail & Related papers (2023-07-19T17:57:53Z)
- Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue [50.279206765971125]
We explore three methods to tackle the problem of interpreting multimodal inputs from conversational and situational contexts.
Our best method, scene-dialogue alignment, improves performance by 20% in F1-score over the SIMMC 2.1 baselines.
arXiv Detail & Related papers (2023-02-28T15:45:20Z)
- TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World [97.58623810402563]
We introduce a new video-based multi-modal dialogue dataset, called TikTalk.
We collect 38K videos from a popular video-sharing platform, along with 367K conversations posted by users beneath them.
Users engage in spontaneous conversations based on their multi-modal experiences from watching videos, which helps recreate real-world chitchat context.
arXiv Detail & Related papers (2023-01-14T10:18:22Z)
- MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation [68.53133207668856]
We introduce the MMDialog dataset to better facilitate multi-modal conversation.
MMDialog is composed of a curated set of 1.08 million real-world dialogues with 1.53 million unique images across 4,184 topics.
To build an engaging dialogue system with this dataset, we propose and normalize two response-production tasks.
arXiv Detail & Related papers (2022-11-10T17:37:04Z)
- What Did You Say? Task-Oriented Dialog Datasets Are Not Conversational!? [4.022057598291766]
We outline a taxonomy of conversational and contextual effects, which we use to examine MultiWOZ, SGD and SMCalFlow.
We find that less than 4% of MultiWOZ's turns and 10% of SGD's turns are conversational, while SMCalFlow is not conversational at all in its current release.
arXiv Detail & Related papers (2022-03-07T14:26:23Z)
- Modeling Text-visual Mutual Dependency for Multi-modal Dialog Generation [35.45552689723718]
We propose frameworks to resolve a specific case of multi-modal dialog generation in the real world.
Specifically, we propose to model the mutual dependency between text-visual features.
We observe significant performance boosts over vanilla models when the mutual dependency between text and visual features is modeled; a sketch of the idea follows this entry.
arXiv Detail & Related papers (2021-05-30T07:20:28Z)
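The mutual-dependency idea above can be pictured as rescoring: candidates from a forward model p(response | context, visual) are reranked with a backward term measuring how well each response predicts the visual features. The sketch below is an assumed simplification of that idea, not the paper's exact objective; both scoring functions and the weight `lam` are hypothetical.

```python
# Hedged sketch of mutual-dependency reranking. `forward_logp` and
# `backward_logp` stand in for trained models scoring
# log p(response | context, visual) and log p(visual | response);
# both interfaces are hypothetical.
from typing import Callable, List, Tuple


def rerank(
    candidates: List[str],
    forward_logp: Callable[[str], float],
    backward_logp: Callable[[str], float],
    lam: float = 0.5,  # interpolation weight between the two directions
) -> List[Tuple[str, float]]:
    scored = [(y, forward_logp(y) + lam * backward_logp(y)) for y in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```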
- OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts [35.57757367869986]
We release OpenViDial, a large-scale multi-modal dialogue dataset.
OpenViDial contains a total of 1.1 million dialogue turns.
We propose a family of encoder-decoder models leveraging both textual and visual contexts; a sketch of such a model follows this entry.
arXiv Detail & Related papers (2020-12-30T03:02:50Z)
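As a rough picture of such an encoder-decoder, one simple design projects a precomputed image feature to the model width and prepends it to the source token embeddings, so the decoder attends to both modalities. The architecture below is an assumed illustration, not the paper's exact models.

```python
# Assumed illustration of a text+visual encoder-decoder (PyTorch).
import torch
import torch.nn as nn


class VisualDialogSeq2Seq(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, visual_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)  # CNN feature -> model width
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens, visual_feats, tgt_tokens):
        # src_tokens: (B, S) ids; visual_feats: (B, visual_dim); tgt_tokens: (B, T) ids
        vis = self.visual_proj(visual_feats).unsqueeze(1)       # (B, 1, d_model)
        src = torch.cat([vis, self.embed(src_tokens)], dim=1)   # prepend a visual "token"
        tgt = self.embed(tgt_tokens)
        mask = self.transformer.generate_square_subsequent_mask(tgt_tokens.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(hidden)  # (B, T, vocab) next-token logits
```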
- RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling [35.75880078666584]
RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz dataset with rich semantic annotations.
It contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains.
arXiv Detail & Related papers (2020-10-17T08:18:59Z)
- Paraphrase Augmented Task-Oriented Dialog Generation [68.1790912977053]
We propose a paraphrase augmented response generation (PARG) framework that jointly trains a paraphrase model and a response generation model.
We also design a method to automatically construct a paraphrase training dataset from dialog state and dialog act labels; a sketch of the joint training step follows this entry.
arXiv Detail & Related papers (2020-04-16T05:12:36Z)
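One way to read the joint training above: each step combines a paraphrase loss, a response loss on the original input, and a response loss on a sampled paraphrase. The model interfaces (`.loss`, `.generate`), batch fields, and weight `alpha` in the sketch are assumptions, not PARG's actual API.

```python
# Hedged sketch of a PARG-style joint training step. The `.loss` /
# `.generate` interfaces and the `alpha` weighting are assumptions.
import torch


def joint_step(paraphraser, generator, batch, alpha=0.5):
    # Teach the paraphraser to rewrite the user utterance.
    para_loss = paraphraser.loss(src=batch["utterance"], tgt=batch["paraphrase"])

    # Response loss on the original input...
    resp_loss = generator.loss(src=batch["utterance"], tgt=batch["response"])

    # ...and on a sampled paraphrase, exposing the generator to more surface forms.
    with torch.no_grad():
        rewritten = paraphraser.generate(batch["utterance"])
    aug_loss = generator.loss(src=rewritten, tgt=batch["response"])

    return para_loss + alpha * (resp_loss + aug_loss)
```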
- Interview: A Large-Scale Open-Source Corpus of Media Dialog [11.28504775964698]
We introduce 'Interview': a large-scale (105K conversations) media dialog dataset collected from news interview transcripts.
Compared to existing large-scale proxies for conversational data, language models trained on our dataset exhibit better zero-shot out-of-domain performance.
'Interview' contains speaker role annotations for each turn, facilitating the development of engaging, responsive dialog systems.
arXiv Detail & Related papers (2020-04-07T02:44:50Z)