Open Domain Dialogue Generation with Latent Images
- URL: http://arxiv.org/abs/2004.01981v2
- Date: Tue, 1 Jun 2021 07:43:08 GMT
- Title: Open Domain Dialogue Generation with Latent Images
- Authors: Ze Yang, Wei Wu, Huang Hu, Can Xu, Wei Wang, Zhoujun Li
- Abstract summary: We propose learning a response generation model with both image-grounded dialogues and textual dialogues.
In the first scenario, image-grounded dialogues can be effectively augmented by textual dialogues with latent images.
In the second scenario, latent images can enrich the content of responses and at the same time keep them relevant to contexts.
- Score: 43.78366219197779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider grounding open domain dialogues with images. Existing work
assumes that both an image and a textual context are available, but
image-grounded dialogues by nature are more difficult to obtain than textual
dialogues. Thus, we propose learning a response generation model with both
image-grounded dialogues and textual dialogues by assuming that the visual
scene information at the time of a conversation can be represented by an image,
and trying to recover the latent images of the textual dialogues through
text-to-image generation techniques. The likelihood of the two types of
dialogues is then formulated by a response generator and an image reconstructor
that are learned within a conditional variational auto-encoding framework.
Empirical studies are conducted in both image-grounded conversation and
text-based conversation. In the first scenario, image-grounded dialogues,
especially under a low-resource setting, can be effectively augmented by
textual dialogues with latent images; while in the second scenario, latent
images can enrich the content of responses and at the same time keep them
relevant to contexts.
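One plausible way to make this concrete, sketched here with illustrative notation rather than the paper's own (c a textual context, r a response, z the latent image), is a conditional VAE objective in which the response generator p_\theta(r | c, z) is paired with the image reconstructor acting as an approximate posterior q_\phi(z | c, r); for textual dialogues, whose images are unobserved, training maximizes the evidence lower bound

\log p_\theta(r \mid c) \ge \mathbb{E}_{q_\phi(z \mid c, r)}\big[\log p_\theta(r \mid c, z)\big] - \mathrm{KL}\big(q_\phi(z \mid c, r) \,\|\, p(z \mid c)\big)

while for image-grounded dialogues z is observed and \log p_\theta(r \mid c, z) can be maximized directly, which is how the two kinds of data can share one response generator under the abstract's description. This is a sketch of the standard conditional variational bound suggested by the abstract, not the paper's exact formulation.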
Related papers
- BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation [21.052101309555464]
Multimodal Dialogue Response Generation (MDRG) is a recently proposed task where the model needs to generate responses in text, images, or a blend of both.
Previous work relies on the text modality as an intermediary step for both the image input and output of the model rather than adopting an end-to-end approach.
We propose BI-MDRG, which bridges the response generation path so that image history information is used to make text responses more relevant to the image content.
arXiv Detail & Related papers (2024-08-12T05:22:42Z)
- Teaching Text-to-Image Models to Communicate in Dialog [44.76942024105259]
In this paper, we focus on the innovative dialog-to-image generation task.
To tackle this problem, we design a tailored fine-tuning approach on top of state-of-the-art text-to-image generation models.
Our approach brings consistent and remarkable improvements across three state-of-the-art pre-trained text-to-image generation backbones.
arXiv Detail & Related papers (2023-09-27T09:33:16Z)
- Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z)
- IMAD: IMage-Augmented multi-modal Dialogue [0.043847653914745384]
This paper presents a novel perspective on multi-modal dialogue systems, which interprets the image in the context of the dialogue.
We propose a two-stage approach to automatically construct a multi-modal dialogue dataset.
In the first stage, we utilize text-to-image similarity and sentence similarity to identify which utterances could be replaced with an image.
In the second stage, we replace those utterances by selecting a subset of relevant images and filtering them with a visual question answering model; a hedged sketch of such a pipeline follows this entry.
arXiv Detail & Related papers (2023-05-17T18:38:10Z)
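The two-stage construction described in this entry can be illustrated with a short, generic Python sketch. This is not the paper's implementation: the functions text_image_similarity, sentence_similarity, and vqa_consistent are hypothetical placeholders standing in for the text-to-image similarity, sentence similarity, and visual question answering models the summary mentions, and the threshold values are arbitrary.

from dataclasses import dataclass

@dataclass
class Candidate:
    image_id: str
    caption: str

def text_image_similarity(utterance: str, image: Candidate) -> float:
    # Placeholder for a text-to-image retrieval score; here a crude token overlap.
    shared = set(utterance.lower().split()) & set(image.caption.lower().split())
    return len(shared) / max(len(utterance.split()), 1)

def sentence_similarity(a: str, b: str) -> float:
    # Placeholder for a sentence-similarity model; here a Jaccard overlap of tokens.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def vqa_consistent(utterance: str, image: Candidate) -> bool:
    # Placeholder for the VQA-based filter: keep an image only if answers about it
    # stay consistent with the utterance it would replace.
    return text_image_similarity(utterance, image) > 0.2

def augment_dialogue(dialogue, images, replace_threshold=0.5):
    # Stage 1: flag utterances that could be replaced with an image, using
    # text-to-image similarity plus sentence similarity against the caption.
    # Stage 2: pick the best-matching image and keep it only if the VQA filter agrees.
    augmented = []
    for utterance in dialogue:
        scored = [(text_image_similarity(utterance, img) +
                   sentence_similarity(utterance, img.caption), img) for img in images]
        best_score, best_image = max(scored, key=lambda pair: pair[0])
        if best_score >= replace_threshold and vqa_consistent(utterance, best_image):
            augmented.append(best_image)   # utterance replaced by an image
        else:
            augmented.append(utterance)    # utterance kept as text
    return augmented

images = [Candidate("img_001", "a black cat sleeping on a sofa"),
          Candidate("img_002", "a plate of pasta with tomato sauce")]
dialogue = ["what did you have for dinner",
            "a plate of pasta with tomato sauce",
            "sounds great"]
print(augment_dialogue(dialogue, images))

In a real pipeline the placeholder scores would come from pretrained retrieval, sentence-embedding, and VQA models, and the thresholds would be tuned on held-out annotated data.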
- A Benchmark for Understanding and Generating Dialogue between Characters in Stories [75.29466820496913]
We present the first study to explore whether machines can understand and generate dialogue in stories.
We propose two new tasks including Masked Dialogue Generation and Dialogue Speaker Recognition.
We show the difficulty of the proposed tasks by testing existing models with automatic and manual evaluation on DialStory.
arXiv Detail & Related papers (2022-09-18T10:19:04Z)
- Multimodal Dialogue Response Generation [27.611204319057393]
We present a multimodal dialogue generation model, which takes the dialogue history as input and then generates a textual sequence or an image as the response.
We consider multimodal dialogue generation under a natural assumption that only limited training examples are available.
In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire model.
arXiv Detail & Related papers (2021-10-16T08:52:26Z)
- Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues [73.04906599884868]
We propose a novel framework of Reasoning Paths in Dialogue Context (PDC).
The PDC model discovers information flows among dialogue turns through a semantic graph constructed from the lexical components in each question and answer.
Our model sequentially processes both visual and textual information through this reasoning path, and the propagated features are used to generate the answer.
arXiv Detail & Related papers (2021-03-01T07:39:26Z)
- OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts [35.57757367869986]
We release OpenViDial, a large-scale multi-module dialogue dataset.
OpenViDial contains a total of 1.1 million dialogue turns.
We propose a family of encoder-decoder models leveraging both textual and visual contexts.
arXiv Detail & Related papers (2020-12-30T03:02:50Z)
- Stylized Dialogue Response Generation Using Stylized Unpaired Texts [63.69880979112312]
This paper proposes a stylized dialogue generation method that can capture stylistic features embedded in unpaired texts.
Our method can produce dialogue responses that are both coherent to the given context and conform to the target style.
arXiv Detail & Related papers (2020-09-27T01:04:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.