Towards Expressive Communication with Internet Memes: A New Multimodal
Conversation Dataset and Benchmark
- URL: http://arxiv.org/abs/2109.01839v1
- Date: Sat, 4 Sep 2021 10:39:52 GMT
- Title: Towards Expressive Communication with Internet Memes: A New Multimodal
Conversation Dataset and Benchmark
- Authors: Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng, Jie Zhou
- Abstract summary: We propose a new task named Meme incorporated Open-domain Dialogue (MOD).
MOD is much more challenging since it requires the model to understand the multimodal elements as well as the emotions behind them.
We construct a large-scale open-domain multimodal dialogue dataset incorporating abundant Internet memes into utterances.
- Score: 28.255324166852535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a new kind of expressive element, Internet memes are popular and
extensively used in online chatting scenarios, since they make dialogues vivid,
moving, and interesting. However, most current dialogue research focuses on
text-only tasks. In this paper, we propose a new task named \textbf{M}eme
incorporated \textbf{O}pen-domain \textbf{D}ialogue (MOD). Compared to previous
dialogue tasks, MOD is much more challenging since
it requires the model to understand the multimodal elements as well as the
emotions behind them. To facilitate the MOD research, we construct a
large-scale open-domain multimodal dialogue dataset incorporating abundant
Internet memes into utterances. The dataset consists of $\sim$45K Chinese
conversations with $\sim$606K utterances. Each conversation contains about $13$
utterances and about $4$ Internet memes on average, and each utterance that
includes an Internet meme is annotated with the corresponding emotion. In addition,
we present a simple and effective method, which utilizes a unified generation
network to solve the MOD task. Experimental results demonstrate that our method
trained on the proposed corpus is able to achieve expressive communication
including texts and memes. The corpus and models are publicly available
at https://github.com/lizekang/DSTC10-MOD.
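The abstract's per-utterance annotation scheme and corpus statistics can be illustrated with a minimal sketch. The record schema below is hypothetical (the actual JSON layout in the DSTC10-MOD repository may differ); the arithmetic checks that the reported figures are mutually consistent:

```python
# Hypothetical schema for one MOD-style dialogue record: each turn carries
# text, an optional meme identifier, and an emotion label for meme turns.
# The actual format in the DSTC10-MOD repository may differ.
dialogue = {
    "dialogue_id": "example_001",
    "turns": [
        {"speaker": "A", "text": "Finals are over, I'm finally free!",
         "meme_id": None, "emotion": None},
        {"speaker": "B", "text": "Congratulations!",
         "meme_id": "meme_042", "emotion": "happy"},
    ],
}

# Sanity-check the reported corpus statistics: ~45K conversations and
# ~606K utterances should give roughly the stated ~13 utterances each.
conversations = 45_000
utterances = 606_000
avg_utterances = utterances / conversations
print(f"average utterances per conversation: {avg_utterances:.1f}")
# -> average utterances per conversation: 13.5
```

At ~4 memes per conversation, the corpus would contain on the order of 180K emotion-annotated meme utterances, which matches the scale the abstract implies.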
Related papers
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z) - PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using
Large Language Models [7.388466146105024]
We propose PromptMTopic, a novel multimodal prompt-based model to learn topics from both text and visual modalities.
Our model effectively extracts and clusters topics learned from memes, considering the semantic interaction between the text and visual modalities.
Our work contributes to the understanding of the topics and themes of memes, a crucial form of communication in today's society.
arXiv Detail & Related papers (2023-12-11T03:36:50Z) - Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z) - ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented
Instruction Tuning for Digital Human [76.62897301298699]
ChatPLUG is a Chinese open-domain dialogue system for digital human applications that instruction finetunes on a wide range of dialogue tasks in a unified internet-augmented format.
We show that ChatPLUG outperforms state-of-the-art Chinese dialogue systems on both automatic and human evaluation.
We deploy ChatPLUG to real-world applications such as Smart Speaker and Instant Message applications with fast inference.
arXiv Detail & Related papers (2023-04-16T18:16:35Z) - CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos [75.37313546008639]
We introduce CHAMPAGNE, a generative model of conversations that can account for visual contexts.
To train CHAMPAGNE, we collect and release a large-scale corpus of 18M video-based dialogues.
arXiv Detail & Related papers (2023-03-17T01:10:33Z) - TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real
World [97.58623810402563]
We introduce a new video-based multi-modal dialogue dataset, called TikTalk.
We collect 38K videos from a popular video-sharing platform, along with 367K conversations posted by users beneath them.
Users engage in spontaneous conversations based on their multi-modal experiences from watching videos, which helps recreate real-world chitchat context.
arXiv Detail & Related papers (2023-01-14T10:18:22Z) - Controllable Dialogue Simulation with In-Context Learning [39.04491297557292]
Dialogic is a dialogue simulation method based on in-context learning with large language models.
Our method can rapidly expand a small set of dialogue data with minimum or zero human involvement.
Our simulated dialogues have near-human fluency and annotation accuracy.
arXiv Detail & Related papers (2022-10-09T06:32:58Z) - HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on
Tabular and Textual Data [87.67278915655712]
We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables.
The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions.
arXiv Detail & Related papers (2022-04-28T00:52:16Z) - Towards Building an Open-Domain Dialogue System Incorporated with
Internet Memes [19.57042922215698]
This paper presents our solutions for the Meme incorporated Open-domain Dialogue (MOD) Challenge of DSTC10.
We leverage a large-scale pre-trained dialogue model for coherent and informative response generation.
Based on interaction-based text-matching, our approach can retrieve appropriate memes with good generalization ability.
arXiv Detail & Related papers (2022-03-08T03:54:02Z) - Fusing task-oriented and open-domain dialogues in conversational agents [12.338220374261343]
The two dialogue modes can potentially be intertwined seamlessly in the same conversation, as is easily done by a friendly human assistant.
Our paper addresses this problem of fusing task-oriented dialogues (TODs) and open-domain dialogues (ODDs) in multi-turn dialogues.
It features inter-mode contextual dependency, i.e., the dialogue turns from the two modes depend on each other.
arXiv Detail & Related papers (2021-09-09T09:48:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.