Towards Expressive Communication with Internet Memes: A New Multimodal
Conversation Dataset and Benchmark
- URL: http://arxiv.org/abs/2109.01839v1
- Date: Sat, 4 Sep 2021 10:39:52 GMT
- Title: Towards Expressive Communication with Internet Memes: A New Multimodal
Conversation Dataset and Benchmark
- Authors: Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng, Jie Zhou
- Abstract summary: We propose a new task named Meme incorporated Open-domain Dialogue (MOD).
MOD is much more challenging since it requires the model to understand the multimodal elements as well as the emotions behind them.
We construct a large-scale open-domain multimodal dialogue dataset incorporating abundant Internet memes into utterances.
- Score: 28.255324166852535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a new kind of expressive element, Internet memes are popular and
extensively used in online chatting scenarios, since they make dialogues vivid,
moving, and interesting. However, most current dialogue research focuses on
text-only tasks. In this paper, we propose a new task named \textbf{M}eme
incorporated \textbf{O}pen-domain \textbf{D}ialogue (MOD). Compared to previous
dialogue tasks, MOD is much more challenging since
it requires the model to understand the multimodal elements as well as the
emotions behind them. To facilitate the MOD research, we construct a
large-scale open-domain multimodal dialogue dataset incorporating abundant
Internet memes into utterances. The dataset consists of $\sim$45K Chinese
conversations with $\sim$606K utterances. Each conversation contains about $13$
utterances and about $4$ Internet memes on average, and each utterance that
includes an Internet meme is annotated with the corresponding emotion. In addition,
we present a simple and effective method, which utilizes a unified generation
network to solve the MOD task. Experimental results demonstrate that our method
trained on the proposed corpus is able to achieve expressive communication
including texts and memes. The corpus and models are publicly available
at https://github.com/lizekang/DSTC10-MOD.
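The abstract's per-utterance annotation scheme and corpus statistics can be illustrated with a minimal sketch. The record schema below is hypothetical (the actual JSON layout in the DSTC10-MOD repository may differ); the arithmetic checks that the reported figures are mutually consistent:

```python
# Hypothetical schema for one MOD-style dialogue record: each turn carries
# text, an optional meme identifier, and an emotion label for meme turns.
# The actual format in the DSTC10-MOD repository may differ.
dialogue = {
    "dialogue_id": "example_001",
    "turns": [
        {"speaker": "A", "text": "Finals are over, I'm finally free!",
         "meme_id": None, "emotion": None},
        {"speaker": "B", "text": "Congratulations!",
         "meme_id": "meme_042", "emotion": "happy"},
    ],
}

# Sanity-check the reported corpus statistics: ~45K conversations and
# ~606K utterances should give roughly the stated ~13 utterances each.
conversations = 45_000
utterances = 606_000
avg_utterances = utterances / conversations
print(f"average utterances per conversation: {avg_utterances:.1f}")
# -> average utterances per conversation: 13.5
```

At ~4 memes per conversation, the corpus would contain on the order of 180K emotion-annotated meme utterances, which matches the scale the abstract implies.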
Related papers
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z) - PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using
Large Language Models [7.388466146105024]
We propose PromptMTopic, a novel multimodal prompt-based model to learn topics from both text and visual modalities.
Our model effectively extracts and clusters topics learned from memes, considering the semantic interaction between the text and visual modalities.
Our work contributes to the understanding of the topics and themes of memes, a crucial form of communication in today's society.
arXiv Detail & Related papers (2023-12-11T03:36:50Z) - Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z) - ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented
Instruction Tuning for Digital Human [76.62897301298699]
ChatPLUG is a Chinese open-domain dialogue system for digital human applications that instruction finetunes on a wide range of dialogue tasks in a unified internet-augmented format.
We show that ChatPLUG outperforms state-of-the-art Chinese dialogue systems on both automatic and human evaluation.
We deploy ChatPLUG to real-world applications such as Smart Speaker and Instant Message applications with fast inference.
arXiv Detail & Related papers (2023-04-16T18:16:35Z) - CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos [75.37313546008639]
We introduce CHAMPAGNE, a generative model of conversations that can account for visual contexts.
To train CHAMPAGNE, we collect and release a large-scale corpus of 18M video-based dialogues.
arXiv Detail & Related papers (2023-03-17T01:10:33Z) - TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real
World [97.58623810402563]
We introduce a new video-based multi-modal dialogue dataset, called TikTalk.
We collect 38K videos from a popular video-sharing platform, along with 367K conversations posted by users beneath them.
Users engage in spontaneous conversations based on their multi-modal experiences from watching videos, which helps recreate real-world chitchat context.
arXiv Detail & Related papers (2023-01-14T10:18:22Z) - Controllable Dialogue Simulation with In-Context Learning [39.04491297557292]
Dialogic is a dialogue simulation method based on in-context learning with large language models.
Our method can rapidly expand a small set of dialogue data with minimum or zero human involvement.
Our simulated dialogues have near-human fluency and annotation accuracy.
arXiv Detail & Related papers (2022-10-09T06:32:58Z) - HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on
Tabular and Textual Data [87.67278915655712]
We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables.
The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions.
arXiv Detail & Related papers (2022-04-28T00:52:16Z) - Towards Building an Open-Domain Dialogue System Incorporated with
Internet Memes [19.57042922215698]
This paper presents our solutions for the Meme incorporated Open-domain Dialogue (MOD) Challenge of DSTC10.
We leverage a large-scale pre-trained dialogue model for coherent and informative response generation.
Based on interaction-based text-matching, our approach can retrieve appropriate memes with good generalization ability.
arXiv Detail & Related papers (2022-03-08T03:54:02Z) - Fusing task-oriented and open-domain dialogues in conversational agents [12.338220374261343]
The two dialogue modes can potentially be intertwined seamlessly in the same conversation, as is easily done by a friendly human assistant.
Our paper addresses this problem of fusing task-oriented dialogues (TODs) and open-domain dialogues (ODDs) in multi-turn dialogues.
It features inter-mode contextual dependency, i.e., the dialogue turns from the two modes depend on each other.
arXiv Detail & Related papers (2021-09-09T09:48:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.