MemeCMD: An Automatically Generated Chinese Multi-turn Dialogue Dataset with Contextually Retrieved Memes
- URL: http://arxiv.org/abs/2507.00891v1
- Date: Tue, 01 Jul 2025 15:57:14 GMT
- Title: MemeCMD: An Automatically Generated Chinese Multi-turn Dialogue Dataset with Contextually Retrieved Memes
- Authors: Yuheng Wang, Xianhe Tang, Pufeng Huang
- Abstract summary: We introduce MemeCMD, an automatically generated Chinese Multi-turn Dialogue dataset with contextually retrieved memes. Our dataset combines a large-scale, MLLM-annotated meme library with dialogues auto-generated by dual agents across diverse scenarios.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Memes are widely used in online social interactions, providing vivid, intuitive, and often humorous means to express intentions and emotions. Existing dialogue datasets are predominantly limited to either manually annotated or pure-text conversations, lacking the expressiveness and contextual nuance that multimodal interactions provide. To address these challenges, we introduce MemeCMD, an automatically generated Chinese Multi-turn Dialogue dataset with contextually retrieved memes. Our dataset combines a large-scale, MLLM-annotated meme library with dialogues auto-generated by dual agents across diverse scenarios. We introduce a retrieval framework with an adaptive threshold to ensure contextually relevant, naturally spaced meme usage. Experiments demonstrate the effectiveness of our approach in generating contextually appropriate and diverse meme-incorporated dialogues, offering a scalable and privacy-preserving resource for advancing multimodal conversational AI.
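The abstract's core mechanism, retrieving a meme only when it fits the dialogue context and is naturally spaced from the previous one, can be pictured as a small gate over embedding similarity. The sketch below is an illustration only: the function name, the embedding inputs, and the hyperparameters (base_threshold, decay, min_gap) are assumptions, not the paper's implementation.

```python
import numpy as np

def maybe_retrieve_meme(
    context_emb: np.ndarray,       # embedding of the current dialogue turn
    meme_embs: np.ndarray,         # (num_memes, dim) embeddings of the annotated meme library
    turns_since_last_meme: int,    # spacing signal: turns elapsed since a meme was last used
    base_threshold: float = 0.75,  # hypothetical similarity floor
    decay: float = 0.02,           # how fast the bar relaxes as the gap grows
    min_gap: int = 2,              # never fire on back-to-back turns
):
    """Return the index of the best-matching meme, or None to stay text-only."""
    if turns_since_last_meme < min_gap:
        return None
    # Adaptive threshold: the longer since the last meme, the lower the bar,
    # which spreads meme usage out instead of clustering it.
    threshold = base_threshold - decay * (turns_since_last_meme - min_gap)
    # Cosine similarity between the current turn and every meme in the library.
    sims = meme_embs @ context_emb / (
        np.linalg.norm(meme_embs, axis=1) * np.linalg.norm(context_emb) + 1e-8
    )
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None

# Toy usage with random embeddings standing in for a real encoder.
rng = np.random.default_rng(0)
library = rng.normal(size=(1000, 256))
turn = rng.normal(size=256)
print(maybe_retrieve_meme(turn, library, turns_since_last_meme=3))
```

The one-sided gate (a floor that relaxes with elapsed turns) is just one plausible reading of "adaptive threshold"; the paper may instead calibrate the threshold from the similarity distribution itself.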
Related papers
- DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue [17.397151329196955]
We propose DialogueAgents, a novel hybrid agent-based speech synthesis framework. We contribute MultiTalk, a bilingual, multi-party, multi-turn speech dialogue dataset.
arXiv Detail & Related papers (2025-04-20T04:14:30Z) - Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z) - MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets [29.737965533532577]
Multimodal Augmented Generative Images Dialogues (MAGID) is a framework to augment text-only dialogues with diverse and high-quality images.
Our results show that MAGID is comparable to or better than baselines, with significant improvements in human evaluation.
arXiv Detail & Related papers (2024-03-05T18:31:28Z) - DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
arXiv Detail & Related papers (2024-01-02T07:40:12Z) - MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched Contextualization [31.209594252045566]
We propose a novel task, MEMEX: given a meme and a related document, the aim is to mine the context that succinctly explains the background of the meme.
To benchmark MCC, the accompanying Meme Context Corpus, we propose MIME, a multimodal neural framework that uses commonsense-enriched meme representations and a layered approach to capture the cross-modal semantic dependencies between the meme and the context.
arXiv Detail & Related papers (2023-05-25T10:19:35Z) - A Mixture-of-Expert Approach to RL-based Dialogue Management [56.08449336469477]
We use reinforcement learning to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction.
Most existing RL approaches to DM train the agent at the word level and thus have to deal with a combinatorially complex action space, even for a medium-size vocabulary.
We develop an RL-based DM using a novel mixture-of-expert language model (MoE-LM) that consists of (i) a LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a …
arXiv Detail & Related papers (2022-05-31T19:00:41Z) - M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database [139.08528216461502]
We propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED.
M3ED contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances.
To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese.
arXiv Detail & Related papers (2022-05-09T06:52:51Z) - HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data [87.67278915655712]
We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables.
The conversations are created through the decomposition of complex multi-hop questions into simple, realistic multi-turn dialogue interactions.
arXiv Detail & Related papers (2022-04-28T00:52:16Z) - Towards Building an Open-Domain Dialogue System Incorporated with Internet Memes [19.57042922215698]
This paper presents our solutions for the Meme incorporated Open-domain Dialogue (MOD) Challenge of DSTC10.
We leverage a large-scale pre-trained dialogue model for coherent and informative response generation.
Based on interaction-based text-matching, our approach can retrieve appropriate memes with good generalization ability.
arXiv Detail & Related papers (2022-03-08T03:54:02Z) - MSCTD: A Multimodal Sentiment Chat Translation Dataset [66.81525961469494]
We introduce a new task named Multimodal Chat Translation (MCT).
MCT aims to generate more accurate translations with the help of the associated dialogue history and visual context.
Our work can facilitate research on both multimodal chat translation and multimodal dialogue sentiment analysis.
arXiv Detail & Related papers (2022-02-28T09:40:46Z) - Towards Expressive Communication with Internet Memes: A New Multimodal Conversation Dataset and Benchmark [28.255324166852535]
We propose a new task named Meme incorporated Open-domain Dialogue (MOD).
MOD is much more challenging since it requires the model to understand the multimodal elements as well as the emotions behind them.
We construct a large-scale open-domain multimodal dialogue dataset incorporating abundant Internet memes into utterances.
arXiv Detail & Related papers (2021-09-04T10:39:52Z)