Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model
- URL: http://arxiv.org/abs/2409.00597v1
- Date: Sun, 1 Sep 2024 03:16:30 GMT
- Title: Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model
- Authors: Fuqiang Niu, Zebang Cheng, Xianghua Fu, Xiaojiang Peng, Genan Dai, Yin Chen, Hu Huang, Bowen Zhang
- Abstract summary: We introduce a new multimodal multi-turn conversational stance detection dataset (called MmMtCSD).
We propose a novel multimodal large language model stance detection framework (MLLM-SD) that learns joint stance representations from textual and visual modalities.
Experiments on MmMtCSD show state-of-the-art performance of our proposed MLLM-SD approach for multimodal stance detection.
- Score: 9.413870182630362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stance detection, which aims to identify public opinion towards specific targets using social media data, is an important yet challenging task. With the proliferation of diverse multimodal social media content including text and images, multimodal stance detection (MSD) has become a crucial research area. However, existing MSD studies have focused on modeling stance within individual text-image pairs, overlooking the multi-party conversational contexts that naturally occur on social media. This limitation stems from a lack of datasets that authentically capture such conversational scenarios, hindering progress in conversational MSD. To address this, we introduce a new multimodal multi-turn conversational stance detection dataset (called MmMtCSD). To derive stances from this challenging dataset, we propose a novel multimodal large language model stance detection framework (MLLM-SD) that learns joint stance representations from textual and visual modalities. Experiments on MmMtCSD show state-of-the-art performance of our proposed MLLM-SD approach for multimodal stance detection. We believe that MmMtCSD will contribute to advancing real-world applications of stance detection research.
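The abstract gives no implementation details, but the basic interface of conversational multimodal stance detection is easy to sketch: linearize the multi-turn conversation (with image placeholders) into a prompt and ask a multimodal LLM for a stance label. The sketch below is a hypothetical illustration, not the authors' MLLM-SD implementation; the `Turn` structure, the prompt wording, and the `mllm.generate` call are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One turn of a multi-party conversation; image_path is optional."""
    speaker: str
    text: str
    image_path: str | None = None

def build_stance_prompt(turns: list[Turn], target: str) -> str:
    """Linearize a multi-turn conversation into a stance-detection prompt.

    Images are referenced with placeholder tokens so a multimodal LLM can
    attend to them alongside the text (a common MLLM convention; the exact
    token depends on the model used).
    """
    lines = []
    for turn in turns:
        image_token = " <image>" if turn.image_path else ""
        lines.append(f"{turn.speaker}: {turn.text}{image_token}")
    conversation = "\n".join(lines)
    return (
        "Given the conversation below, classify the stance of the last "
        f"speaker toward the target '{target}'.\n"
        "Answer with one of: favor, against, none.\n\n"
        f"{conversation}\n\nStance:"
    )

# Usage with a hypothetical vision-language model wrapper (not a real API):
# images = [t.image_path for t in turns if t.image_path]
# stance = mllm.generate(build_stance_prompt(turns, "Tesla"), images=images)
```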
Related papers
- Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing [2.0528748158119434]
Multimodal learning can be used to integrate features from different data modalities, thereby improving detection accuracy.
In this paper, we propose to use Masked Image Modeling (MIM) as a pre-training technique, leveraging self-supervised learning on unlabeled data.
We further propose a new interactive MIM method that can establish interactions between different tokens, which is particularly beneficial for object detection in remote sensing.
arXiv Detail & Related papers (2024-09-13T14:50:50Z)
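For context on the pre-training idea this entry relies on: masked image modeling hides random patches of an image and trains the model to reconstruct them from the visible ones. A minimal sketch of the masking step follows, assuming a 16-pixel patch grid and a 75% mask ratio (common defaults, not this paper's settings); it does not include the interactive token mechanism the paper proposes.

```python
import torch

def mask_patches(images: torch.Tensor, patch: int = 16, mask_ratio: float = 0.75):
    """Zero out a random subset of non-overlapping patches (MIM-style).

    images: (B, C, H, W) with H and W divisible by `patch`.
    Returns the masked images and the boolean patch mask (B, H//patch, W//patch).
    """
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    mask = torch.rand(b, gh, gw) < mask_ratio          # True = masked patch
    pixel_mask = mask.repeat_interleave(patch, 1).repeat_interleave(patch, 2)
    masked = images * (~pixel_mask).unsqueeze(1)       # broadcast over channels
    return masked, mask

# A reconstruction objective would then regress the original pixels
# (or visual tokens) of the masked patches from the visible context.
```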
- Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose a Multimedia Misinformation Detection framework (MultiMD) for detecting misinformation from video content by leveraging cross-modal entity consistency.
Our results demonstrate that MultiMD outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2024-08-16T16:14:36Z)
- AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
We propose a novel framework called AIMDiT to solve the problem of multimodal fusion of deep features.
Experiments conducted using our AIMDiT framework on the public benchmark dataset MELD reveal 2.34% and 2.87% improvements in terms of the Acc-7 and w-F1 metrics.
arXiv Detail & Related papers (2024-04-12T11:31:18Z)
- A Challenge Dataset and Effective Models for Conversational Stance Detection [26.208989232347058]
We introduce a new multi-turn conversation stance detection dataset (called MT-CSD).
We propose a global-local attention network (GLAN) to address both long- and short-range dependencies inherent in conversational data.
Our dataset serves as a valuable resource to catalyze advancements in cross-domain stance detection.
arXiv Detail & Related papers (2024-03-17T08:51:01Z)
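The GLAN summary above names global and local attention but gives no architectural detail. The following is a speculative sketch of one way to combine attention within the current turn (short-range) with attention over the full conversation (long-range); the dimensions and the concatenation-based fusion are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    """Toy global-local attention: fuse turn-level and conversation-level views.

    Speculative reconstruction for illustration only; hidden size, head count,
    and fusion by concatenation are assumptions.
    """
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, turn: torch.Tensor, conversation: torch.Tensor) -> torch.Tensor:
        # turn: (B, T_t, D) tokens of the current turn
        # conversation: (B, T_c, D) tokens of the full conversation history
        local, _ = self.local_attn(turn, turn, turn)                     # short-range
        global_, _ = self.global_attn(turn, conversation, conversation)  # long-range
        return self.fuse(torch.cat([local, global_], dim=-1))
```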
- Multi-modal Stance Detection: New Datasets and Model [56.97470987479277]
We study multi-modal stance detection for tweets consisting of texts and images.
We propose a simple yet effective Targeted Multi-modal Prompt Tuning framework (TMPT).
TMPT achieves state-of-the-art performance in multi-modal stance detection.
arXiv Detail & Related papers (2024-02-22T05:24:19Z)
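Prompt tuning, the mechanism behind TMPT, freezes a pre-trained encoder and learns only a small set of prompt embeddings prepended to the input. A generic sketch follows; keying the prompt table by stance target is an assumption suggested by the word "targeted" in the framework's name, not a confirmed detail of TMPT.

```python
import torch
import torch.nn as nn

class TargetedPromptTuning(nn.Module):
    """Generic targeted prompt tuning: learnable prompts per stance target.

    Illustrative sketch; `encoder` is any frozen transformer mapping
    (B, L, D) input embeddings to (B, L, D) contextualized outputs.
    """
    def __init__(self, encoder: nn.Module, num_targets: int,
                 prompt_len: int = 8, dim: int = 768, num_classes: int = 3):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # only prompts and head are trained
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.randn(num_targets, prompt_len, dim) * 0.02)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, token_embeds: torch.Tensor, target_id: torch.Tensor):
        # token_embeds: (B, L, D); target_id: (B,) index of the stance target
        prompt = self.prompts[target_id]                 # (B, prompt_len, D)
        hidden = self.encoder(torch.cat([prompt, token_embeds], dim=1))
        return self.head(hidden[:, 0])                   # classify from first token
```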
- Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into recent strides in hate speech (HS) moderation.
We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs).
We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z)
- Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection [72.36017150922504]
We propose a multi-modal contextual knowledge distillation framework, MMC-Det, to transfer the learned contextual knowledge from a teacher fusion transformer to a student detector.
Diverse multi-modal masked language modeling is realized by an object divergence constraint upon traditional multi-modal masked language modeling (MLM).
arXiv Detail & Related papers (2023-08-30T08:33:13Z)
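The MMC-Det entry above describes distilling contextual knowledge from a teacher fusion transformer into a student detector. A generic feature-matching formulation of such distillation is sketched below; the L2 objective and the projection layer are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    """Generic feature distillation: project student features into the
    teacher's space and match them with an L2 loss (illustrative only)."""
    def __init__(self, student_dim: int = 256, teacher_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feats: torch.Tensor,
                teacher_feats: torch.Tensor) -> torch.Tensor:
        # student_feats: (N, student_dim); teacher_feats: (N, teacher_dim)
        # The teacher is frozen, so its features are detached from the graph.
        return F.mse_loss(self.proj(student_feats), teacher_feats.detach())
```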
- Contextual Object Detection with Multimodal Large Language Models [66.15566719178327]
We introduce a novel research problem of contextual object detection.
Three representative scenarios are investigated, including the language cloze test, visual captioning, and question answering.
We present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts.
arXiv Detail & Related papers (2023-05-29T17:50:33Z)
- Support-set based Multi-modal Representation Enhancement for Video Captioning [121.70886789958799]
We propose a Support-set based Multi-modal Representation Enhancement (SMRE) model to mine rich information in a semantic subspace shared between samples.
Specifically, we propose a Support-set Construction (SC) module to construct a support-set to learn underlying connections between samples and obtain semantic-related visual elements.
During this process, we design a Semantic Space Transformation (SST) module to constrain relative distance and administrate multi-modal interactions in a self-supervised way.
arXiv Detail & Related papers (2022-05-19T03:40:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.