AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection
- URL: http://arxiv.org/abs/2410.15591v1
- Date: Mon, 21 Oct 2024 02:19:24 GMT
- Title: AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection
- Authors: Xiaoman Xu, Xiangrun Li, Taihang Wang, Ye Jiang
- Abstract summary: This paper introduces the Emotion-Aware Multimodal Fusion Prompt Learning (AMPLE) framework for fake news detection.
This framework extracts emotional elements from texts by leveraging sentiment analysis tools.
It then employs Multi-Head Cross-Attention (MCA) mechanisms and similarity-aware fusion methods to integrate multimodal data.
- Score: 0.1499944454332829
- License:
- Abstract: Detecting fake news in large datasets is challenging due to its diversity and complexity, with traditional approaches often focusing on textual features while underutilizing semantic and emotional elements. Current methods also rely heavily on large annotated datasets, limiting their effectiveness in more nuanced analysis. To address these challenges, this paper introduces the Emotion-Aware Multimodal Fusion Prompt Learning (AMPLE) framework, which combines text sentiment analysis with multimodal data and hybrid prompt templates. This framework extracts emotional elements from texts by leveraging sentiment analysis tools. It then employs Multi-Head Cross-Attention (MCA) mechanisms and similarity-aware fusion methods to integrate multimodal data. The proposed AMPLE framework demonstrates strong performance on two public datasets in both few-shot and data-rich settings, with results indicating the potential of emotional aspects in fake news detection. Furthermore, the study explores the impact of integrating large language models with this method for text sentiment extraction, revealing substantial room for further improvement. The code can be found at: https://github.com/xxm1215/MMM2025_few-shot/
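The fusion step named in the abstract (cross-attention followed by similarity-aware fusion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the absence of learned projection weights, the mean pooling, and the cosine-similarity gate are all assumptions made for illustration, and the sentiment-extraction and prompt-learning parts are omitted.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values, num_heads):
    # Multi-head cross-attention WITHOUT learned projections (assumption):
    # each head attends over its own slice of the feature dimension.
    # queries: text token vectors; keys_values: image patch vectors.
    dim = len(queries[0])
    assert dim % num_heads == 0, "feature dim must be divisible by num_heads"
    head_dim = dim // num_heads
    out = []
    for q in queries:
        attended = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            # Scaled dot-product scores of this query against every key.
            scores = [
                sum(a * b for a, b in zip(q[lo:hi], kv[lo:hi])) / math.sqrt(head_dim)
                for kv in keys_values
            ]
            weights = softmax(scores)
            # Weighted sum of the value slices for this head.
            for d in range(lo, hi):
                attended.append(sum(w * kv[d] for w, kv in zip(weights, keys_values)))
        out.append(attended)
    return out

def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity_aware_fusion(text_tokens, image_patches, num_heads=2):
    # Hypothetical gate: the text-image cosine similarity, mapped to [0, 1],
    # scales how much of the cross-attended signal is added to the text vector.
    attended = cross_attention(text_tokens, image_patches, num_heads)
    text_vec = mean_pool(text_tokens)
    cross_vec = mean_pool(attended)
    sim = cosine(text_vec, mean_pool(image_patches))
    gate = (sim + 1.0) / 2.0
    fused = [t + gate * c for t, c in zip(text_vec, cross_vec)]
    return fused, sim

if __name__ == "__main__":
    text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    image = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    fused, sim = similarity_aware_fusion(text, image, num_heads=2)
    print(len(fused), round(sim, 3))
```

In this sketch, a low text-image similarity shrinks the contribution of the image-conditioned features, so mismatched modalities influence the fused vector less. A real system would use trained projection matrices and encoder outputs in place of the toy vectors.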
Related papers
- Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild [45.29814349246784]
Multimodal large language models (LLMs) rely on explicit non-verbal cues that may be translated from different non-textual modalities into text.
This paper compares the potential of text- and feature-based approaches for compound multimodal emotion recognition (ER) in videos.
arXiv Detail & Related papers (2024-07-17T18:01:25Z)
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performance across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z)
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection [69.29017069438228]
Large language models (LLMs) have demonstrated remarkable capability to generate fluent responses to a wide variety of user queries.
This has also raised concerns about the potential misuse of such texts in journalism, education, and academia.
In this study, we strive to create automated systems that can detect machine-generated texts and pinpoint potential misuse.
arXiv Detail & Related papers (2023-05-24T08:55:11Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z)
- Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution [30.438041837029875]
We propose a robust visual information extraction system (VIES) towards real-world scenarios.
VIES is a unified end-to-end trainable framework for simultaneous text detection, recognition and information extraction.
We construct a fully-annotated dataset called EPHOIE, which is the first Chinese benchmark for both text spotting and visual information extraction.
arXiv Detail & Related papers (2021-01-24T11:05:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.