Related papers: Harmfully Manipulated Images Matter in Multimodal Misinformation Detection

URL: http://arxiv.org/abs/2407.19192v1
Date: Sat, 27 Jul 2024 07:16:07 GMT
Title: Harmfully Manipulated Images Matter in Multimodal Misinformation Detection
Authors: Bing Wang, Shengsheng Wang, Changchun Li, Renchu Guan, Ximing Li
Abstract summary: Multimodal Misinformation Detection (MMD) has attracted growing attention from the academic and industrial communities. We propose a novel MMD method, namely Harmfully Manipulated Images Matter in MMD (HAMI-M3D). Extensive experiments across three benchmark datasets demonstrate that HAMI-M3D consistently improves the performance of any MMD baseline.
Score: 22.236455110413264
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Nowadays, misinformation spreads widely over various social media platforms and causes extremely negative impacts on society. To combat this issue, automatically identifying misinformation, especially misinformation containing multimodal content, has attracted growing attention from the academic and industrial communities, and has given rise to an active research topic named Multimodal Misinformation Detection (MMD). Typically, existing MMD methods capture the semantic correlation and inconsistency between multiple modalities, but neglect some potential clues in multimodal content. Recent studies suggest that manipulation traces in the images of articles are non-trivial clues for detecting misinformation. Meanwhile, we find that the underlying intentions behind the manipulation, e.g., harmful or harmless, also matter in MMD. Accordingly, in this work, we propose to detect misinformation by learning manipulation features that indicate whether the image has been manipulated, as well as intention features regarding the harmful and harmless intentions of the manipulation. Unfortunately, the manipulation and intention labels that make these features discriminative are unknown. To overcome the problem, we propose two weakly supervised signals as alternatives by introducing additional datasets on image manipulation detection and formulating two classification tasks as positive and unlabeled learning problems. Based on these ideas, we propose a novel MMD method, namely Harmfully Manipulated Images Matter in MMD (HAMI-M3D). Extensive experiments across three benchmark datasets demonstrate that HAMI-M3D consistently improves the performance of any MMD baseline.

Related papers

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection [107.86009509291581]
We propose ForgerySleuth to perform comprehensive clue fusion and generate segmentation outputs indicating regions that are tampered with. Our experiments demonstrate the effectiveness of ForgeryAnalysis and show that ForgerySleuth significantly outperforms existing methods in robustness, generalization, and explainability.
arXiv Detail & Related papers (2024-11-29T04:35:18Z)
Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing [2.0528748158119434]
Multimodal learning can be used to integrate features from different data modalities, thereby improving detection accuracy. In this paper, we propose to use Masked Image Modeling (MIM) as a pre-training technique, leveraging self-supervised learning on unlabeled data. To address this, we propose a new interactive MIM method that can establish interactions between different tokens, which is particularly beneficial for object detection in remote sensing.
arXiv Detail & Related papers (2024-09-13T14:50:50Z)
Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose a Multimedia Misinformation Detection (MultiMD) framework for detecting misinformation from video content by leveraging cross-modal entity consistency. Our results demonstrate that MultiMD outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2024-08-16T16:14:36Z)
Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view. We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue. To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4). DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content. We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z)
M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations [10.20962191915879]
M3Dsynth is a large dataset of manipulated Computed Tomography (CT) lung images. We create manipulated images by injecting or removing lung cancer nodules in real CT scans. Experiments show that these images easily fool automated diagnostic tools.
arXiv Detail & Related papers (2023-09-14T18:16:58Z)
Detecting and Grounding Multi-Modal Media Manipulation [32.34908534582532]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4). DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content. We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-04-05T16:20:40Z)
Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with a hybrid fusion scheme. Our model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on the MVTec-3D AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
Absolute Wrong Makes Better: Boosting Weakly Supervised Object Detection via Negative Deterministic Information [54.35679298764169]
Weakly supervised object detection (WSOD) is a challenging task, in which image-level labels are used to train an object detector. This paper focuses on identifying and fully exploiting the deterministic information in WSOD. We propose a negative deterministic information (NDI) based method for improving WSOD, namely NDI-WSOD.
arXiv Detail & Related papers (2022-04-21T12:55:27Z)
ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations. We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings. We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand for specific action understanding in real-world applications. We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only a few samples given for each class. Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues when handling fine-grained actions.
arXiv Detail & Related papers (2021-08-15T02:21:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.