MSMG-Net: Multi-scale Multi-grained Supervised Networks for Multi-task
Image Manipulation Detection and Localization
- URL: http://arxiv.org/abs/2211.03140v1
- Date: Sun, 6 Nov 2022 14:58:21 GMT
- Title: MSMG-Net: Multi-scale Multi-grained Supervised Networks for Multi-task
Image Manipulation Detection and Localization
- Authors: Fengsheng Wang, Leyi Wei
- Abstract summary: A novel multi-scale multi-grained deep network (MSMG-Net) is proposed to automatically identify manipulated regions.
In our MSMG-Net, a parallel multi-scale feature extraction structure is used to extract multi-scale features.
MSMG-Net can effectively perceive object-level semantics and encode edge artifacts.
- Score: 1.14219428942199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid advances of image editing techniques in recent years, image
manipulation detection has attracted considerable attention owing to the increasing
security risks posed by tampered images. To address these challenges, a novel
multi-scale multi-grained deep network (MSMG-Net) is proposed to automatically
identify manipulated regions. In our MSMG-Net, a parallel multi-scale feature
extraction structure is used to extract multi-scale features. Multi-grained feature
learning is then applied to capture object-level semantic relations among the
multi-scale features by introducing shunted self-attention. To fuse the multi-scale,
multi-grained features, a global and local feature fusion block is designed for
manipulated-region segmentation in a bottom-up manner, and a multi-level feature
aggregation block is designed for edge-artifact detection in a top-down manner.
Thus, MSMG-Net can effectively perceive object-level semantics and encode edge
artifacts. Experimental results on five benchmark datasets demonstrate the superior
performance of the proposed method, which outperforms state-of-the-art manipulation
detection and localization methods. Extensive ablation experiments and feature
visualizations show that multi-scale, multi-grained learning yields effective visual
representations of manipulated regions. In addition, MSMG-Net exhibits better
robustness when tampered images are further degraded by various post-processing
operations.
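Below is a minimal PyTorch sketch of the data flow described in the abstract: parallel multi-scale branches, an attention stage over the coarsest features, one fused head for the manipulated-region mask, and a second head for edge artifacts. The module names, channel sizes, and the use of plain multi-head attention in place of shunted self-attention are assumptions made for illustration, not the authors' implementation.

```python
# Toy sketch of the MSMG-Net pipeline; sizes and modules are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleBranch(nn.Module):
    """One branch of the parallel multi-scale extractor (assumed conv stack)."""
    def __init__(self, in_ch, out_ch, stride):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)


class MSMGNetSketch(nn.Module):
    """Stand-in for MSMG-Net: multi-scale branches, a self-attention stage over
    the coarsest features (placeholder for shunted self-attention), a bottom-up
    fused head for the region mask, and a head for edge artifacts."""
    def __init__(self, ch=32):
        super().__init__()
        # Parallel multi-scale feature extraction (strides 1, 2, 4 assumed).
        self.branch1 = ScaleBranch(3, ch, stride=1)
        self.branch2 = ScaleBranch(3, ch, stride=2)
        self.branch3 = ScaleBranch(3, ch, stride=4)
        # Multi-grained relation modelling; plain multi-head attention is used
        # here purely as a placeholder for the paper's shunted self-attention.
        self.attn = nn.MultiheadAttention(embed_dim=ch, num_heads=4, batch_first=True)
        # Prediction heads: manipulated-region mask and edge-artifact map.
        self.region_head = nn.Conv2d(ch * 3, 1, 1)
        self.edge_head = nn.Conv2d(ch * 3, 1, 1)

    def forward(self, x):
        f1, f2, f3 = self.branch1(x), self.branch2(x), self.branch3(x)
        # Object-level relations on the coarsest (most global) features.
        b, c, h, w = f3.shape
        tokens = f3.flatten(2).transpose(1, 2)          # (B, H*W, C)
        tokens, _ = self.attn(tokens, tokens, tokens)
        f3 = tokens.transpose(1, 2).reshape(b, c, h, w)
        # Upsample all scales to full resolution before fusing.
        size = f1.shape[-2:]
        f2u = F.interpolate(f2, size=size, mode="bilinear", align_corners=False)
        f3u = F.interpolate(f3, size=size, mode="bilinear", align_corners=False)
        fused = torch.cat([f1, f2u, f3u], dim=1)
        region = torch.sigmoid(self.region_head(fused))  # manipulated-region mask
        edge = torch.sigmoid(self.edge_head(fused))      # edge-artifact map
        return region, edge


if __name__ == "__main__":
    net = MSMGNetSketch()
    mask, edge = net(torch.randn(1, 3, 256, 256))
    print(mask.shape, edge.shape)  # both torch.Size([1, 1, 256, 256])
```

In the paper, the region and edge branches are supervised jointly (multi-task), which this sketch only hints at by returning both outputs; any losses and the actual fusion/aggregation block designs would follow the authors' description.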
Related papers
- Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM).
arXiv Detail & Related papers (2024-08-05T08:35:59Z) - DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention [12.36906630199689]
We construct the DA-HFNet forged image dataset, generated by text- or image-guided GAN and diffusion models.
Our goal is to utilize a hierarchical progressive network to capture forged artifacts at different scales for detection and localization.
arXiv Detail & Related papers (2024-06-03T16:13:33Z) - Generalizable Entity Grounding via Assistance of Large Language Model [77.07759442298666]
We propose a novel approach to densely ground visual entities from a long caption.
We leverage a large multimodal model to extract semantic nouns, a class-agnostic segmentation model to generate entity-level segmentation, and a multi-modal feature fusion module to associate each semantic noun with its corresponding segmentation mask.
arXiv Detail & Related papers (2024-02-04T16:06:05Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Multi-spectral Class Center Network for Face Manipulation Detection and Localization [52.569170436393165]
We propose a novel Multi-Spectral Class Center Network (MSCCNet) for face manipulation detection and localization.
Based on the features of different frequency bands, the MSCC module collects multi-spectral class centers and computes pixel-to-class relations.
Applying multi-spectral class-level representations suppresses semantic information about visual concepts that is insensitive to the manipulated regions of forged images.
arXiv Detail & Related papers (2023-05-18T08:09:20Z) - ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z) - M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z) - Image Manipulation Detection by Multi-View Multi-Scale Supervision [11.319080833880307]
A key challenge of image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data.
In this paper we address both aspects by multi-view feature learning and multi-scale supervision.
Our thoughts are realized by a new network which we term MVSS-Net.
arXiv Detail & Related papers (2021-04-14T13:05:58Z) - MGML: Multi-Granularity Multi-Level Feature Ensemble Network for Remote
Sensing Scene Classification [15.856162817494726]
We propose a Multi-Granularity Multi-Level Feature Ensemble Network (MGML-FENet) to efficiently tackle the RS scene classification task.
We show that our proposed networks achieve better performance than previous state-of-the-art (SOTA) networks.
arXiv Detail & Related papers (2020-12-29T02:18:11Z)