TriPINet: Tripartite Progressive Integration Network for Image
Manipulation Localization
- URL: http://arxiv.org/abs/2212.12841v1
- Date: Sun, 25 Dec 2022 02:27:58 GMT
- Title: TriPINet: Tripartite Progressive Integration Network for Image
Manipulation Localization
- Authors: Wei-Yun Liang, Jing Xu, and Xiao Jin
- Abstract summary: We propose a tripartite progressive integration network (TriPINet) for end-to-end image manipulation localization.
We develop a guided cross-modality dual-attention (gCMDA) module to fuse different types of forged clues.
Extensive experiments are conducted to compare our method with state-of-the-art image forensics approaches.
- Score: 3.7359400978194675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image manipulation localization aims at distinguishing forged regions from
the whole test image. Although many outstanding prior works have been proposed
for this task, two issues still need further study: 1)
how to fuse diverse types of features with forgery clues; 2) how to
progressively integrate multistage features for better localization
performance. In this paper, we propose a tripartite progressive integration
network (TriPINet) for end-to-end image manipulation localization. First, we
extract both visually perceptible information, e.g., RGB input images, and
visually imperceptible features, e.g., frequency and noise traces, for forensic feature
learning. Second, we develop a guided cross-modality dual-attention (gCMDA)
module to fuse different types of forged clues. Third, we design a set of
progressive integration squeeze-and-excitation (PI-SE) modules to improve
localization performance by appropriately incorporating multiscale features in
the decoder. Extensive experiments are conducted to compare our method with
state-of-the-art image forensics approaches. The proposed TriPINet obtains
competitive results on several benchmark datasets.
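The paper ships no code, but a minimal PyTorch sketch may help make the cross-modality dual-attention idea concrete: the noise/frequency stream reweights the RGB channels while the RGB stream highlights spatial locations in the noise stream, and the two guided features are fused. All module names, channel sizes, and the exact attention layout below are assumptions, not the authors' gCMDA implementation.

```python
# Hypothetical sketch of cross-modality dual-attention fusion,
# loosely following the gCMDA description in the abstract.
import torch
import torch.nn as nn

class DualAttentionFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Channel attention: squeeze spatial dims, produce per-channel weights.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: compress channels into a per-pixel weight map.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, noise_feat):
        # Each modality guides the other before fusion.
        rgb_guided = rgb_feat * self.channel_att(noise_feat)
        noise_guided = noise_feat * self.spatial_att(rgb_feat)
        return self.fuse(torch.cat([rgb_guided, noise_guided], dim=1))

if __name__ == "__main__":
    block = DualAttentionFusion(channels=64)
    rgb, noise = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
    print(block(rgb, noise).shape)  # torch.Size([1, 64, 32, 32])
```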
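Likewise, a decoder stage that progressively integrates multiscale features with squeeze-and-excitation (SE) recalibration, in the spirit of the PI-SE modules, might look as follows; the layer choices are assumptions, not the paper's design.

```python
# Hypothetical sketch of a progressive-integration decoder stage with
# squeeze-and-excitation recalibration (not the paper's PI-SE design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # excitation
        )

    def forward(self, x):
        return x * self.gate(x)

class ProgressiveIntegration(nn.Module):
    """Merge an upsampled deep feature with a same-scale skip feature."""
    def __init__(self, deep_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.reduce = nn.Conv2d(deep_ch + skip_ch, out_ch, 3, padding=1)
        self.se = SEBlock(out_ch)

    def forward(self, deep, skip):
        deep = F.interpolate(deep, size=skip.shape[-2:],
                             mode="bilinear", align_corners=False)
        fused = self.reduce(torch.cat([deep, skip], dim=1))
        return self.se(fused)  # channel-wise recalibration of fused features

if __name__ == "__main__":
    stage = ProgressiveIntegration(deep_ch=128, skip_ch=64, out_ch=64)
    out = stage(torch.randn(1, 128, 16, 16), torch.randn(1, 64, 32, 32))
    print(out.shape)  # torch.Size([1, 64, 32, 32])
```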
Related papers
- DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention [12.36906630199689]
We construct a DA-HFNet forged image dataset guided by text- or image-assisted GAN and diffusion models.
Our goal is to utilize a hierarchical progressive network to capture forged artifacts at different scales for detection and localization.
arXiv Detail & Related papers (2024-06-03T16:13:33Z)
- Multi-View Vertebra Localization and Identification from CT Images [57.56509107412658]
We propose a multi-view method for vertebra localization and identification from CT images.
We convert the 3D problem into a 2D localization and identification task on different views.
Our method learns multi-view global information naturally.
arXiv Detail & Related papers (2023-07-24T14:43:07Z)
- Collaborative Score Distillation for Consistent Visual Synthesis [70.29294250371312]
Collaborative Score Distillation (CSD) is based on Stein Variational Gradient Descent (SVGD).
We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes.
Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.
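For reference, CSD builds on the generic SVGD update (Liu & Wang, 2016), which moves a set of particles $\{x_i\}_{i=1}^n$ toward a target density $p$ using a kernel $k$; this is standard SVGD, not CSD's exact formulation:

```latex
x_i \leftarrow x_i + \epsilon \, \hat{\phi}^{*}(x_i), \qquad
\hat{\phi}^{*}(x) = \frac{1}{n} \sum_{j=1}^{n}
  \left[ k(x_j, x) \, \nabla_{x_j} \log p(x_j) + \nabla_{x_j} k(x_j, x) \right]
```

The first term transports particles toward high-density regions of $p$; the second acts as a repulsive force that keeps the particles diverse.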
arXiv Detail & Related papers (2023-07-04T17:31:50Z)
- Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval [55.21569389894215]
We propose a cross-attention framework for Vision Transformers (XModalViT) that fuses modality-specific information instead of discarding it.
Our framework first maps paired datapoints from the individual photo and sketch modalities to fused representations that unify information from both modalities.
We then decouple the input space of the aforementioned modality fusion network into independent encoders of the individual modalities via contrastive and relational cross-modal knowledge distillation.
arXiv Detail & Related papers (2022-10-19T11:50:14Z)
- Exploring the Interactive Guidance for Unified and Effective Image Matting [16.933897631478146]
We propose a Unified Interactive image Matting method, named UIM, which solves the limitations and achieves satisfying matting results.
Specifically, UIM leverages multiple types of user interaction to avoid the ambiguity of multiple matting targets.
We show that UIM achieves state-of-the-art performance on the Composition-1K test set and a synthetic unified dataset.
arXiv Detail & Related papers (2022-05-17T13:20:30Z)
- Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution [88.16655157395785]
We propose a separable attention network (comprising a priority attention and background separation attention) named SANet.
It can explore the foreground and background areas in the forward and reverse directions with the help of the auxiliary contrast.
It is the first model to explore a separable attention mechanism that uses the auxiliary contrast to predict the foreground and background regions.
arXiv Detail & Related papers (2021-09-03T05:53:07Z)
- Operation-wise Attention Network for Tampering Localization Fusion [15.633461635276337]
In this work, we present a deep learning-based approach for image tampering localization fusion.
This approach is designed to combine the outcomes of multiple image forensics algorithms and provides a fused tampering localization map.
Our fusion framework combines five individual splicing localization methods for JPEG images.
arXiv Detail & Related papers (2021-05-12T08:50:59Z)
- IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion-based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images.
We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations.
IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
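As a rough illustration of the model-inversion idea, here is a generic classifier-guided optimization loop; this is NOT IMAGINE's actual algorithm (which additionally uses image-guided constraints), and the target class id is arbitrary.

```python
# Generic classifier-guided image inversion: optimize pixels so a frozen
# pre-trained classifier assigns them to a target class, with a
# total-variation prior for smoothness. Illustrative only.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
opt = torch.optim.Adam([x], lr=0.05)
target = torch.tensor([207])  # arbitrary ImageNet class id

for step in range(200):
    opt.zero_grad()
    logits = model(x)
    # Total variation encourages piecewise-smooth, more realistic images.
    tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() \
       + (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    loss = F.cross_entropy(logits, target) + 1e-2 * tv
    loss.backward()
    opt.step()
```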
arXiv Detail & Related papers (2021-04-13T02:00:24Z)
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family; and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
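For context, a 2D sketch of what a central difference convolution computes (the cited work uses a 3D-CDC family; this 2D version and its theta value are illustrative only):

```python
# 2D Central Difference Convolution sketch: blends vanilla convolution with
# a central-difference term controlled by theta. Illustrative, not 3D-CDC.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=padding, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        # Subtracting theta * (sum of kernel weights) * center pixel is
        # equivalent to convolving over the differences x(p_n) - x(p_0).
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)  # 1x1
        center = F.conv2d(x, kernel_sum, bias=None, stride=1, padding=0)
        return out - self.theta * center

if __name__ == "__main__":
    layer = CDConv2d(3, 8)
    print(layer(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 8, 32, 32])
```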
arXiv Detail & Related papers (2020-08-21T10:45:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.