TBNet: Two-Stream Boundary-aware Network for Generic Image Manipulation Localization
- URL: http://arxiv.org/abs/2108.04508v1
- Date: Tue, 10 Aug 2021 08:22:05 GMT
- Title: TBNet: Two-Stream Boundary-aware Network for Generic Image Manipulation Localization
- Authors: Zan Gao, Chao Sun, Zhiyong Cheng, Weili Guan, Anan Liu, Meng Wang
- Abstract summary: We propose a novel end-to-end two-stream boundary-aware network (abbreviated as TBNet) for generic image manipulation localization.
The proposed TBNet can significantly outperform state-of-the-art generic image manipulation localization methods in terms of both MCC and F1.
- Score: 49.521622399483846
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Finding tampered regions in images is a hot research topic in machine
learning and computer vision. Although many image manipulation localization
algorithms have been proposed, most of them focus only on RGB images in
different color spaces, and the frequency information that contains potential
tampering clues is often ignored. In this work, a novel end-to-end two-stream
boundary-aware network (abbreviated as TBNet) is proposed for generic image
manipulation localization, in which the RGB stream, the frequency stream, and
the boundary artifact location are explored in a unified framework.
Specifically, we first design an adaptive frequency selection module (AFS) to
adaptively select the appropriate frequencies to mine inconsistent statistics
and eliminate the interference of redundant statistics. Then, an adaptive
cross-attention fusion module (ACF) is proposed to adaptively fuse the RGB
feature and the frequency feature. Finally, the boundary artifact location
network (BAL) is designed to locate boundary artifacts; its parameters are
jointly updated by the outputs of the ACF, and its results are further fed into
the decoder. Thus, the parameters of the RGB stream, the frequency stream, and
the boundary artifact location network are jointly optimized, and their latent
complementary relationships are fully mined. The results of extensive
experiments performed on four public benchmarks of the image manipulation
localization task, namely, CASIA1.0, COVER, Carvalho, and In-The-Wild,
demonstrate that the proposed TBNet significantly outperforms state-of-the-art
generic image manipulation localization methods in terms of both MCC and F1.
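The abstract describes an adaptive cross-attention fusion module (ACF) that fuses RGB features with frequency features. The paper provides no code, but the general idea of one stream attending to the other can be sketched as a single-head, unmasked cross-attention in NumPy. This is an illustrative sketch, not the authors' exact ACF; all function names here are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(rgb_feat, freq_feat):
    """Fuse two (N, C) token sets: RGB tokens attend to frequency tokens.

    Returns the RGB features enriched with a residual, attention-weighted
    mix of the frequency features (a generic fusion, not the paper's ACF).
    """
    d = rgb_feat.shape[-1]
    # Scaled dot-product attention scores: each RGB token vs. every frequency token.
    attn = softmax(rgb_feat @ freq_feat.T / np.sqrt(d), axis=-1)  # (N, N)
    # Residual fusion keeps the RGB stream intact when frequency adds nothing.
    return rgb_feat + attn @ freq_feat
```

With all-zero frequency features the residual term vanishes and the RGB features pass through unchanged, which is one reason residual-style fusion is a common design choice.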
Related papers
- Unveiling the Limits of Alignment: Multi-modal Dynamic Local Fusion Network and A Benchmark for Unaligned RGBT Video Object Detection [5.068440399797739]
Current RGB-Thermal Video Object Detection (RGBT VOD) methods depend on manually aligning data at the image level.
We propose a Multi-modal Dynamic Local fusion Network (MDLNet) designed to handle unaligned RGBT pairs.
We conduct a comprehensive evaluation comparing MDLNet with state-of-the-art (SOTA) models, demonstrating the superior effectiveness of MDLNet.
arXiv Detail & Related papers (2024-10-16T01:06:12Z)
- FDCE-Net: Underwater Image Enhancement with Embedding Frequency and Dual Color Encoder [49.79611204954311]
Underwater images often suffer from issues such as low brightness, color shift, blurred details, and noise due to light absorption and scattering caused by water and suspended particles.
Previous underwater image enhancement (UIE) methods have primarily focused on spatial domain enhancement, neglecting the frequency domain information inherent in the images.
arXiv Detail & Related papers (2024-04-27T15:16:34Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
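The UFAFormer entry above decomposes images into frequency sub-bands with the discrete wavelet transform. As an illustrative sketch (not that paper's implementation), a single-level 2D Haar DWT producing the four standard sub-bands can be written in NumPy:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2D Haar wavelet transform.

    img: 2D array (grayscale image) with even height and width.
    Returns the (LL, LH, HL, HH) sub-bands, each half the size of img.
    LL is a smoothed image; LH/HL/HH carry horizontal/vertical/diagonal
    detail, where forgery artifacts tend to concentrate.
    """
    a, b = img[0::2, :], img[1::2, :]
    lo_r, hi_r = (a + b) / 2.0, (a - b) / 2.0        # low/high-pass over rows
    ll = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2.0       # low-low
    lh = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2.0       # low-high
    hl = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2.0       # high-low
    hh = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2.0       # high-high
    return ll, lh, hl, hh
```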
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with existing models, our proposed method achieves competitive performance on three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z)
- DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.
We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images.
Our proposed method outperforms state-of-the-art methods for HSI classification.
arXiv Detail & Related papers (2023-04-19T18:32:52Z)
- Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring [39.720032882926176]
We present an effective and efficient method that explores the properties of Transformers in the frequency domain for high-quality image deblurring.
We formulate the proposed FSAS and DFFN into an asymmetrical network based on an encoder and decoder architecture.
arXiv Detail & Related papers (2022-11-22T13:08:03Z)
- Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on the frequency domain find that GAN-forged images exhibit obvious grid-like visual artifacts in the frequency spectrum compared to real images.
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
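The FCAN-DCT entry above mines forgery clues from the cosine-transform spectrum. As an illustrative sketch (not that paper's implementation), a type-II 2D DCT built from explicit basis matrices, with no normalization applied, shows how an image maps to the spectrum in which such grid artifacts appear:

```python
import numpy as np

def dct2(x):
    """Type-II 2D DCT via explicit cosine basis matrices (unnormalized).

    Row 0 / column 0 of the result hold low-frequency (DC-like) energy;
    high-frequency coefficients sit toward the bottom-right.
    """
    def dct_mat(k):
        i = np.arange(k)
        # Entry [u, j] = cos(pi * (2j + 1) * u / (2k)): frequency u, sample j.
        return np.cos(np.pi * (2 * i[None, :] + 1) * i[:, None] / (2 * k))
    n, m = x.shape
    return dct_mat(n) @ x @ dct_mat(m).T
```

For a constant image, all energy lands in the single (0, 0) coefficient and every other coefficient is zero, which is a quick sanity check for any DCT implementation.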
arXiv Detail & Related papers (2022-07-05T09:27:53Z)
- Attention-Guided NIR Image Colorization via Adaptive Fusion of Semantic and Texture Clues [6.437931036166344]
Near infrared (NIR) imaging has been widely applied in low-light imaging scenarios.
It is difficult for humans and algorithms to perceive the real scene in the colorless NIR domain.
We propose a novel Attention-based NIR image colorization framework via Adaptive Fusion of Semantic and Texture clues.
arXiv Detail & Related papers (2021-07-20T03:00:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.