SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization
- URL: http://arxiv.org/abs/2009.00726v2
- Date: Thu, 14 Jan 2021 01:43:21 GMT
- Title: SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization
- Authors: Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng
Yang, Ram Nevatia
- Abstract summary: We present a novel framework, Spatial Pyramid Attention Network (SPAN) for detection and localization of multiple types of image manipulations.
SPAN is trained on a generic, synthetic dataset but can also be fine-tuned for specific datasets.
The proposed method shows significant gains in performance on standard datasets over previous state-of-the-art methods.
- Score: 24.78951727072683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel framework, Spatial Pyramid Attention Network (SPAN), for
detection and localization of multiple types of image manipulations. The
proposed architecture efficiently and effectively models the relationships
between image patches at multiple scales by constructing a pyramid of local
self-attention blocks. The design includes a novel position projection to
encode the spatial positions of the patches. SPAN is trained on a generic,
synthetic dataset but can also be fine-tuned for specific datasets. The
proposed method shows significant gains in performance on standard datasets
over previous state-of-the-art methods.
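The pyramid of local self-attention blocks described in the abstract can be illustrated with a minimal NumPy sketch. This is an illustrative assumption, not the authors' implementation: each patch attends only to its local window, an additive position projection encodes where each neighbor sits, and stacking such blocks over growing windows would form the pyramid.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(patches, pos_proj, window=3):
    """One local self-attention block (illustrative sketch).

    patches:  (H, W, D) grid of patch features
    pos_proj: (window*window, D) position projection added to each neighbor
    """
    H, W, D = patches.shape
    r = window // 2
    padded = np.pad(patches, ((r, r), (r, r), (0, 0)))
    out = np.empty_like(patches)
    for i in range(H):
        for j in range(W):
            q = patches[i, j]                        # query patch
            nb = padded[i:i + window, j:j + window].reshape(-1, D)
            nb = nb + pos_proj                       # encode relative positions
            attn = softmax(nb @ q / np.sqrt(D))      # attention over the window
            out[i, j] = attn @ nb                    # weighted neighborhood mix
    return out

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 16))
pos = rng.normal(size=(9, 16)) * 0.1
level1 = local_self_attention(feats, pos, window=3)
# Stacking such blocks with growing windows/dilations yields the pyramid.
print(level1.shape)  # (8, 8, 16)
```

Restricting attention to local windows keeps the cost linear in the number of patches, which is what makes a multi-scale pyramid of such blocks tractable.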
Related papers
- Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to distill relational priors from 2D transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder [26.81539884309151]
Remote sensing (RS) data provide Earth observations across multiple dimensions, encompassing critical spatial, temporal, and spectral information.
Despite various pre-training methods tailored to the characteristics of RS data, a key limitation persists: the inability to effectively integrate spatial, temporal, and spectral information within a single unified model.
We propose an Anchor-Aware Masked AutoEncoder method (A$2$-MAE), leveraging intrinsic complementary information from the different kinds of images and geo-information to reconstruct the masked patches during the pre-training phase.
arXiv Detail & Related papers (2024-06-12T11:02:15Z)
- SCALAR-NeRF: SCAlable LARge-scale Neural Radiance Fields for Scene Reconstruction [66.69049158826677]
We introduce SCALAR-NeRF, a novel framework tailored for scalable large-scale neural scene reconstruction.
We structure the neural representation as an encoder-decoder architecture, where the encoder processes 3D point coordinates to produce encoded features.
We propose an effective and efficient methodology to fuse the outputs from these local models to attain the final reconstruction.
arXiv Detail & Related papers (2023-11-28T10:18:16Z)
- Lookup Table meets Local Laplacian Filter: Pyramid Reconstruction Network for Tone Mapping [35.47139372780014]
This paper explores a novel strategy that integrates global and local operators by utilizing closed-form Laplacian pyramid decomposition and reconstruction.
We employ image-adaptive 3D LUTs to manipulate the tone in the low-frequency image by leveraging the specific characteristics of the frequency information.
We also utilize local Laplacian filters to refine the edge details in the high-frequency components in an adaptive manner.
arXiv Detail & Related papers (2023-10-26T07:05:38Z)
- Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z)
- Reconciliation of Statistical and Spatial Sparsity For Robust Image and Image-Set Classification [27.319334479994787]
We propose a novel Joint Statistical and Spatial Sparse representation, dubbed J3S, to model image or image-set data for classification.
We propose to solve the joint sparse coding problem based on the J3S model, by coupling the local and global image representations using joint sparsity.
Experiments show that the proposed J3S-based image classification scheme outperforms the popular or state-of-the-art competing methods over FMD, UIUC, ETH-80 and YTC databases.
arXiv Detail & Related papers (2021-06-01T06:33:24Z)
- Dual Attention GANs for Semantic Image Synthesis [101.36015877815537]
We propose a novel Dual Attention GAN (DAGAN) to synthesize photo-realistic and semantically-consistent images.
We also propose two novel modules: a position-wise Spatial Attention Module (SAM) and a scale-wise Channel Attention Module (CAM).
DAGAN achieves remarkably better results than state-of-the-art methods, while using fewer model parameters.
arXiv Detail & Related papers (2020-08-29T17:49:01Z)
- Image Stitching Based on Planar Region Consensus [22.303750435673752]
We propose a new image stitching method which stitches images by allowing for the alignment of a set of matched dominant planar regions.
We use rich semantic information directly from RGB images to extract planar image regions with a deep Convolutional Neural Network (CNN).
Our method can deal with different situations and outperforms the state of the art on challenging scenes.
arXiv Detail & Related papers (2020-07-06T13:07:20Z)
- Neural Subdivision [58.97214948753937]
This paper introduces Neural Subdivision, a novel framework for data-driven coarse-to-fine geometry modeling.
We optimize for the same set of network weights across all local mesh patches, thus providing an architecture that is not constrained to a specific input mesh, fixed genus, or category.
We demonstrate that even when trained on a single high-resolution mesh our method generates reasonable subdivisions for novel shapes.
arXiv Detail & Related papers (2020-05-04T20:03:21Z)
- Example-Guided Image Synthesis across Arbitrary Scenes using Masked Spatial-Channel Attention and Self-Supervision [83.33283892171562]
Example-guided image synthesis has recently been attempted to synthesize an image from a semantic label map and an exemplary image.
In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically different from the given label map.
We propose an end-to-end network for joint global and local feature alignment and synthesis.
arXiv Detail & Related papers (2020-04-18T18:17:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.