SiamixFormer: a fully-transformer Siamese network with temporal Fusion
for accurate building detection and change detection in bi-temporal remote
sensing images
- URL: http://arxiv.org/abs/2208.00657v2
- Date: Fri, 21 Jul 2023 08:39:22 GMT
- Title: SiamixFormer: a fully-transformer Siamese network with temporal Fusion
for accurate building detection and change detection in bi-temporal remote
sensing images
- Authors: Amir Mohammadian, Foad Ghaderi
- Abstract summary: Building detection and change detection using remote sensing images can help urban and rescue planning.
Currently, most of the existing models for building detection use only one image (pre-disaster image) to detect buildings.
In this paper, we propose a siamese model, called SiamixFormer, which uses pre- and post-disaster images as input.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building detection and change detection using remote sensing images can help
urban and rescue planning. Moreover, they can be used for building damage
assessment after natural disasters. Currently, most existing models for
building detection use only one image (the pre-disaster image) to detect
buildings, based on the idea that post-disaster images reduce a model's
performance because of the presence of destroyed buildings. In this paper, we
propose a Siamese model, called SiamixFormer, which uses pre- and post-disaster
images as input. Our model has two encoders and a hierarchical transformer
architecture. The output of each stage in both encoders is fed to a temporal
transformer for feature fusion, such that the query is generated from the
pre-disaster image and the (key, value) pair is generated from the
post-disaster image. In this way, temporal features are also considered in
feature fusion. Another advantage of using temporal transformers in feature
fusion is that, compared with CNNs, they can better maintain the large
receptive fields generated by the transformer encoders. Finally, the output of
the temporal transformer at each stage is fed to a simple MLP decoder.
SiamixFormer is evaluated on the xBD and WHU datasets for building detection,
and on the LEVIR-CD and CDD datasets for change detection, where it outperforms
the state of the art.
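The fusion step described in the abstract is cross-attention between the two temporal streams. The sketch below is a minimal single-head version, assuming unprojected features for brevity (the real model learns query/key/value projections and operates at every encoder stage): queries come from pre-disaster features, keys and values from post-disaster features.

```python
import numpy as np

def cross_attention(pre, post):
    """Single-head scaled dot-product cross-attention: a minimal sketch of
    the temporal fusion described in the abstract. Queries are taken from
    pre-disaster features; keys and values from post-disaster features.
    Learned projection weights are omitted here (an assumption for brevity)."""
    # pre, post: (tokens, dim) flattened feature maps from the two encoders
    d = pre.shape[-1]
    scores = pre @ post.T / np.sqrt(d)            # (tokens_pre, tokens_post)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ post                         # fused features, shape of pre

# toy usage: fuse a 4x4 token map with 32-dim features
rng = np.random.default_rng(0)
pre = rng.standard_normal((16, 32))
post = rng.standard_normal((16, 32))
fused = cross_attention(pre, post)
print(fused.shape)  # (16, 32)
```

Because the query side comes from the pre-disaster image, the output stays aligned with the pre-disaster token grid while drawing content from the post-disaster view.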
Related papers
- Zero-Shot Detection of AI-Generated Images [54.01282123570917]
We propose a zero-shot entropy-based detector (ZED) to detect AI-generated images.
Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is compared to a model of real images.
ZED achieves an average improvement of more than 3% over the SoTA in terms of accuracy.
arXiv Detail & Related papers (2024-09-24T08:46:13Z) - DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - Image Deblurring by Exploring In-depth Properties of Transformer [86.7039249037193]
We leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by the quantitative metrics.
By comparing transformer features between the recovered image and the target, the pretrained transformer provides high-resolution, blur-sensitive semantic information.
One approach regards the features as vectors and computes the discrepancy, in Euclidean space, between the representations extracted from the recovered and target images.
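The Euclidean discrepancy described above can be sketched as follows. The pretrained ViT that would produce the token features is assumed, not implemented; only the distance computation is shown.

```python
import numpy as np

def feature_distance(feat_recovered, feat_target):
    """Illustrative sketch of the feature-space discrepancy above: treat
    per-token transformer features as vectors and average their Euclidean
    distances. The pretrained ViT producing these (tokens, dim) feature
    arrays is assumed, not implemented here."""
    diff = feat_recovered - feat_target               # (tokens, dim)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

# toy usage: 4 tokens, 8-dim features, differing by 1 in every component
a = np.zeros((4, 8))
b = np.ones((4, 8))
print(feature_distance(a, b))  # about 2.828 (= sqrt(8))
```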
arXiv Detail & Related papers (2023-03-24T14:14:25Z) - Pure Transformer with Integrated Experts for Scene Text Recognition [11.089203218000854]
Scene text recognition (STR) involves the task of reading text in cropped images of natural scenes.
In recent times, the transformer architecture has been widely adopted in STR, as it shows a strong capability for capturing long-term dependencies.
This work proposes the use of a transformer-only model as a simple baseline which outperforms hybrid CNN-transformer models.
arXiv Detail & Related papers (2022-11-09T15:26:59Z) - Masked Transformer for image Anomaly Localization [14.455765147827345]
We propose a new model for image anomaly detection based on Vision Transformer architecture with patch masking.
We show that multi-resolution patches and their collective embeddings provide a large improvement in the model's performance.
The proposed model has been tested on popular anomaly detection datasets such as MVTec and head CT.
arXiv Detail & Related papers (2022-10-27T15:30:48Z) - Dual-Tasks Siamese Transformer Framework for Building Damage Assessment [11.888964682446879]
We present the first attempt at designing a Transformer-based damage assessment architecture (DamFormer).
To the best of our knowledge, it is the first time that such a deep Transformer-based network is proposed for multitemporal remote sensing interpretation tasks.
arXiv Detail & Related papers (2022-01-26T14:11:16Z) - Towards End-to-End Image Compression and Analysis with Transformers [99.50111380056043]
We propose an end-to-end image compression and analysis model with Transformers, targeting the cloud-based image classification application.
We aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with the long-term information from the Transformer.
Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
arXiv Detail & Related papers (2021-12-17T03:28:14Z) - REPLICA: Enhanced Feature Pyramid Network by Local Image Translation and
Conjunct Attention for High-Resolution Breast Tumor Detection [6.112883009328882]
We call our method enhanced featuRE Pyramid network by Local Image translation and Conjunct Attention, or REPLICA.
We use a convolutional autoencoder as a generator to create new images by injecting objects into images via local Pyramid and reconstruction of their features extracted in hidden layers.
Then due to the larger number of simulated images, we use a visual transformer to enhance outputs of each ResNet layer that serve as inputs to a feature pyramid network.
arXiv Detail & Related papers (2021-11-22T21:33:02Z) - PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion [37.993611194758195]
We propose a Patch Pyramid Transformer (PPT) to address the issues of extracting semantic information from an image.
The experimental results demonstrate its superior performance against the state-of-the-art fusion approaches.
arXiv Detail & Related papers (2021-07-29T13:57:45Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - End-to-End Trainable Multi-Instance Pose Estimation with Transformers [68.93512627479197]
We propose a new end-to-end trainable approach for multi-instance pose estimation by combining a convolutional neural network with a transformer.
Inspired by recent work on end-to-end trainable object detection with transformers, we use a transformer encoder-decoder architecture together with a bipartite matching scheme to directly regress the pose of all individuals in a given image.
Our model, called POse Estimation Transformer (POET), is trained using a novel set-based global loss that consists of a keypoint loss, a keypoint visibility loss, a center loss and a class loss.
arXiv Detail & Related papers (2021-03-22T18:19:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.