Adapting Vision Transformer for Efficient Change Detection
- URL: http://arxiv.org/abs/2312.04869v1
- Date: Fri, 8 Dec 2023 07:09:03 GMT
- Title: Adapting Vision Transformer for Efficient Change Detection
- Authors: Yang Zhao, Yuxiang Zhang, Yanni Dong, Bo Du
- Abstract summary: We propose an efficient tuning approach that involves freezing the parameters of the pretrained image encoder and introducing additional trainable parameters.
We have achieved competitive or even better results while maintaining extremely low resource consumption across six change detection benchmarks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most change detection models based on vision transformers currently follow a "pretraining then fine-tuning" strategy. This involves initializing the model weights using large-scale classification datasets, which can be either natural images or remote sensing images. However, fully tuning such a model requires significant time and resources. In this paper, we propose an efficient tuning approach that freezes the parameters of the pretrained image encoder and introduces additional trainable parameters. Through this approach, we achieve competitive or even better results while maintaining extremely low resource consumption across six change detection benchmarks. For example, training on LEVIR-CD, a change detection benchmark, takes only half an hour with 9 GB of memory usage, which makes the method practical for most researchers. Additionally, the decoupled tuning framework can be extended to any pretrained model, for semantic change detection and multi-temporal change detection as well. We hope that our proposed approach will serve as part of a foundation model and inspire more unified training approaches to change detection in the future.
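To make the idea of freezing the encoder and training only a small number of new parameters concrete, the sketch below shows one way it could look in PyTorch. This is a minimal illustration, not the architecture from the paper: the torchvision ViT-B/16 backbone, the residual Adapter module, and the image-level change head are assumptions made for brevity, whereas a real change detection decoder would predict a pixel-wise change map.

```python
# Minimal sketch of decoupled, parameter-efficient tuning for bi-temporal change
# detection. Assumes PyTorch + torchvision; module names are illustrative only.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights


class Adapter(nn.Module):
    """Residual bottleneck adapter: the only new parameters next to the frozen encoder."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))


class EfficientChangeDetector(nn.Module):
    def __init__(self, num_classes: int = 2, dim: int = 768):
        super().__init__()
        # Pretrained ViT-B/16 encoder; strip its classifier and freeze every parameter.
        self.encoder = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
        self.encoder.heads = nn.Identity()
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Only the adapter and the change head are trained.
        self.adapter = Adapter(dim)
        self.head = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, num_classes)
        )

    def forward(self, img_t1, img_t2):
        f1 = self.adapter(self.encoder(img_t1))  # frozen features + trainable adapter
        f2 = self.adapter(self.encoder(img_t2))
        # Simplified image-level change logits; a real decoder would output a change map.
        return self.head(torch.cat([f1, f2], dim=-1))


model = EfficientChangeDetector()
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)  # optimizer sees only the new parameters
print(f"trainable: {sum(p.numel() for p in trainable):,} / "
      f"total: {sum(p.numel() for p in model.parameters()):,}")
```

In such a setup the frozen backbone dominates the parameter count, while gradients and optimizer state exist only for the adapter and head, which is the source of the low memory and training-time cost the abstract describes.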
Related papers
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture (arXiv, 2024-08-23)
  We introduce an approach that is both general and parameter-efficient for face forgery detection. We design a forgery-style mixture formulation that augments the diversity of forgery source domains. We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
- Single-temporal Supervised Remote Change Detection for Domain Generalization (arXiv, 2024-04-17)
  Change detection is widely applied in remote sensing image analysis, yet existing methods require training models separately for each dataset. We propose a multimodal contrastive learning method (ChangeCLIP) based on visual-labelled pre-training for change detection domain generalization.
- Augmenting Deep Learning Adaptation for Wearable Sensor Data through Combined Temporal-Frequency Image Encoding (arXiv, 2023-07-03)
  We present a novel modified recurrence-plot-based image representation that seamlessly integrates both temporal and frequency domain information. We evaluate the proposed method using accelerometer-based activity recognition data and a pretrained ResNet model, and demonstrate its superior performance compared to existing approaches.
- Simple Open-Vocabulary Object Detection with Vision Transformers (arXiv, 2022-05-12)
  We propose a strong recipe for transferring image-text models to open-vocabulary object detection. We use a standard Vision Transformer architecture with minimal modifications, contrastive image-text pre-training, and end-to-end detection fine-tuning. We provide the adaptation strategies and regularizations needed to attain very strong performance on zero-shot text-conditioned and one-shot image-conditioned object detection.
- ProFormer: Learning Data-efficient Representations of Body Movement with Prototype-based Feature Augmentation and Visual Transformers (arXiv, 2022-02-23)
  Methods for data-efficient recognition from body poses increasingly leverage skeleton sequences structured as image-like arrays. We look at this paradigm from the perspective of transformer networks, for the first time exploring visual transformers as data-efficient encoders of skeleton movement. In our pipeline, body pose sequences cast as image-like representations are converted into patch embeddings and then passed to a visual transformer backbone optimized with deep metric learning.
- Benchmarking Detection Transfer Learning with Vision Transformers (arXiv, 2021-11-22)
  The complexity of object detection methods can make benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive. We present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN. Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO.
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection (arXiv, 2021-04-29)
  We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image. The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
- Dynamic Scale Training for Object Detection (arXiv, 2020-04-26)
  We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection. Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling. It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
This list is automatically generated from the titles and abstracts of the papers on this site.