Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images
- URL: http://arxiv.org/abs/2309.02556v2
- Date: Thu, 7 Sep 2023 01:29:40 GMT
- Title: Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images
- Authors: Teru Nagamori, Sayaka Shiota, Hitoshi Kiya
- Abstract summary: We propose a novel method for fine-tuning models with transformed images under the use of the vision transformer (ViT).
The proposed domain adaptation method does not cause accuracy degradation in models, and it is carried out on the basis of the embedding structure of ViT.
In experiments, we confirmed that the proposed method prevents accuracy degradation even when using encrypted images with the CIFAR-10 and CIFAR-100 datasets.
- Score: 6.476298483207895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, deep neural networks (DNNs) trained with transformed data
have been applied to various applications such as privacy-preserving learning,
access control, and adversarial defenses. However, the use of transformed data
decreases the performance of models. Accordingly, in this paper, we propose a
novel method for fine-tuning models with transformed images under the use of
the vision transformer (ViT). The proposed domain adaptation method does not
cause accuracy degradation in models, and it is carried out on the basis of
the embedding structure of ViT. In experiments, we confirmed that the proposed
method prevents accuracy degradation even when using encrypted images with the
CIFAR-10 and CIFAR-100 datasets.
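The abstract does not specify the encryption scheme. As a hedged illustration only: learnable image encryption in this line of work is often block-wise, shuffling the pixels inside each small block with a secret key so the transform is invertible for key holders. A minimal sketch follows; the block size, permutation scheme, and function names are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def encrypt_blockwise(img, key, block=4):
    """Shuffle pixels inside each block x block tile with a keyed permutation.

    img: (H, W, C) array; H and W must be divisible by `block`.
    Illustrative stand-in for block-wise learnable encryption, not the
    paper's exact algorithm.
    """
    perm = np.random.default_rng(key).permutation(block * block)
    h, w, c = img.shape
    # split into tiles, flatten each tile to block*block pixels
    tiles = img.reshape(h // block, block, w // block, block, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(h // block, w // block, block * block, c)
    tiles = tiles[:, :, perm, :]  # apply the same keyed pixel shuffle to every tile
    tiles = tiles.reshape(h // block, w // block, block, block, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(h, w, c)

def decrypt_blockwise(img, key, block=4):
    """Invert encrypt_blockwise using the inverse of the keyed permutation."""
    perm = np.random.default_rng(key).permutation(block * block)
    inv = np.argsort(perm)  # inverse permutation restores pixel order
    h, w, c = img.shape
    tiles = img.reshape(h // block, block, w // block, block, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(h // block, w // block, block * block, c)
    tiles = tiles[:, :, inv, :]
    tiles = tiles.reshape(h // block, w // block, block, block, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(h, w, c)

img = np.arange(32 * 32 * 3).reshape(32, 32, 3)  # CIFAR-sized dummy image
enc = encrypt_blockwise(img, key=42)
assert np.array_equal(decrypt_blockwise(enc, key=42), img)  # round-trip with the key
```

Because the shuffle acts independently within each tile, the tile grid (which a ViT's patch embedding operates on) is preserved, which is what makes embedding-level adaptation plausible.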
Related papers
- Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
- Efficient Fine-Tuning with Domain Adaptation for Privacy-Preserving Vision Transformer [6.476298483207895]
We propose a novel method for privacy-preserving deep neural networks (DNNs) with the Vision Transformer (ViT).
The method allows us not only to train and test models with visually protected images but also to avoid the performance degradation caused by the use of encrypted images.
A domain adaptation method is used to efficiently fine-tune ViT with encrypted images.
arXiv Detail & Related papers (2024-01-10T12:46:31Z)
- RobustCaps: a transformation-robust capsule network for image classification [6.445605125467574]
We present a deep neural network model that exhibits the desirable property of transformation-robustness.
Our model, termed RobustCaps, uses group-equivariant convolutions in an improved capsule network model.
It achieves state-of-the-art accuracies on CIFAR-10, FashionMNIST, and CIFAR-100 datasets.
arXiv Detail & Related papers (2022-10-20T08:42:33Z)
- Image and Model Transformation with Secret Key for Vision Transformer [16.055655429920993]
We show for the first time that models trained with plain images can be directly transformed to models trained with encrypted images.
The performance of the transformed models is the same as models trained with plain images when using test images encrypted with the key.
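The mechanism behind such a model transformation can be sketched as follows: if encryption is a keyed scrambling of image blocks, then the ViT patch sequence of an encrypted image is just a permutation of the plain image's patch sequence, so a model trained on plain images can be aligned by permuting its patch-order-dependent parameters (e.g. position embeddings) with the same key. The sketch below is an illustrative reconstruction under that assumption, not the paper's code.

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W) image into a row-major sequence of flattened p x p patches."""
    h, w = img.shape
    patches = img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3)
    return patches.reshape(-1, p * p)

def scramble_blocks(img, key, p):
    """Keyed block scrambling: permute the p x p patches of the image."""
    h, w = img.shape
    n = (h // p) * (w // p)
    perm = np.random.default_rng(key).permutation(n)
    patches = patchify(img, p)[perm]
    return patches.reshape(h // p, w // p, p, p).transpose(0, 2, 1, 3).reshape(h, w)

img = np.arange(16 * 16, dtype=float).reshape(16, 16)
key, p = 0, 4
perm = np.random.default_rng(key).permutation(16)

# The patch sequence of the scrambled image equals the permuted patch
# sequence of the plain image, so re-ordering a plain-trained model's
# position embeddings with the same key realigns model and data
# (illustrative consequence of the scrambling assumption above).
assert np.array_equal(patchify(scramble_blocks(img, key, p), p),
                      patchify(img, p)[perm])
```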
arXiv Detail & Related papers (2022-07-12T08:02:47Z)
- Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing [71.06718651013965]
We present adaptive vision transformers (ViT) for robust cross-domain face anti-spoofing.
We adopt ViT as a backbone to exploit its strength to account for long-range dependencies among pixels.
Experiments on several benchmark datasets show that the proposed models achieve both robust and competitive performance.
arXiv Detail & Related papers (2022-03-23T03:37:44Z)
- GradViT: Gradient Inversion of Vision Transformers [83.54779732309653]
We demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks.
We introduce a method, named GradViT, that optimizes random noise into natural-looking images.
We observe unprecedentedly high fidelity and closeness to the original (hidden) data.
arXiv Detail & Related papers (2022-03-22T17:06:07Z)
- Patch Similarity Aware Data-Free Quantization for Vision Transformers [2.954890575035673]
We propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers.
We analyze the self-attention module's properties and reveal a general difference (patch similarity) in its processing of Gaussian noise and real images.
Experiments and ablation studies are conducted on various benchmarks to validate the effectiveness of PSAQ-ViT.
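The patch-similarity measure can be sketched as pairwise cosine similarity between patch feature vectors: structured image content produces a more varied similarity pattern than Gaussian noise. The specific kernel and how PSAQ-ViT turns it into a sample-generation objective are not given in this summary, so the following is an assumption-laden sketch.

```python
import numpy as np

def patch_similarity(feats, eps=1e-8):
    """Pairwise cosine similarity between patch feature vectors.

    feats: (num_patches, dim) array of per-patch responses.
    Illustrative proxy for the patch-similarity metric summarized above;
    the exact formulation in PSAQ-ViT may differ.
    """
    norm = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + eps)
    return norm @ norm.T  # (num_patches, num_patches) similarity matrix

feats = np.random.default_rng(1).normal(size=(196, 64))  # 14x14 patch grid
sim = patch_similarity(feats)
assert sim.shape == (196, 196)
assert np.allclose(np.diag(sim), 1.0, atol=1e-6)  # each patch is similar to itself
```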
arXiv Detail & Related papers (2022-03-04T11:47:20Z)
- AdaViT: Adaptive Tokens for Efficient Vision Transformer [91.88404546243113]
We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity.
AdaViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds.
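Token reduction of this kind can be sketched as scoring tokens and keeping only the top fraction; AdaViT learns these decisions end to end and applies them progressively through the network, so the scoring rule and keep ratio below are placeholders, not the paper's mechanism.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order.

    tokens: (num_tokens, dim) array; scores: (num_tokens,) importance values.
    Toy stand-in for AdaViT's learned token-halting mechanism.
    """
    k = max(1, int(tokens.shape[0] * keep_ratio))
    keep = np.argsort(scores)[-k:]  # indices of the k top-scoring tokens
    keep.sort()                     # preserve original token order
    return tokens[keep]

tokens = np.random.default_rng(0).normal(size=(197, 768))  # ViT-B/16 token count
scores = tokens.var(axis=1)        # placeholder importance score
pruned = prune_tokens(tokens, scores, keep_ratio=0.25)
assert pruned.shape == (49, 768)   # 25% of 197 tokens survive
```

Fewer tokens per layer shrinks the quadratic self-attention cost, which is the efficiency gain the summary describes.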
arXiv Detail & Related papers (2021-12-14T18:56:07Z)
- Adaptive Image Transformations for Transfer-based Adversarial Attack [73.74904401540743]
We propose a novel architecture, called the Adaptive Image Transformation Learner (AITL).
Our elaborately designed learner adaptively selects the most effective combination of image transformations specific to the input image.
Our method significantly improves the attack success rates on both normally trained models and defense models under various settings.
arXiv Detail & Related papers (2021-11-27T08:15:44Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from "Vision-friendly Transformer".
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Investigating the Vision Transformer Model for Image Retrieval Tasks [1.375062426766416]
This paper introduces a plug-and-play descriptor that can be effectively adopted for image retrieval tasks without prior preparation.
The proposed description method utilizes the recently proposed Vision Transformer network and requires no training data to adjust parameters.
In image retrieval tasks, global and local descriptors have in recent years been very successfully replaced by Convolutional Neural Network (CNN)-based methods.
arXiv Detail & Related papers (2021-01-11T08:59:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.