Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images
- URL: http://arxiv.org/abs/2309.02556v2
- Date: Thu, 7 Sep 2023 01:29:40 GMT
- Title: Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images
- Authors: Teru Nagamori, Sayaka Shiota, Hitoshi Kiya
- Abstract summary: We propose a novel method for fine-tuning models with transformed images using the vision transformer (ViT). The proposed domain adaptation method does not degrade the accuracy of models, and it is carried out on the basis of the embedding structure of ViT. In experiments, we confirmed that the proposed method prevents accuracy degradation even when using encrypted images with the CIFAR-10 and CIFAR-100 datasets.
- Score: 6.476298483207895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, deep neural networks (DNNs) trained with transformed data
have been applied to various applications such as privacy-preserving learning,
access control, and adversarial defenses. However, the use of transformed data
decreases the performance of models. Accordingly, in this paper, we propose a
novel method for fine-tuning models with transformed images using the vision
transformer (ViT). The proposed domain adaptation method does not degrade the
accuracy of models, and it is carried out on the basis of
the embedding structure of ViT. In experiments, we confirmed that the proposed
method prevents accuracy degradation even when using encrypted images with the
CIFAR-10 and CIFAR-100 datasets.
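The abstract gives no implementation details, so the following is a minimal, hypothetical sketch of the general idea rather than the authors' method: images are protected by a secret block-wise permutation, and the ViT's patch-level position embeddings are permuted with the same key so that fine-tuning on encrypted images stays aligned with the pre-trained representation. The function names (`block_scramble`, `adapt_position_embedding`) and the assumption that the encryption block size equals the ViT patch size are illustrative choices, not details taken from the paper.

```python
# Minimal sketch (assumption: encryption block size == ViT patch size).
# Not the authors' released code; only illustrates the intuition that the
# same secret permutation can be applied to images and to ViT embeddings.
import torch


def block_scramble(images, perm, block_size=16):
    """Encrypt images by shuffling non-overlapping blocks with a secret permutation.

    images: (B, C, H, W) tensor; perm: 1-D LongTensor over the number of blocks.
    """
    b, c, h, w = images.shape
    gh, gw = h // block_size, w // block_size
    # Split into blocks: (B, C, gh, gw, bs, bs), then flatten the block grid.
    blocks = images.reshape(b, c, gh, block_size, gw, block_size)
    blocks = blocks.permute(0, 1, 2, 4, 3, 5).reshape(b, c, gh * gw, block_size, block_size)
    blocks = blocks[:, :, perm]  # apply the secret permutation to the block order
    # Reassemble the scrambled blocks into full images.
    blocks = blocks.reshape(b, c, gh, gw, block_size, block_size)
    return blocks.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)


def adapt_position_embedding(pos_embed, perm):
    """Permute a ViT position embedding (1, 1+N, D) with the same secret key,
    keeping the class-token embedding in place."""
    cls_tok, patches = pos_embed[:, :1], pos_embed[:, 1:]
    return torch.cat([cls_tok, patches[:, perm]], dim=1)


if __name__ == "__main__":
    # Toy example: 224x224 images with 16x16 blocks -> 196 blocks,
    # matching the patch grid of a ViT-B/16-style model.
    torch.manual_seed(0)
    perm = torch.randperm(196)          # secret key
    x = torch.randn(2, 3, 224, 224)
    x_enc = block_scramble(x, perm)     # "encrypted" images for fine-tuning/testing
    pos = torch.zeros(1, 197, 768)      # stand-in for a pre-trained position embedding
    pos_adapted = adapt_position_embedding(pos, perm)
```

In this sketch, each scrambled patch still meets the position embedding it was pre-trained with, so the pre-trained ViT sees a consistent input space; this is one plausible reading of why the paper reports no accuracy degradation on CIFAR-10 and CIFAR-100.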
Related papers
- Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry [1.2289361708127877]
We propose a causal visual-inertial fusion transformer (VIFT) for pose estimation in deep visual-inertial odometry.
The proposed method is end-to-end trainable and requires only a monocular camera and IMU during inference.
arXiv Detail & Related papers (2024-09-13T12:21:25Z)
- Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
- Efficient Fine-Tuning with Domain Adaptation for Privacy-Preserving Vision Transformer [6.476298483207895]
We propose a novel method for privacy-preserving deep neural networks (DNNs) with the Vision Transformer (ViT).
The method allows us not only to train and test models with visually protected images but also to avoid the performance degradation caused by the use of encrypted images.
A domain adaptation method is used to efficiently fine-tune ViT with encrypted images.
arXiv Detail & Related papers (2024-01-10T12:46:31Z)
- Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
- Robustcaps: a transformation-robust capsule network for image classification [6.445605125467574]
We present a deep neural network model that exhibits the desirable property of transformation-robustness.
Our model, termed RobustCaps, uses group-equivariant convolutions in an improved capsule network model.
It achieves state-of-the-art accuracies on CIFAR-10, FashionMNIST, and CIFAR-100 datasets.
arXiv Detail & Related papers (2022-10-20T08:42:33Z)
- Image and Model Transformation with Secret Key for Vision Transformer [16.055655429920993]
We show for the first time that models trained with plain images can be directly transformed to models trained with encrypted images.
The performance of the transformed models is the same as models trained with plain images when using test images encrypted with the key.
arXiv Detail & Related papers (2022-07-12T08:02:47Z)
- Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing [71.06718651013965]
We present adaptive vision transformers (ViT) for robust cross-domain face antispoofing.
We adopt ViT as a backbone to exploit its strength to account for long-range dependencies among pixels.
Experiments on several benchmark datasets show that the proposed models achieve both robust and competitive performance.
arXiv Detail & Related papers (2022-03-23T03:37:44Z)
- GradViT: Gradient Inversion of Vision Transformers [83.54779732309653]
We demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks.
We introduce a method, named GradViT, that optimizes random noise into natural-looking images.
We observe unprecedentedly high fidelity and closeness to the original (hidden) data.
arXiv Detail & Related papers (2022-03-22T17:06:07Z)
- AdaViT: Adaptive Tokens for Efficient Vision Transformer [91.88404546243113]
We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity.
AdaViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds.
arXiv Detail & Related papers (2021-12-14T18:56:07Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Investigating the Vision Transformer Model for Image Retrieval Tasks [1.375062426766416]
This paper introduces a plug-and-play descriptor that can be effectively adopted for image retrieval tasks without prior preparation.
The proposed description method utilizes the recently proposed Vision Transformer network while it does not require any training data to adjust parameters.
In image retrieval tasks, global and local descriptors have, over the last years, been very successfully replaced by Convolutional Neural Network (CNN)-based methods.
arXiv Detail & Related papers (2021-01-11T08:59:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.