Image and Model Transformation with Secret Key for Vision Transformer
- URL: http://arxiv.org/abs/2207.05366v1
- Date: Tue, 12 Jul 2022 08:02:47 GMT
- Title: Image and Model Transformation with Secret Key for Vision Transformer
- Authors: Hitoshi Kiya, Ryota Iijima, MaungMaung Aprilpyone, and Yuma Kinoshita
- Abstract summary: We show for the first time that models trained with plain images can be directly transformed to models trained with encrypted images.
The performance of the transformed models is the same as that of models trained with plain images when using test images encrypted with the key.
- Score: 16.055655429920993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a combined use of transformed images and vision
transformer (ViT) models transformed with a secret key. We show for the first
time that models trained with plain images can be directly transformed to
models trained with encrypted images on the basis of the ViT architecture, and
the performance of the transformed models is the same as that of models
trained with plain images when using test images encrypted with the key. In
addition, the
proposed scheme does not require any specially prepared data for training
models or network modification, so it also allows us to easily update the
secret key. In an experiment, the effectiveness of the proposed scheme is
evaluated in terms of performance degradation and model protection performance
in an image classification task on the CIFAR-10 dataset.
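The abstract leaves the concrete transformations unspecified, but the key idea is that an image encryption aligned with the ViT patch structure can be cancelled by an equivalent transformation of the patch-embedding weights. The NumPy sketch below illustrates this under the assumption of a key-dependent pixel shuffle inside each patch; the patch size, weight shapes, and helper names are illustrative and not taken from the paper.
```python
import numpy as np

# Minimal sketch of the idea (not the authors' exact algorithm): pixels are
# shuffled inside every non-overlapping patch with a key-dependent permutation,
# and the columns of the ViT patch-embedding weights are permuted to match.
PATCH = 16       # assumed ViT patch size
CHANNELS = 3
EMBED_DIM = 768  # assumed embedding dimension

def keyed_permutation(key: int, n: int) -> np.ndarray:
    """Derive a fixed permutation of n elements from the secret key."""
    return np.random.default_rng(key).permutation(n)

def encrypt_image(img: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Shuffle pixel positions inside each PATCH x PATCH block of an (H, W, C) image."""
    out = img.copy()
    h, w, c = img.shape
    for y in range(0, h, PATCH):
        for x in range(0, w, PATCH):
            block = out[y:y + PATCH, x:x + PATCH].reshape(-1, c)  # (PATCH*PATCH, C)
            out[y:y + PATCH, x:x + PATCH] = block[perm].reshape(PATCH, PATCH, c)
    return out

def transform_patch_embedding(weight: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Permute the pixel-position columns of the (EMBED_DIM, PATCH*PATCH*C)
    projection so that embedding an encrypted patch equals embedding the plain one."""
    w = weight.reshape(EMBED_DIM, PATCH * PATCH, CHANNELS)
    return w[:, perm, :].reshape(EMBED_DIM, -1)

# Both sides derive the same permutation from the shared secret key.
perm = keyed_permutation(key=1234, n=PATCH * PATCH)
img = np.random.default_rng(0).random((224, 224, CHANNELS))
W = np.random.default_rng(1).standard_normal((EMBED_DIM, PATCH * PATCH * CHANNELS))

plain_tok = W @ img[:PATCH, :PATCH].reshape(-1)  # plain model on a plain patch
enc_tok = transform_patch_embedding(W, perm) @ encrypt_image(img, perm)[:PATCH, :PATCH].reshape(-1)
np.testing.assert_allclose(plain_tok, enc_tok, atol=1e-9)  # tokens are identical
```
Because ViT embeds each patch with a single linear projection, the keyed pixel shuffle and the matching column permutation of the weights cancel out, which is consistent with the claim that accuracy is unchanged when encrypted test images are used with the correct key.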
Related papers
- Domain Adaptation for Efficiently Fine-tuning Vision Transformer with
Encrypted Images [6.476298483207895]
We propose a novel method for fine-tuning models with transformed images based on the vision transformer (ViT).
The proposed domain adaptation method does not degrade the accuracy of models, and it is carried out on the basis of the embedding structure of ViT.
In experiments, we confirmed that the proposed method prevents accuracy degradation even when using encrypted images with the CIFAR-10 and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-09-05T19:45:27Z) - Patch Is Not All You Need [57.290256181083016]
We propose a novel Pattern Transformer to adaptively convert images to pattern sequences for Transformer input.
We employ the Convolutional Neural Network to extract various patterns from the input image.
We have accomplished state-of-the-art performance on CIFAR-10 and CIFAR-100, and have achieved competitive results on ImageNet.
arXiv Detail & Related papers (2023-08-21T13:54:00Z) - An Encryption Method of ConvMixer Models without Performance Degradation [14.505867475659276]
We propose an encryption method for ConvMixer models with a secret key.
The effectiveness of the proposed method is evaluated in terms of classification accuracy and model protection.
arXiv Detail & Related papers (2022-07-25T07:09:16Z) - Plug-In Inversion: Model-Agnostic Inversion for Vision with Data
Augmentations [61.95114821573875]
We introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyperparameter tuning.
We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset.
arXiv Detail & Related papers (2022-01-31T02:12:45Z) - Towards a Unified Foundation Model: Jointly Pre-Training Transformers on
Unpaired Images and Text [93.11954811297652]
We design a unified transformer consisting of modality-specific tokenizers, a shared transformer encoder, and task-specific output heads.
We employ the separately-trained BERT and ViT models as teachers and apply knowledge distillation to provide additional, accurate supervision signals.
Experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks.
arXiv Detail & Related papers (2021-12-14T00:20:55Z) - Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z) - ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z) - Pre-Trained Image Processing Transformer [95.93031793337613]
We develop a new pre-trained model, namely, the image processing transformer (IPT).
We utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multi-heads and multi-tails.
arXiv Detail & Related papers (2020-12-01T09:42:46Z)