Image and Model Transformation with Secret Key for Vision Transformer
- URL: http://arxiv.org/abs/2207.05366v1
- Date: Tue, 12 Jul 2022 08:02:47 GMT
- Title: Image and Model Transformation with Secret Key for Vision Transformer
- Authors: Hitoshi Kiya, Ryota Iijima, MaungMaung Aprilpyone, and Yuma Kinoshita
- Abstract summary: We show for the first time that models trained with plain images can be directly transformed to models trained with encrypted images.
The performance of the transformed models is the same as that of models trained with plain images when using test images encrypted with the key.
- Score: 16.055655429920993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a combined use of transformed images and vision
transformer (ViT) models transformed with a secret key. We show for the first
time that models trained with plain images can be directly transformed to
models trained with encrypted images on the basis of the ViT architecture, and
the performance of the transformed models is the same as that of models
trained with plain images when using test images encrypted with the key. In
addition, the
proposed scheme does not require any specially prepared data for training
models or network modification, so it also allows us to easily update the
secret key. In an experiment, the effectiveness of the proposed scheme is
evaluated in terms of performance degradation and model protection performance
in an image classification task on the CIFAR-10 dataset.
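The abstract leaves the concrete transformations unspecified, but the key idea is that an image encryption aligned with the ViT patch structure can be cancelled by an equivalent transformation of the patch-embedding weights. The NumPy sketch below illustrates this under the assumption of a key-dependent pixel shuffle inside each patch; the patch size, weight shapes, and helper names are illustrative and not taken from the paper.
```python
import numpy as np

# Minimal sketch of the idea (not the authors' exact algorithm): pixels are
# shuffled inside every non-overlapping patch with a key-dependent permutation,
# and the columns of the ViT patch-embedding weights are permuted to match.
PATCH = 16       # assumed ViT patch size
CHANNELS = 3
EMBED_DIM = 768  # assumed embedding dimension

def keyed_permutation(key: int, n: int) -> np.ndarray:
    """Derive a fixed permutation of n elements from the secret key."""
    return np.random.default_rng(key).permutation(n)

def encrypt_image(img: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Shuffle pixel positions inside each PATCH x PATCH block of an (H, W, C) image."""
    out = img.copy()
    h, w, c = img.shape
    for y in range(0, h, PATCH):
        for x in range(0, w, PATCH):
            block = out[y:y + PATCH, x:x + PATCH].reshape(-1, c)  # (PATCH*PATCH, C)
            out[y:y + PATCH, x:x + PATCH] = block[perm].reshape(PATCH, PATCH, c)
    return out

def transform_patch_embedding(weight: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Permute the pixel-position columns of the (EMBED_DIM, PATCH*PATCH*C)
    projection so that embedding an encrypted patch equals embedding the plain one."""
    w = weight.reshape(EMBED_DIM, PATCH * PATCH, CHANNELS)
    return w[:, perm, :].reshape(EMBED_DIM, -1)

# Both sides derive the same permutation from the shared secret key.
perm = keyed_permutation(key=1234, n=PATCH * PATCH)
img = np.random.default_rng(0).random((224, 224, CHANNELS))
W = np.random.default_rng(1).standard_normal((EMBED_DIM, PATCH * PATCH * CHANNELS))

plain_tok = W @ img[:PATCH, :PATCH].reshape(-1)  # plain model on a plain patch
enc_tok = transform_patch_embedding(W, perm) @ encrypt_image(img, perm)[:PATCH, :PATCH].reshape(-1)
np.testing.assert_allclose(plain_tok, enc_tok, atol=1e-9)  # tokens are identical
```
Because ViT embeds each patch with a single linear projection, the keyed pixel shuffle and the matching column permutation of the weights cancel out, which is consistent with the claim that accuracy is unchanged when encrypted test images are used with the correct key.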
Related papers
- Domain Adaptation for Efficiently Fine-tuning Vision Transformer with
Encrypted Images [6.476298483207895]
We propose a novel method for fine-tuning models with transformed images based on the vision transformer (ViT).
The proposed domain adaptation method does not degrade the accuracy of models, and it is carried out on the basis of the embedding structure of ViT.
In experiments, we confirmed that the proposed method prevents accuracy degradation even when using encrypted images with the CIFAR-10 and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-09-05T19:45:27Z) - Patch Is Not All You Need [57.290256181083016]
We propose a novel Pattern Transformer to adaptively convert images to pattern sequences for Transformer input.
We employ the Convolutional Neural Network to extract various patterns from the input image.
We have accomplished state-of-the-art performance on CIFAR-10 and CIFAR-100, and have achieved competitive results on ImageNet.
arXiv Detail & Related papers (2023-08-21T13:54:00Z) - An Encryption Method of ConvMixer Models without Performance Degradation [14.505867475659276]
We propose an encryption method for ConvMixer models with a secret key.
The effectiveness of the proposed method is evaluated in terms of classification accuracy and model protection.
arXiv Detail & Related papers (2022-07-25T07:09:16Z) - Plug-In Inversion: Model-Agnostic Inversion for Vision with Data
Augmentations [61.95114821573875]
We introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyperparameter tuning.
We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset.
arXiv Detail & Related papers (2022-01-31T02:12:45Z) - Towards a Unified Foundation Model: Jointly Pre-Training Transformers on
Unpaired Images and Text [93.11954811297652]
We design a unified transformer consisting of modality-specific tokenizers, a shared transformer encoder, and task-specific output heads.
We employ the separately-trained BERT and ViT models as teachers and apply knowledge distillation to provide additional, accurate supervision signals.
Experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks.
arXiv Detail & Related papers (2021-12-14T00:20:55Z) - Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z) - ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z) - Pre-Trained Image Processing Transformer [95.93031793337613]
We develop a new pre-trained model, namely, the image processing transformer (IPT).
We utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multi-heads and multi-tails.
arXiv Detail & Related papers (2020-12-01T09:42:46Z)