Pre-Trained Image Processing Transformer
- URL: http://arxiv.org/abs/2012.00364v2
- Date: Thu, 3 Dec 2020 05:02:15 GMT
- Title: Pre-Trained Image Processing Transformer
- Authors: Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua
Liu, Siwei Ma, Chunjing Xu, Chao Xu, Wen Gao
- Abstract summary: We develop a new pre-trained model, namely, the image processing transformer (IPT).
We propose utilizing the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multiple heads and tails.
- Score: 95.93031793337613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the computing power of modern hardware increases rapidly, pre-trained
deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have
shown their effectiveness over conventional methods. This progress is mainly
attributable to the representation ability of the transformer and its variant
architectures. In this paper, we study low-level computer vision tasks
(e.g., denoising, super-resolution, and deraining) and develop a new pre-trained
model, namely, the image processing transformer (IPT). To maximally exploit the
capability of the transformer, we propose utilizing the well-known ImageNet
benchmark to generate a large number of corrupted image pairs. The IPT model
is trained on these images with multiple heads and tails. In addition,
contrastive learning is introduced to adapt well to different image
processing tasks. The pre-trained model can therefore be efficiently employed on
a desired task after fine-tuning. With only one pre-trained model, IPT
outperforms the current state-of-the-art methods on various low-level
benchmarks.
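The data-generation step in the abstract, synthesizing (corrupted, clean) training pairs from clean benchmark images, can be sketched as follows. This is an illustrative sketch, not the paper's exact pipeline: the function name `degrade_pairs`, the noise level, and the plain subsampling used as a stand-in for proper bicubic downsampling are all assumptions.

```python
import random

def degrade_pairs(clean_patch, noise_sigma=25.0, sr_scale=2):
    """Return hypothetical (corrupted, clean) pairs for two low-level tasks,
    given a clean patch as a 2-D list of pixel values in [0, 255].
    Illustrative sketch only, not the authors' pipeline."""
    # Denoising pair: add Gaussian noise to each pixel and clip to [0, 255].
    noisy = [[min(255.0, max(0.0, p + random.gauss(0.0, noise_sigma)))
              for p in row] for row in clean_patch]
    # Super-resolution pair: real pipelines typically use bicubic
    # downsampling; here simple subsampling stands in for it.
    low_res = [row[::sr_scale] for row in clean_patch[::sr_scale]]
    return {"denoise": (noisy, clean_patch), "sr": (low_res, clean_patch)}

# A tiny synthetic 8x8 "clean image" in place of an ImageNet crop.
clean = [[float((r * 7 + c * 13) % 256) for c in range(8)] for r in range(8)]
pairs = degrade_pairs(clean)
```

Each task-specific (input, target) pair would then feed the matching head/tail of the shared transformer body during pre-training.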
Related papers
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images from limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Heterogeneous Generative Knowledge Distillation with Masked Image Modeling [33.95780732124864]
Masked image modeling (MIM) methods achieve great success in various visual tasks but remain largely unexplored in knowledge distillation for heterogeneous deep models.
We develop the first Heterogeneous Generative Knowledge Distillation (H-GKD) based on MIM, which can efficiently transfer knowledge from large Transformer models to small CNN-based models in a generative self-supervised fashion.
Our method is a simple yet effective learning paradigm to learn the visual representation and distribution of data from heterogeneous teacher models.
arXiv Detail & Related papers (2023-09-18T08:30:55Z)
- GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models are utilized to help models like CNN and ViT learn enhanced representations.
We feed detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of images.
arXiv Detail & Related papers (2023-06-01T14:02:45Z)
- Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations [61.95114821573875]
We introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyperparameter tuning.
We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset.
arXiv Detail & Related papers (2022-01-31T02:12:45Z)
- On Efficient Transformer and Image Pre-training for Low-level Vision [74.22436001426517]
Pre-training has set numerous state-of-the-art results in high-level computer vision.
We present an in-depth study of image pre-training.
We find pre-training plays strikingly different roles in low-level tasks.
arXiv Detail & Related papers (2021-12-19T15:50:48Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
- Efficient Vision Transformers via Fine-Grained Manifold Distillation [96.50513363752836]
Vision transformer architectures have shown extraordinary performance on many computer vision tasks.
Although network performance is boosted, transformers often require more computational resources.
We propose to excavate useful information from the teacher transformer through the relationship between images and the divided patches.
arXiv Detail & Related papers (2021-07-03T08:28:34Z)
- Multi-task pre-training of deep neural networks for digital pathology [8.74883469030132]
We first assemble and transform many digital pathology datasets into a pool of 22 classification tasks and almost 900k images.
We show that our models used as feature extractors either improve significantly over ImageNet pre-trained models or provide comparable performance.
arXiv Detail & Related papers (2020-05-05T08:50:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.