On Efficient Transformer and Image Pre-training for Low-level Vision
- URL: http://arxiv.org/abs/2112.10175v1
- Date: Sun, 19 Dec 2021 15:50:48 GMT
- Title: On Efficient Transformer and Image Pre-training for Low-level Vision
- Authors: Wenbo Li, Xin Lu, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia
- Abstract summary: Pre-training has driven numerous state-of-the-art results in high-level computer vision.
We present an in-depth study of image pre-training.
We find pre-training plays strikingly different roles in low-level tasks.
- Score: 74.22436001426517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training has driven numerous state-of-the-art results in high-level
computer vision, but few attempts have been made to investigate how pre-training
acts in image processing systems. In this paper, we present an in-depth study
of image pre-training. To conduct this study on solid ground with practical
value in mind, we first propose a generic, cost-effective Transformer-based
framework for image processing. It yields highly competitive performance across
a range of low-level tasks while keeping parameters and computational
complexity constrained. Then, based on this framework, we design a set of
principled evaluation tools to comprehensively diagnose image
pre-training in different tasks, and uncover its effects on internal network
representations. We find pre-training plays strikingly different roles in
low-level tasks. For example, pre-training introduces more local information to
higher layers in super-resolution (SR), yielding significant performance gains,
while pre-training hardly affects internal feature representations in
denoising, resulting in only limited gains. Further, we explore different methods of
pre-training, revealing that multi-task pre-training is more effective and
data-efficient. All code and models will be released at
https://github.com/fenglinglwb/EDT.
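The abstract's claim that pre-training reshapes internal representations in SR while barely touching them in denoising invites a concrete probe. A common, generic tool for this kind of layer-wise comparison is centered kernel alignment (CKA); the minimal linear-CKA sketch below is an illustrative assumption, not the authors' released evaluation tooling, and compares activations of the same layer from a pre-trained and a from-scratch model on a shared batch.

    import torch

    def linear_cka(x: torch.Tensor, y: torch.Tensor) -> float:
        # x, y: (num_samples, num_features) activations of the same layer from
        # two models (e.g. pre-trained vs. trained from scratch), flattened
        # over spatial dimensions. Returns a similarity score in [0, 1].
        x = x - x.mean(dim=0, keepdim=True)       # center each feature
        y = y - y.mean(dim=0, keepdim=True)
        hsic_xy = ((x.t() @ y) ** 2).sum()        # ||X^T Y||_F^2
        norm_x = ((x.t() @ x) ** 2).sum().sqrt()  # ||X^T X||_F
        norm_y = ((y.t() @ y) ** 2).sum().sqrt()  # ||Y^T Y||_F
        return (hsic_xy / (norm_x * norm_y)).item()

    # Hypothetical usage: feats_a and feats_b come from forward hooks on the
    # same block of the two models, reshaped to (N, C*H*W).
    # print(linear_cka(feats_a, feats_b))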
Related papers
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z) - What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z) - EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z) - A Closer Look at Self-Supervised Lightweight Vision Transformers [44.44888945683147]
Self-supervised learning on large-scale Vision Transformers (ViTs) as pre-training methods has achieved promising downstream performance.
We benchmark several self-supervised pre-training methods on image classification tasks and some downstream dense prediction tasks.
Even vanilla lightweight ViTs show performance comparable to previous SOTA networks that rely on delicate architecture designs.
arXiv Detail & Related papers (2022-05-28T14:14:57Z) - Are Large-scale Datasets Necessary for Self-Supervised Pre-training? [29.49873710927313]
We consider a self-supervised pre-training scenario that only leverages the target task data.
Our study shows that denoising autoencoders, such as BEiT, are more robust to the type and size of the pre-training data.
On COCO, when pre-training solely using COCO images, the detection and instance segmentation performance surpasses the supervised ImageNet pre-training in a comparable setting.
arXiv Detail & Related papers (2021-12-20T18:41:32Z) - A Practical Contrastive Learning Framework for Single-Image Super-Resolution [51.422185656787285]
We investigate contrastive learning-based single image super-resolution from two perspectives.
We propose a practical contrastive learning framework for SISR, named PCL-SR.
We re-train existing benchmark methods with our proposed PCL-SR framework, achieving superior performance compared with their original results.
arXiv Detail & Related papers (2021-11-27T15:42:12Z) - Efficient Visual Pretraining with Contrastive Detection [31.444554574326283]
We introduce a new self-supervised objective, contrastive detection, which tasks representations with identifying object-level features across augmentations.
This objective extracts a rich learning signal per image, leading to state-of-the-art transfer performance from ImageNet to COCO.
In particular, our strongest ImageNet-pretrained model performs on par with SEER, one of the largest self-supervised systems to date.
arXiv Detail & Related papers (2021-03-19T14:05:12Z) - Pre-Trained Image Processing Transformer [95.93031793337613]
We develop a new pre-trained model, namely, the image processing transformer (IPT).
We utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multiple task-specific heads and tails (a minimal sketch of such pair generation appears after this list).
arXiv Detail & Related papers (2020-12-01T09:42:46Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
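As noted in the IPT entry above, large-scale low-level pre-training typically synthesizes corrupted/clean pairs from a clean image corpus such as ImageNet. The sketch below shows one simple way to build such pairs for a multi-task stream (x4 bicubic super-resolution and sigma-25 denoising); the degradation settings and function name are illustrative assumptions, not the exact recipes of IPT or EDT.

    import torch
    import torch.nn.functional as F

    def make_training_pair(clean: torch.Tensor, task: str):
        # clean: (C, H, W) float tensor in [0, 1].
        # Returns (degraded_input, target) for the requested task.
        if task == "sr":
            # x4 bicubic downsampling; the network learns to upscale back.
            lr = F.interpolate(clean.unsqueeze(0), scale_factor=0.25,
                               mode="bicubic", align_corners=False)
            return lr.squeeze(0).clamp(0, 1), clean
        if task == "denoise":
            # Additive white Gaussian noise with sigma = 25/255.
            noisy = clean + torch.randn_like(clean) * (25.0 / 255.0)
            return noisy.clamp(0, 1), clean
        raise ValueError(f"unknown task: {task}")

    # Hypothetical multi-task sampling: draw a task per image so SR and
    # denoising pairs are mixed within one pre-training stream.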