Improving Visual Representation Learning through Perceptual Understanding
- URL: http://arxiv.org/abs/2212.14504v2
- Date: Tue, 28 Mar 2023 13:58:14 GMT
- Title: Improving Visual Representation Learning through Perceptual Understanding
- Authors: Samyakh Tukra, Frederick Hoffman, Ken Chatfield
- Abstract summary: We present an extension to masked autoencoders (MAE) that improves the representations learnt by the model by explicitly encouraging the learning of higher-level scene features.
We achieve 78.1% top-1 accuracy with linear probing on ImageNet-1K and up to 88.1% when fine-tuning, with similar results for other downstream tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an extension to masked autoencoders (MAE) that improves the
representations learnt by the model by explicitly encouraging the learning of
higher-level scene features. We do this by (i) introducing a perceptual
similarity term between generated and real images and (ii) incorporating
several techniques from the adversarial training literature, including
multi-scale training and adaptive discriminator augmentation. The combination
of these yields not only better pixel reconstruction but also representations
that appear to better capture higher-level details within images. More
consequentially, we show how our method, Perceptual MAE, leads to better
performance when used for downstream tasks, outperforming previous methods. We
achieve 78.1% top-1 accuracy with linear probing on ImageNet-1K and up to
88.1% when fine-tuning, with similar results on other downstream tasks, all
without the use of additional pre-trained models or data.
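To make the loss structure concrete, here is a minimal PyTorch sketch, assuming an MAE that returns full reconstructed images: the standard pixel loss is combined with a perceptual similarity term computed in the feature space of a frozen network. The choice of VGG-16 features and the weight `lam` are illustrative assumptions, and the paper's multi-scale training and adaptive discriminator augmentation are omitted.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class PerceptualLoss(torch.nn.Module):
    """Feature-space distance between reconstructed and real images."""
    def __init__(self):
        super().__init__()
        # Frozen VGG-16 conv blocks as the feature extractor -- an assumption
        # for this sketch, not necessarily the paper's feature network.
        # Inputs are assumed already normalised to ImageNet statistics.
        self.features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, recon, target):
        return F.mse_loss(self.features(recon), self.features(target))

def perceptual_mae_loss(recon, target, perceptual, lam=0.1):
    # Standard MAE pixel reconstruction loss plus the perceptual term;
    # the weight lam is illustrative, not the paper's value.
    return F.mse_loss(recon, target) + lam * perceptual(recon, target)
```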
Related papers
- Multimodal Data Augmentation for Image Captioning using Diffusion Models [12.221685807426264]
We propose a data augmentation method that leverages a text-to-image model, Stable Diffusion, to expand the training set.
Experiments on the MS COCO dataset demonstrate the advantages of our approach over several benchmark methods.
Training efficiency and effectiveness can be further improved by deliberately filtering the generated data (a minimal generation sketch follows this entry).
arXiv Detail & Related papers (2023-05-03T01:57:33Z)
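A minimal sketch of the generation step, assuming the Hugging Face diffusers library; the model ID, sampling settings, and the `expand_training_set` helper are illustrative, and the paper's filtering step is only noted in a comment.

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative model checkpoint; requires a CUDA device as written.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def expand_training_set(captions, images_per_caption=2):
    # Generate synthetic images for each training caption; each generated
    # image is paired with the caption that produced it. The paper further
    # filters these pairs (e.g. a quality/similarity criterion) before use.
    synthetic = []
    for caption in captions:
        out = pipe(caption, num_images_per_prompt=images_per_caption)
        synthetic.extend((img, caption) for img in out.images)
    return synthetic
```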
- SAGE: Saliency-Guided Mixup with Optimal Rearrangements [22.112463794733188]
Saliency-Guided Mixup with Optimal Rearrangements (SAGE) creates new training examples by rearranging and mixing image pairs using visual saliency as guidance.
We demonstrate on CIFAR-10 and CIFAR-100 that SAGE achieves performance better than or comparable to the state of the art while being more efficient (a simplified sketch of the mixing step follows below).
arXiv Detail & Related papers (2022-10-31T19:45:21Z)
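A heavily simplified sketch of the saliency-guided mixing step: pixels are blended by relative saliency, and the mean blending weight serves as the label-mixing coefficient. The `saliency_guided_mix` helper and its normalised saliency inputs are assumptions, and the actual SAGE method additionally computes an optimal rearrangement of one image before blending, which is omitted here.

```python
import torch

def saliency_guided_mix(x1, x2, s1, s2, eps=1e-6):
    # x1, x2: image batches (B, C, H, W)
    # s1, s2: saliency maps normalised to [0, 1], shape (B, 1, H, W)
    w = s1 / (s1 + s2 + eps)        # per-pixel weight favouring salient regions of x1
    mixed = w * x1 + (1.0 - w) * x2
    lam = w.mean(dim=(1, 2, 3))     # per-example coefficient for mixing the labels
    return mixed, lam
```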
- Expanding Language-Image Pretrained Models for General Video Recognition [136.0948049010682]
Contrastive language-image pretraining has shown great success in learning visual-textual joint representations from web-scale data.
We present a simple yet effective approach that adapts pretrained language-image models to video recognition directly (a minimal frame-pooling sketch follows below).
Our approach surpasses the current state-of-the-art methods by +7.6% and +14.9% in terms of top-1 accuracy under two popular protocols.
arXiv Detail & Related papers (2022-08-04T17:59:54Z)
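As a baseline illustration of adapting an image-text model to video, the sketch below encodes sampled frames with OpenAI's CLIP and mean-pools over time before matching against text embeddings of the class names. The paper's learned temporal modules and prompts are omitted; the prompt template and `classify_video` helper are assumptions.

```python
import torch
import clip  # OpenAI's CLIP package -- an assumption for this sketch

model, preprocess = clip.load("ViT-B/32", device="cpu")

@torch.no_grad()
def classify_video(frames, class_names):
    # frames: list of PIL images sampled from the video clip
    imgs = torch.stack([preprocess(f) for f in frames])      # (T, 3, H, W)
    video = model.encode_image(imgs).float().mean(0, keepdim=True)
    text = model.encode_text(clip.tokenize(
        [f"a video of a person {c}" for c in class_names])).float()
    video = video / video.norm(dim=-1, keepdim=True)
    text = text / text.norm(dim=-1, keepdim=True)
    return (100.0 * video @ text.T).softmax(dim=-1)          # class probabilities
```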
- Augmentation Learning for Semi-Supervised Classification [13.519613713213277]
We propose a Semi-Supervised Learning method that automatically selects the most effective data augmentation policy for a particular dataset.
We show how policy learning can be used to adapt augmentations to datasets beyond ImageNet.
arXiv Detail & Related papers (2022-08-03T10:06:51Z)
- Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation [42.37533586611174]
Masked image modeling (MIM) learns representations with remarkably good fine-tuning performance.
In this paper, we show that the inferior fine-tuning performance of other pre-training approaches can be significantly improved by a simple post-processing step: feature distillation (sketched below).
arXiv Detail & Related papers (2022-05-27T17:59:36Z)
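A hypothetical simplification of the feature-distillation post-processing: a student network is trained to mimic normalised features of the already pre-trained teacher, after which the student fine-tunes better. The whitening scheme shown is an assumption; the paper's exact normalisation and training details may differ.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(student_feats, teacher_feats, eps=1e-6):
    # Whiten the teacher's features per dimension, then have the student
    # regress onto them; the teacher target carries no gradient.
    mu = teacher_feats.mean(dim=0, keepdim=True)
    std = teacher_feats.std(dim=0, keepdim=True)
    target = (teacher_feats - mu) / (std + eps)
    return F.smooth_l1_loss(student_feats, target.detach())
```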
- With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations [87.72779294717267]
Using the nearest neighbor as a positive in contrastive losses significantly improves performance on ImageNet classification (a sketch of this positive-swapping step follows below).
We demonstrate empirically that our method is less reliant on complex data augmentations.
arXiv Detail & Related papers (2021-04-29T17:56:08Z)
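A minimal PyTorch sketch of the nearest-neighbour positive idea: each embedding is swapped for its nearest neighbour in a support queue before computing the InfoNCE loss. Queue maintenance and the projection/prediction heads are omitted, and the function name is an assumption.

```python
import torch
import torch.nn.functional as F

def nnclr_loss(z1, z2, queue, temperature=0.1):
    # z1, z2: embeddings of two views, (B, D); queue: support set, (K, D)
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    queue = F.normalize(queue, dim=1)
    nn_idx = (z1 @ queue.T).argmax(dim=1)   # nearest neighbour per sample
    positives = queue[nn_idx]               # queue entries carry no gradient
    logits = positives @ z2.T / temperature # NN positive vs. all z2 in the batch
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```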
- Learning Representational Invariances for Data-Efficient Action Recognition [52.23716087656834]
We show that our data augmentation strategy leads to promising performance on the Kinetics-100, UCF-101, and HMDB-51 datasets.
We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
arXiv Detail & Related papers (2021-03-30T17:59:49Z)
- Self-supervised Co-training for Video Representation Learning [103.69904379356413]
We investigate the benefit of adding semantic-class positives to instance-based InfoNCE (Info Noise Contrastive Estimation) training.
We propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss (an illustrative loss sketch follows this entry).
We evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval.
arXiv Detail & Related papers (2020-10-19T17:59:01Z)
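The class-positive idea can be sketched as a supervised-contrastive-style variant of InfoNCE in which samples sharing a (pseudo-)class label count as extra positives. This is an illustrative stand-in only; the paper's co-training across RGB and optical-flow views is not shown.

```python
import torch
import torch.nn.functional as F

def class_positive_infonce(z, labels, temperature=0.07):
    # z: embeddings (B, D); labels: (pseudo-)class labels, LongTensor (B,)
    z = F.normalize(z, dim=1)
    sim = z @ z.T / temperature
    pos = (labels[:, None] == labels[None, :]).float()
    pos.fill_diagonal_(0)                    # a sample is not its own positive
    sim.fill_diagonal_(float("-inf"))        # exclude self from the softmax
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    denom = pos.sum(1).clamp(min=1)          # guard anchors with no positives
    return -((pos * log_prob).sum(1) / denom).mean()
```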
- Differentiable Augmentation for Data-Efficient GAN Training [48.920992130257595]
We propose DiffAugment, a simple method that improves the data efficiency of GANs by imposing various types of differentiable augmentations on both real and fake samples, as sketched below.
Our method can generate high-fidelity images using only 100 images without pre-training, while being on par with existing transfer learning algorithms.
arXiv Detail & Related papers (2020-06-18T17:59:01Z)
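A minimal sketch of the DiffAugment idea: a differentiable augmentation is applied to both real and generated samples before the discriminator, so gradients still reach the generator. Only a small illustrative subset of the color/translation/cutout policies is shown.

```python
import torch

def diff_augment(x):
    # x: image batch (B, C, H, W). Every op below is differentiable in x,
    # so generator gradients pass through the augmentation unchanged.
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5) * 0.4  # brightness
    dx, dy = (int(v) for v in torch.randint(-2, 3, (2,)))
    x = torch.roll(x, shifts=(dx, dy), dims=(2, 3))                        # translation
    return x

# Key point: augment BOTH real and fake samples in every update, e.g.
#   loss_D = d_loss(D(diff_augment(real)), D(diff_augment(G(z))))
#   loss_G = g_loss(D(diff_augment(G(z))))
```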
- A Simple Framework for Contrastive Learning of Visual Representations [116.37752766922407]
This paper presents SimCLR: a simple framework for contrastive learning of visual representations.
We show that the composition of data augmentations plays a critical role in defining effective predictive tasks (an example composition follows this entry).
We are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet.
arXiv Detail & Related papers (2020-02-13T18:50:45Z)
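To illustrate the augmentation-composition point, here is a typical SimCLR-style pipeline in torchvision; the parameter values follow commonly used settings and may not match the paper's exact configuration.

```python
from torchvision import transforms

# Two independent passes of this pipeline over the same image produce
# the two views that form a positive pair for the contrastive loss.
simclr_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply(
        [transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])
```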