Evolving Image Compositions for Feature Representation Learning
- URL: http://arxiv.org/abs/2106.09011v1
- Date: Wed, 16 Jun 2021 17:57:18 GMT
- Title: Evolving Image Compositions for Feature Representation Learning
- Authors: Paola Cascante-Bonilla, Arshdeep Sekhon, Yanjun Qi, Vicente Ordonez
- Abstract summary: We propose PatchMix, a data augmentation method that creates new samples by composing patches from pairs of images in a grid-like pattern.
A ResNet-50 model trained on ImageNet using PatchMix exhibits superior transfer learning capabilities across a wide array of benchmarks.
- Score: 22.22790506995431
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Convolutional neural networks for visual recognition require large amounts of training samples and usually benefit from data augmentation. This paper proposes PatchMix, a data augmentation method that creates new samples by composing patches from pairs of images in a grid-like pattern. The ground-truth label of each new sample is set in proportion to the number of patches taken from each source image. We then add a set of additional patch-level losses to regularize training and to encourage good representations at both the patch and image levels. A ResNet-50 model trained on ImageNet using PatchMix exhibits superior transfer learning capabilities across a wide array of benchmarks. Although PatchMix can rely on random pairings and random grid-like patterns for mixing, we explore evolutionary search as a guiding strategy to jointly discover optimal grid-like patterns and image pairings. For this purpose, we conceive a fitness function that bypasses the need to re-train a model to evaluate each candidate. In this way, PatchMix outperforms a base model on CIFAR-10 (+1.91), CIFAR-100 (+5.31), Tiny ImageNet (+3.52), and ImageNet (+1.16) by significant margins, also outperforming previous state-of-the-art pairwise augmentation strategies.
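To make the mixing operation concrete, here is a minimal sketch of a PatchMix-style composition in NumPy. It is an illustration, not the authors' released implementation: the grid size, the uniform per-cell sampling, and the name `patchmix` are assumptions made for clarity, and the paper's additional patch-level losses are not shown.

```python
import numpy as np

def patchmix(img_a, img_b, grid=4, rng=None):
    """Compose a new sample by tiling a grid over the image plane and
    drawing each cell from one of the two source images at random."""
    rng = np.random.default_rng() if rng is None else rng
    assert img_a.shape == img_b.shape, "source images must share a shape"
    h, w = img_a.shape[:2]
    assert h % grid == 0 and w % grid == 0, "grid must divide the image size"
    ph, pw = h // grid, w // grid
    mask = rng.integers(0, 2, size=(grid, grid)).astype(bool)
    out = img_a.copy()
    for i in range(grid):
        for j in range(grid):
            if mask[i, j]:  # take this cell from img_b
                out[i*ph:(i+1)*ph, j*pw:(j+1)*pw] = img_b[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
    lam = mask.mean()  # fraction of patches drawn from img_b
    return out, lam
```

Consistent with the abstract, the mixed target is then proportional to the patch counts: `y = (1 - lam) * y_a + lam * y_b` for one-hot labels `y_a` and `y_b`.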
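The abstract does not spell out the fitness function, so the following sketch only illustrates the search loop itself: a small evolutionary algorithm over binary grid masks, where `fitness` is a hypothetical placeholder for the paper's re-training-free scoring of a candidate (for example, a score computed with a fixed, already-trained model). The population size, mutation rate, and selection scheme are all illustrative choices.

```python
import numpy as np

def evolve_masks(fitness, grid=4, pop_size=20, generations=50,
                 mutation_rate=0.1, rng=None):
    """Evolve binary grid masks; `fitness` maps a (grid, grid) boolean
    mask to a scalar score, higher being better."""
    rng = np.random.default_rng() if rng is None else rng
    pop = rng.integers(0, 2, size=(pop_size, grid, grid)).astype(bool)
    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        elite = pop[np.argsort(scores)[-(pop_size // 2):]]  # keep the best half
        # Crossover: recombine random pairs of elite masks cell by cell.
        idx_a = rng.integers(0, len(elite), size=pop_size - len(elite))
        idx_b = rng.integers(0, len(elite), size=pop_size - len(elite))
        children = np.where(rng.random(elite[idx_a].shape) < 0.5,
                            elite[idx_a], elite[idx_b])
        # Mutation: flip a small fraction of cells.
        children ^= rng.random(children.shape) < mutation_rate
        pop = np.concatenate([elite, children])
    scores = np.array([fitness(m) for m in pop])
    return pop[int(np.argmax(scores))]
```

In the paper the search jointly discovers grid patterns and image pairings; only the mask search is shown here.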
Related papers
- Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach [4.9204263448542465]
This study introduces an innovative, fine-grained dimension by integrating patch-level discrimination into self-supervised visual representation learning.
We employ a distinctive photometric patch-level augmentation, where each patch is augmented individually, independently of the other patches within the same view (see the illustrative sketch after this list).
We present a simple yet effective patch-matching algorithm to find the corresponding patches across the augmented views.
arXiv Detail & Related papers (2023-10-28T09:35:30Z)
- MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation [110.09800389100599]
We propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation.
Our approach generates fine-grained patch-text pair data by mixing image patches while preserving the correspondence between patches and text.
With MixReorg as a mask learner, conventional text-supervised semantic segmentation models can achieve highly generalizable pixel-semantic alignment.
arXiv Detail & Related papers (2023-08-09T09:35:16Z)
- Inter-Instance Similarity Modeling for Contrastive Learning [22.56316444504397]
We propose a novel image mixing method, PatchMix, for contrastive learning with Vision Transformers (ViT).
Compared to existing sample-mixing methods, PatchMix can flexibly and efficiently mix more than two images.
Our proposed method significantly outperforms the previous state of the art on both the ImageNet-1K and CIFAR datasets.
arXiv Detail & Related papers (2023-06-21T13:03:47Z)
- FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model that synthesizes novel, high-quality, and diverse images.
FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales.
In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-07-18T07:11:28Z)
- Correlation Verification for Image Retrieval [15.823918683848877]
We propose a novel image retrieval re-ranking network named Correlation Verification Networks (CVNet).
CVNet compresses dense feature correlation into image similarity while learning diverse geometric matching patterns from various image pairs.
Our proposed network shows state-of-the-art performance on several retrieval benchmarks by a significant margin.
arXiv Detail & Related papers (2022-04-04T13:18:49Z)
- Corrupted Image Modeling for Self-Supervised Visual Pre-Training [103.99311611776697]
We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training.
CIM uses an auxiliary generator (a small trainable BEiT) to corrupt the input image, instead of using artificial mask tokens; an enhancer network is then trained on the corrupted input.
After pre-training, the enhancer can be used as a high-capacity visual encoder for downstream tasks.
arXiv Detail & Related papers (2022-02-07T17:59:04Z)
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z)
- Shape-Texture Debiased Neural Network Training [50.6178024087048]
Convolutional Neural Networks are often biased towards either texture or shape, depending on the training dataset.
We develop an algorithm for shape-texture debiased learning.
Experiments show that our method successfully improves model performance on several image recognition benchmarks.
arXiv Detail & Related papers (2020-10-12T19:16:12Z)
- SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization [9.126576583256506]
We propose SaliencyMix to improve the generalization ability of deep learning models.
SaliencyMix carefully selects a representative image patch with the help of a saliency map and mixes this indicative patch with the target image (a minimal sketch appears after this list).
SaliencyMix achieves the best known top-1 error of 21.26% and 20.09% for ResNet-50 and ResNet-101 architectures on ImageNet classification.
arXiv Detail & Related papers (2020-06-02T17:18:34Z)
- Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning [108.999497144296]
Recently advanced unsupervised learning approaches use a Siamese-like framework to compare two "views" of the same image for learning representations.
This work aims to bring the notion of distance in label space into unsupervised learning, making the model aware of the soft degree of similarity between positive and negative pairs.
Despite its conceptual simplicity, we show empirically that with our solution, Unsupervised Image Mixtures (Un-Mix), we can learn subtler, more robust, and more generalized representations from the transformed input and the corresponding new label space.
arXiv Detail & Related papers (2020-03-11T17:59:04Z)
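Two of the patch-level ideas referenced above are simple enough to sketch. First, the per-patch photometric augmentation from the patch-wise self-supervised paper: every grid cell receives its own independently sampled photometric transform. This is a rough illustration under assumed settings (grid size, brightness/contrast jitter only) and is not that paper's code.

```python
import numpy as np

def per_patch_jitter(img, grid=4, max_shift=0.2, rng=None):
    """Apply an independent brightness/contrast jitter to each grid
    cell, so no two patches share the same photometric transform."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    ph, pw = h // grid, w // grid
    out = img.astype(np.float32)  # float copy for safe arithmetic
    for i in range(grid):
        for j in range(grid):
            patch = out[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
            contrast = 1.0 + rng.uniform(-max_shift, max_shift)
            brightness = rng.uniform(-max_shift, max_shift) * 255.0
            out[i*ph:(i+1)*ph, j*pw:(j+1)*pw] = patch * contrast + brightness
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```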
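Second, a SaliencyMix-style composition: locate the most salient point of a source image, cut a box around it, and paste it into the target at the same position, weighting the labels by area as in CutMix. The gradient-magnitude `saliency` proxy below is an assumption made to keep the sketch self-contained; the actual method relies on a dedicated saliency detector.

```python
import numpy as np

def saliency(img):
    """Crude saliency proxy: luminance gradient magnitude."""
    gray = img.astype(np.float32).mean(axis=2)
    gy, gx = np.gradient(gray)
    return np.hypot(gx, gy)

def saliency_mix(src, tgt, box_frac=0.3):
    """Paste a box centered on src's saliency peak into tgt (same-shape
    images); lam is the label weight that stays with tgt's class."""
    h, w = src.shape[:2]
    bh, bw = int(h * box_frac), int(w * box_frac)
    cy, cx = np.unravel_index(int(np.argmax(saliency(src))), (h, w))
    y0 = int(np.clip(cy - bh // 2, 0, h - bh))  # clamp box inside the image
    x0 = int(np.clip(cx - bw // 2, 0, w - bw))
    out = tgt.copy()
    out[y0:y0+bh, x0:x0+bw] = src[y0:y0+bh, x0:x0+bw]
    lam = 1.0 - (bh * bw) / (h * w)  # fraction of tgt pixels kept
    return out, lam
```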