CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping
- URL: http://arxiv.org/abs/2205.15955v1
- Date: Tue, 31 May 2022 16:57:28 GMT
- Title: CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping
- Authors: Junlin Han, Lars Petersson, Hongdong Li, Ian Reid
- Abstract summary: We present a simple method, CropMix, for producing a rich input distribution from the original dataset distribution.
CropMix can be seamlessly applied to virtually any training recipe and neural network architecture performing classification tasks.
We show that CropMix is of benefit to both contrastive learning and masked image modeling towards more powerful representations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a simple method, CropMix, for the purpose of producing a rich
input distribution from the original dataset distribution. Unlike single random
cropping, which may inadvertently capture only limited information or
irrelevant information such as pure background or unrelated objects, we crop
an image multiple times using distinct crop scales, thereby ensuring that
multi-scale information is captured. The new input distribution, which serves as
training data for a number of vision tasks, is then formed by simply
mixing the multiple cropped views. We first demonstrate that CropMix can be
seamlessly applied to virtually any training recipe and neural network
architecture performing classification tasks. CropMix is shown to improve the
performance of image classifiers on several benchmark tasks across the board
without sacrificing computational simplicity and efficiency. Moreover, we show
that CropMix is of benefit to both contrastive learning and masked image
modeling towards more powerful representations, where preferable results are
achieved when learned representations are transferred to downstream tasks. Code
is available at GitHub.
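The crop-then-mix idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the function names `crop_and_resize` and `cropmix`, the square crop windows, the nearest-neighbour resize, the non-overlapping scale sub-ranges, and the Dirichlet-sampled mixing weights are all assumptions made for this example. The abstract only states that several crops at distinct scales are taken and then mixed.

```python
import numpy as np


def crop_and_resize(img, scale, out_size, rng):
    """Crop a random square window covering roughly `scale` of the image
    area, then resize it to (out_size, out_size) by nearest-neighbour
    sampling. Square windows are an assumption made for simplicity."""
    h, w = img.shape[:2]
    side = max(1, min(h, w, int(round(np.sqrt(scale * h * w)))))
    top = int(rng.integers(0, h - side + 1))
    left = int(rng.integers(0, w - side + 1))
    crop = img[top:top + side, left:left + side]
    # Source row/column index for each output pixel (nearest neighbour).
    idx = np.arange(out_size) * side // out_size
    return crop[idx][:, idx]


def cropmix(img, scale_ranges, out_size, rng):
    """Take one crop per scale range and blend the views pixel-wise.

    `scale_ranges` is a list of (low, high) area fractions, one per crop.
    The blend is a convex combination with Dirichlet-sampled weights,
    which is an assumed mixing rule for this sketch."""
    views = [
        crop_and_resize(img, rng.uniform(lo, hi), out_size, rng).astype(np.float64)
        for lo, hi in scale_ranges
    ]
    weights = rng.dirichlet(np.ones(len(views)))
    mixed = np.tensordot(weights, np.stack(views), axes=1)
    return mixed, weights


# Usage: three crops drawn from small, medium, and large scale ranges.
rng = np.random.default_rng(0)
image = rng.random((64, 48, 3))  # stand-in for a real H x W x C image
mixed, weights = cropmix(image, [(0.1, 0.4), (0.4, 0.7), (0.7, 1.0)], 32, rng)
```

Because all views come from the same image, the label is unchanged by the mixing step in this sketch; only the input is altered.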
Related papers
- One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding.
It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps.
OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z)
- SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification [46.8141860303439]
We introduce a simple but effective augmentation strategy for multi-label image classification, namely SpliceMix.
The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together.
arXiv Detail & Related papers (2023-11-26T05:45:27Z)
- Enhanced Performance of Pre-Trained Networks by Matched Augmentation Distributions [10.74023489125222]
We propose a simple solution to address the train-test distributional shift.
We combine the results of multiple random crops of a test image.
This not only matches the train-time augmentation but also provides full coverage of the input image.
arXiv Detail & Related papers (2022-01-19T22:33:00Z)
- Sample selection for efficient image annotation [14.695979686066066]
Supervised object detection has proven successful on many benchmark datasets, achieving human-level performance.
We propose an efficient image selection approach that samples the most informative images from the unlabeled dataset.
Our method can reduce the manual annotation workload by up to 80% compared to a full manual labeling setting, and it performs better than random sampling.
arXiv Detail & Related papers (2021-05-10T21:25:10Z)
- ResizeMix: Mixing Data with Preserved Object Information and True Labels [57.00554495298033]
We study the importance of the saliency information for mixing data, and find that the saliency information is not so necessary for promoting the augmentation performance.
We propose a more effective but very easily implemented method, namely ResizeMix.
arXiv Detail & Related papers (2020-12-21T03:43:13Z)
- SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data [124.95585891086894]
The proposed method is called Semantically Proportional Mixing (SnapMix).
It exploits class activation maps (CAM) to lessen the label noise in augmenting fine-grained data.
Our method consistently outperforms existing mixed-based approaches.
arXiv Detail & Related papers (2020-12-09T03:37:30Z)
- Data-driven Meta-set Based Fine-Grained Visual Classification [61.083706396575295]
We propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.
Specifically, guided by a small amount of clean meta-set, we train a selection net in a meta-learning manner to distinguish in- and out-of-distribution noisy images.
arXiv Detail & Related papers (2020-08-06T03:04:16Z)
- Variational Clustering: Leveraging Variational Autoencoders for Image Clustering [8.465172258675763]
Variational Autoencoders (VAEs) naturally lend themselves to learning data distributions in a latent space.
We propose a method based on VAEs where we use a Gaussian Mixture prior to help cluster the images accurately.
Our method simultaneously learns a prior that captures the latent distribution of the images and a posterior to help discriminate well between data points.
arXiv Detail & Related papers (2020-05-10T09:34:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.