On the Efficacy of Multi-scale Data Samplers for Vision Applications
- URL: http://arxiv.org/abs/2309.04502v1
- Date: Fri, 8 Sep 2023 04:29:50 GMT
- Title: On the Efficacy of Multi-scale Data Samplers for Vision Applications
- Authors: Elvis Nunez, Thomas Merth, Anish Prabhu, Mehrdad Farajtabar, Mohammad
Rastegari, Sachin Mehta, Maxwell Horton
- Abstract summary: We show that multi-scale samplers behave as implicit data regularizers and accelerate training.
We extend a multi-scale variable batch sampler with a simple curriculum that progressively grows resolutions throughout training.
- Score: 32.13488876863029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-scale resolution training has seen increased adoption across
multiple vision tasks, including classification and detection. Training with
smaller resolutions enables faster training at the expense of a drop in
accuracy. Conversely, training with larger resolutions has been shown to
improve performance, but memory constraints often make this infeasible. In this
paper, we empirically study the properties of multi-scale training procedures.
We focus on variable batch size multi-scale data samplers that randomly sample
an input resolution at each training iteration and dynamically adjust their
batch size according to the resolution. Such samplers have been shown to
improve model accuracy beyond standard training with a fixed batch size and
resolution, though it is not clear why this is the case. We explore the
properties of these data samplers by performing extensive experiments on
ResNet-101 and validate our conclusions across multiple architectures, tasks,
and datasets. We show that multi-scale samplers behave as implicit data
regularizers and accelerate training. Compared to models trained with
single-scale samplers, we show that models trained with multi-scale samplers
retain or improve accuracy, while being better-calibrated and more robust to
scaling and data distribution shifts. We additionally extend a multi-scale
variable batch sampler with a simple curriculum that progressively grows
resolutions throughout training, allowing for a compute reduction of more than
30%. We show that the benefits of multi-scale training extend to detection and
instance segmentation tasks, where we observe a 37% reduction in training FLOPs
along with a 3-4% mAP increase on MS-COCO using a Mask R-CNN model.
Related papers
- Integrated Image-Text Based on Semi-supervised Learning for Small Sample Instance Segmentation [1.3157419797035321]
The article proposes a novel small-sample instance segmentation method designed to maximize the use of existing information.
First, it lets the model fully exploit unlabeled data by learning to generate pseudo-labels, increasing the number of available training samples.
Second, it integrates text and image features to obtain more accurate classification results.
arXiv Detail & Related papers (2024-10-21T14:44:08Z)
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy samples, which degrade training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z) - On Pretraining Data Diversity for Self-Supervised Learning [57.91495006862553]
We explore the impact of training with more diverse datasets on the performance of self-supervised learning (SSL) under a fixed computational budget.
Our findings consistently demonstrate that increasing pretraining data diversity enhances SSL performance, albeit only when the distribution distance to the downstream data is minimal.
arXiv Detail & Related papers (2024-03-20T17:59:58Z) - Gaussian Switch Sampling: A Second Order Approach to Active Learning [11.775252660867285]
In active learning, acquisition functions typically define informativeness in terms of a sample's position within the model's representation manifold.
We propose a grounded second-order definition of information content and sample importance within the context of active learning.
We show that our definition produces highly accurate importance scores even when the model representations are constrained by the lack of training data.
arXiv Detail & Related papers (2023-02-16T15:24:56Z) - ScoreMix: A Scalable Augmentation Strategy for Training GANs with
Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z) - Multi-Domain Joint Training for Person Re-Identification [51.73921349603597]
Deep learning-based person Re-IDentification (ReID) often requires a large amount of training data to achieve good performance.
Collecting more training data from diverse environments appears to improve ReID performance.
We propose an approach called Domain-Camera-Sample Dynamic network (DCSD) whose parameters can be adaptive to various factors.
arXiv Detail & Related papers (2022-01-06T09:20:59Z) - One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning [35.0157090322113]
Large-scale machine learning systems are often continuously trained with enormous amounts of data from production environments.
The sheer volume of streaming data poses a significant challenge to real-time training subsystems, and ad-hoc sampling is the standard practice.
We propose to record a constant amount of information per instance from these forward passes. The extra information measurably improves the selection of which data instances should participate in forward and backward passes.
arXiv Detail & Related papers (2021-04-27T11:29:02Z) - Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z) - DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning [83.48587570246231]
Visual similarity plays an important role in many computer vision applications.
Deep metric learning (DML) is a powerful framework for learning such similarities.
We propose and study multiple complementary learning tasks, targeting conceptually different data relationships.
We learn a single model to aggregate their training signals, resulting in strong generalization and state-of-the-art performance.
arXiv Detail & Related papers (2020-04-28T12:26:50Z)
- Efficient Deep Representation Learning by Adaptive Latent Space Sampling [16.320898678521843]
Supervised deep learning requires a large amount of training samples with annotations, which are expensive and time-consuming to obtain.
We propose a novel training framework which adaptively selects informative samples that are fed to the training process.
arXiv Detail & Related papers (2020-03-19T22:17:02Z)