Soft Random Sampling: A Theoretical and Empirical Analysis
- URL: http://arxiv.org/abs/2311.12727v2
- Date: Fri, 24 Nov 2023 03:27:31 GMT
- Title: Soft Random Sampling: A Theoretical and Empirical Analysis
- Authors: Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian
Kingsbury
- Abstract summary: Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data.
It selects a subset uniformly at random with replacement from the full data set in each epoch.
On real-world industrial-scale data sets, it is shown to be a powerful training strategy with significant speedup and competitive performance.
- Score: 59.719035355483875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Soft random sampling (SRS) is a simple yet effective approach for efficient
training of large-scale deep neural networks when dealing with massive data.
SRS selects a subset uniformly at random with replacement from the full data
set in each epoch. In this paper, we conduct a theoretical and empirical
analysis of SRS. First, we analyze its sampling dynamics including data
coverage and occupancy. Next, we investigate its convergence with non-convex
objective functions and give the convergence rate. Finally, we characterize its
generalization performance. We empirically evaluate SRS for image recognition
on CIFAR10 and automatic speech recognition on Librispeech and an in-house
payload dataset to demonstrate its effectiveness. Compared to existing
coreset-based data selection methods, SRS offers a better accuracy-efficiency
trade-off. Especially on real-world industrial scale data sets, it is shown to
be a powerful training strategy with significant speedup and competitive
performance with almost no additional computing cost.
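Since SRS is just a with-replacement subsample drawn afresh every epoch, it fits in a few lines. Below is a minimal NumPy sketch (the function name, the ratio alpha = 0.5, and the commented-out training call are illustrative assumptions, not the authors' code); the docstring records the standard coverage expectation behind the sampling-dynamics analysis.

```python
import numpy as np

def srs_epoch_indices(n_data: int, alpha: float, rng: np.random.Generator) -> np.ndarray:
    """Draw alpha * n_data indices uniformly at random WITH replacement.

    With replacement, some examples repeat while others are missed: the
    expected fraction of distinct examples seen in one epoch is
    1 - (1 - 1/n)**(alpha * n), which tends to 1 - exp(-alpha) for large n
    (so roughly 37% of the data is still skipped per epoch at alpha = 1).
    """
    n_draw = int(alpha * n_data)
    return rng.integers(0, n_data, size=n_draw)

rng = np.random.default_rng(0)
n_data, alpha = 50_000, 0.5
for epoch in range(3):
    idx = srs_epoch_indices(n_data, alpha, rng)
    coverage = np.unique(idx).size / n_data      # distinct examples this epoch
    print(f"epoch {epoch}: {idx.size} draws, coverage {coverage:.3f}")
    # train_one_epoch(model, dataset[idx])       # hypothetical training step
```

At alpha = 0.5 the printed coverage hovers near 1 - exp(-0.5) ≈ 0.39, matching the coverage formula in the docstring.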
Related papers
- Double Machine Learning for Adaptive Causal Representation in High-Dimensional Data [14.25379577156518]
Support points sample splitting (SPSS) is employed for efficient double machine learning (DML) in causal inference.
The support points are selected as optimal representative points of the full raw data and used to split the sample.
They offer the best representation of the full big dataset, whereas traditional random data splitting is unlikely to preserve the unit structural information of the underlying distribution.
arXiv Detail & Related papers (2024-11-22T01:54:53Z) - A Reproducible Analysis of Sequential Recommender Systems [13.987953631479662]
Sequential Recommender Systems (SRSs) have emerged as a highly efficient approach to recommendation.
Existing works exhibit shortcomings in replicability of results, leading to inconsistent statements across papers.
Our work fills these gaps by standardising data pre-processing and model implementations.
arXiv Detail & Related papers (2024-08-07T16:23:29Z) - RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End
Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network that uses both data and memory features to guide the exploration directions for sampling the next minimal set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z) - Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning [28.042568086423298]
Repeated Sampling of Random Subsets (RS2) is a powerful yet overlooked random sampling strategy.
We test RS2 against thirty state-of-the-art data pruning and data distillation methods across four datasets including ImageNet.
Our results demonstrate that RS2 significantly reduces time-to-accuracy compared to existing techniques.
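Going by this summary, RS2's selection step differs from SRS mainly in how the per-round subset is drawn. A minimal sketch, assuming without-replacement draws inside each round (one plausible variant; the helper name is illustrative, not the paper's code):

```python
import numpy as np

def rs2_round_indices(n_data: int, frac: float, rng: np.random.Generator) -> np.ndarray:
    """Draw a fresh random subset each training round, here WITHOUT
    replacement inside the round, so every round trains on frac * n_data
    distinct examples (contrast with the with-replacement draws of SRS)."""
    n_draw = int(frac * n_data)
    return rng.choice(n_data, size=n_draw, replace=False)
```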
arXiv Detail & Related papers (2023-05-28T20:38:13Z) - Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
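AutoSMOTE learns these over-sampling decisions with hierarchical reinforcement learning; the interpolation primitive it builds on is classic SMOTE, sketched below (the helper name and the brute-force neighbor search are illustrative assumptions, not AutoSMOTE itself):

```python
import numpy as np

def smote_interpolate(X_min: np.ndarray, n_new: int, k: int,
                      rng: np.random.Generator) -> np.ndarray:
    """Synthesize minority-class samples by interpolating between a random
    minority point and one of its k nearest minority neighbors."""
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(dists)[1:k + 1]    # k nearest, skipping the point itself
        j = rng.choice(nbrs)
        lam = rng.random()                   # interpolation weight in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.stack(out)
```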
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z) - NeRF in detail: Learning to sample for view synthesis [104.75126790300735]
Neural radiance fields (NeRF) methods have demonstrated impressive novel view synthesis.
In this work we address a clear limitation of the vanilla coarse-to-fine approach -- that it is based on a heuristic and not trained end-to-end for the task at hand.
We introduce a differentiable module that learns to propose samples and their importance for the fine network, and consider and compare multiple alternatives for its neural architecture.
arXiv Detail & Related papers (2021-06-09T17:59:10Z) - Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
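As a concrete illustration of non-uniform agent sampling with an unbiasedness correction (the proportional-to-data-size probabilities are an assumed stand-in, not the optimal distribution the paper derives):

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_pick = 100, 10
data_sizes = rng.integers(50, 500, size=n_agents)
probs = data_sizes / data_sizes.sum()      # assumed importance distribution

# Non-uniform selection of agents without replacement.
picked = rng.choice(n_agents, size=n_pick, replace=False, p=probs)

# Reweighting each update by 1 / (n_agents * p_i) keeps the aggregate an
# approximately unbiased estimate of the full-participation average update
# (exactly unbiased under with-replacement draws).
updates = {i: rng.normal(size=5) for i in picked}   # stand-in local updates
avg = sum(updates[i] / (n_agents * probs[i]) for i in picked) / n_pick
```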
arXiv Detail & Related papers (2020-10-26T14:15:33Z)