Soft Random Sampling: A Theoretical and Empirical Analysis
- URL: http://arxiv.org/abs/2311.12727v2
- Date: Fri, 24 Nov 2023 03:27:31 GMT
- Title: Soft Random Sampling: A Theoretical and Empirical Analysis
- Authors: Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian
Kingsbury
- Abstract summary: Soft random sampling (SRS) is a simple yet effective approach for efficient deep neural networks when dealing with massive data.
It selects a uniformly speed at random with replacement from each data set in each epoch.
It is shown to be a powerful and competitive strategy with significant and competitive performance on real-world industrial scale.
- Score: 59.719035355483875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Soft random sampling (SRS) is a simple yet effective approach for efficient
training of large-scale deep neural networks when dealing with massive data.
SRS selects a subset uniformly at random with replacement from the full data
set in each epoch. In this paper, we conduct a theoretical and empirical
analysis of SRS. First, we analyze its sampling dynamics including data
coverage and occupancy. Next, we investigate its convergence with non-convex
objective functions and give the convergence rate. Finally, we provide its
generalization performance. We empirically evaluate SRS for image recognition
on CIFAR10 and automatic speech recognition on Librispeech and an in-house
payload dataset to demonstrate its effectiveness. Compared to existing
coreset-based data selection methods, SRS offers a better accuracy-efficiency
trade-off. Especially on real-world industrial scale data sets, it is shown to
be a powerful training strategy with significant speedup and competitive
performance with almost no additional computing cost.
Related papers
- RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy Assessment [10.284993431741377]
We introduce the concept of epsilon-sample cover, which quantifies sample redundancy based on inter-sample relationships.<n>We reformulate data selection as a reinforcement learning process and propose RL-Selector.<n>Our method consistently outperforms existing state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-26T06:28:56Z) - FedRS-Bench: Realistic Federated Learning Datasets and Benchmarks in Remote Sensing [13.10438674331326]
Federated learning (FL) offers a solution by enabling collaborative model training across decentralized remote sensing (RS) data sources without exposing raw data.<n>We propose a realistic federated RS dataset, termed FedRS, which is representative of realistic operational scenarios.<n>We implement 10 baseline FL algorithms and evaluation metrics to construct the comprehensive FedRS-Bench.
arXiv Detail & Related papers (2025-05-13T08:04:03Z) - Double Machine Learning for Adaptive Causal Representation in High-Dimensional Data [14.25379577156518]
Support points sample splitting (SPSS) is employed for efficient double machine learning (DML) in causal inference.
The support points are selected and split as optimal representative points of the full raw data in a random sample.
They offer the best representation of a full big dataset, whereas the unit structural information of the underlying distribution via the traditional random data splitting is most likely not preserved.
arXiv Detail & Related papers (2024-11-22T01:54:53Z) - Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization [67.8738082040299]
Self-Sampling Preference Optimization (SSPO) is a new alignment method for post-training reinforcement learning.<n>SSPO eliminates the need for paired data and reward models while retaining the training stability of SFT.<n>SSPO surpasses all previous approaches on the text-to-image benchmarks and demonstrates outstanding performance on the text-to-video benchmarks.
arXiv Detail & Related papers (2024-10-07T17:56:53Z) - A Reproducible Analysis of Sequential Recommender Systems [13.987953631479662]
SequentialEnsurer Systems (SRSs) have emerged as a highly efficient approach to recommendation systems.
Existing works exhibit shortcomings in replicability of results, leading to inconsistent statements across papers.
Our work fills these gaps by standardising data pre-processing and model implementations.
arXiv Detail & Related papers (2024-08-07T16:23:29Z) - RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End
Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network to utilize both data and memory features to guide exploring directions for sampling the next minimum set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z) - Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning [28.042568086423298]
Repeated Sampling of Random Subsets (RS2) is a powerful yet overlooked random sampling strategy.
We test RS2 against thirty state-of-the-art data pruning and data distillation methods across four datasets including ImageNet.
Our results demonstrate that RS2 significantly reduces time-to-accuracy compared to existing techniques.
arXiv Detail & Related papers (2023-05-28T20:38:13Z) - Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE)
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z) - NeRF in detail: Learning to sample for view synthesis [104.75126790300735]
Neural radiance fields (NeRF) methods have demonstrated impressive novel view synthesis.
In this work we address a clear limitation of the vanilla coarse-to-fine approach -- that it is based on a performance and not trained end-to-end for the task at hand.
We introduce a differentiable module that learns to propose samples and their importance for the fine network, and consider and compare multiple alternatives for its neural architecture.
arXiv Detail & Related papers (2021-06-09T17:59:10Z) - Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
arXiv Detail & Related papers (2020-10-26T14:15:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.