Accelerating Machine Learning Algorithms with Adaptive Sampling
- URL: http://arxiv.org/abs/2309.14221v1
- Date: Mon, 25 Sep 2023 15:25:59 GMT
- Title: Accelerating Machine Learning Algorithms with Adaptive Sampling
- Authors: Mo Tiwari
- Abstract summary: It is often sufficient to substitute computationally intensive subroutines with a special kind of randomized counterparts that results in almost no degradation in quality.
This thesis demonstrates that it is often sufficient, instead, to substitute computationally intensive subroutines with a special kind of randomized counterparts that results in almost no degradation in quality.
- Score: 1.539582851341637
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The era of huge data necessitates highly efficient machine learning
algorithms. Many common machine learning algorithms, however, rely on
computationally intensive subroutines that are prohibitively expensive on large
datasets. Oftentimes, existing techniques subsample the data or use other
methods to improve computational efficiency, at the expense of incurring some
approximation error. This thesis demonstrates that it is often sufficient,
instead, to substitute computationally intensive subroutines with a special
kind of randomized counterparts that results in almost no degradation in
quality.
Related papers
- Machine Learning Training Optimization using the Barycentric Correction
Procedure [0.0]
This study proposes combining machine learning algorithms with an efficient methodology known as the barycentric correction procedure (BCP)
It was found that this combination provides significant benefits related to time in synthetic and real data without losing accuracy when the number of instances and dimensions increases.
arXiv Detail & Related papers (2024-03-01T13:56:36Z) - Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z) - Randomized Dimension Reduction with Statistical Guarantees [0.27195102129095]
This thesis explores some of such algorithms for fast execution and efficient data utilization.
We focus on learning algorithms with various incorporations of data augmentation that improve generalization and distributional provably.
Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization.
arXiv Detail & Related papers (2023-10-03T02:01:39Z) - Scalable Batch Acquisition for Deep Bayesian Active Learning [70.68403899432198]
In deep active learning, it is important to choose multiple examples to markup at each step.
Existing solutions to this problem, such as BatchBALD, have significant limitations in selecting a large number of examples.
We present the Large BatchBALD algorithm, which aims to achieve comparable quality while being more computationally efficient.
arXiv Detail & Related papers (2023-01-13T11:45:17Z) - Bioinspired Cortex-based Fast Codebook Generation [0.09449650062296822]
We introduce a feature extraction method inspired by sensory cortical networks in the brain.
Dubbed as bioinspired cortex, the algorithm provides convergence to features from streaming signals with superior computational efficiency.
We show herein the superior performance of the cortex model in clustering and vector quantization.
arXiv Detail & Related papers (2022-01-28T18:37:43Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Benchmarking Processor Performance by Multi-Threaded Machine Learning
Algorithms [0.0]
In this paper, I will make a performance comparison of multi-threaded machine learning clustering algorithms.
I will be working on Linear Regression, Random Forest, and K-Nearest Neighbors to determine the performance characteristics of the algorithms.
arXiv Detail & Related papers (2021-09-11T13:26:58Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra [53.46106569419296]
We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression.
We argue that the previous quantum-inspired algorithms for these problems are doing leverage or ridge-leverage score sampling in disguise.
arXiv Detail & Related papers (2020-11-09T01:13:07Z) - Run2Survive: A Decision-theoretic Approach to Algorithm Selection based
on Survival Analysis [75.64261155172856]
survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z) - Guidelines for enhancing data locality in selected machine learning
algorithms [0.0]
We analyze one of the means to increase the performances of machine learning algorithms which is exploiting data locality.
Repeated data access can be seen as redundancy in data movement.
This work also identifies some of the opportunities for avoiding these redundancies by directly reusing results.
arXiv Detail & Related papers (2020-01-09T14:16:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.