Data Classification With Multiprocessing
- URL: http://arxiv.org/abs/2312.15152v1
- Date: Sat, 23 Dec 2023 03:42:13 GMT
- Title: Data Classification With Multiprocessing
- Authors: Anuja Dixit, Shreya Byreddy, Guanqun Song, Ting Zhu
- Abstract summary: Python multiprocessing is used to test this hypothesis (parallel hyperparameter training followed by ensembling) with different classification algorithms.
We conclude that ensembling improves accuracy and multiprocessing reduces execution time for selected algorithms.
- Score: 6.513930657238705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification is one of the most important tasks in Machine Learning (ML),
and with recent advancements in artificial intelligence (AI) it is important to
find efficient ways to implement it. Generally, the choice of classification
algorithm depends on the data being handled, and the accuracy of the algorithm
depends on the hyperparameters it is tuned with. One approach is to evaluate the
algorithm by executing it serially with different hyperparameters and then
selecting the parameters that give the highest accuracy for predicting the final
output. This paper proposes an alternative in which the algorithm is trained in
parallel with different hyperparameters to reduce execution time. Finally, the
results from all trained variations of the algorithm are ensembled to exploit the
parallelism and improve prediction accuracy. Python multiprocessing is used to
test this hypothesis with different classification algorithms, namely K-Nearest
Neighbors (KNN), Support Vector Machines (SVM), random forest, and decision tree,
and factors affecting parallelism are reviewed. The ensembled output considers
the predictions from all processes, and the final class is the one predicted by
the largest number of processes, which increases the reliability of the
predictions. We conclude that ensembling improves accuracy and multiprocessing
reduces execution time for the selected algorithms.
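The sketch below is a minimal illustration (not the authors' released code) of the approach the abstract describes: the same classifier is trained in parallel with different hyperparameter values via Python multiprocessing, and the per-process predictions are then ensembled by majority vote. The use of scikit-learn's KNN on the iris dataset and the particular `n_neighbors` values are illustrative assumptions.

```python
# Minimal sketch: train one classifier variant per process, then
# majority-vote the per-process predictions. Dataset, classifier, and
# hyperparameter grid are illustrative assumptions.
from collections import Counter
from multiprocessing import Pool

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_and_predict(args):
    """Train one KNN variant and return its test-set predictions."""
    n_neighbors, X_train, y_train, X_test = args
    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X_train, y_train)
    return model.predict(X_test)


def majority_vote(predictions_per_process):
    """For each test sample, pick the class predicted by the most processes."""
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_process)
    ]


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    # One process per hyperparameter setting (illustrative values).
    hyperparameters = [1, 3, 5, 7, 9]
    tasks = [(k, X_train, y_train, X_test) for k in hyperparameters]

    with Pool(processes=len(hyperparameters)) as pool:
        predictions = pool.map(train_and_predict, tasks)

    ensembled = majority_vote(predictions)
    accuracy = sum(p == t for p, t in zip(ensembled, y_test)) / len(y_test)
    print(f"Ensembled accuracy: {accuracy:.3f}")
```

With one process per hyperparameter setting, the wall-clock training time is roughly that of the slowest single configuration rather than the sum over all configurations, which is the execution-time reduction the abstract refers to; the majority vote across processes is the ensembling step that increases prediction reliability.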
Related papers
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z) - Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of ways to improve the performance of the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z) - Parallel Instance Filtering for Malware Detection [0.0]
This work presents a new parallel instance selection algorithm called Parallel Instance Filtering (PIF)
The main idea of the algorithm is to split the data set into non-overlapping subsets of instances covering the whole data set and apply a filtering process for each subset.
We compare the PIF algorithm with several state-of-the-art instance selection algorithms on a large data set of 500,000 malicious and benign samples.
arXiv Detail & Related papers (2022-06-28T11:14:20Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Benchmarking Processor Performance by Multi-Threaded Machine Learning
Algorithms [0.0]
In this paper, I will make a performance comparison of multi-threaded machine learning clustering algorithms.
I will be working on Linear Regression, Random Forest, and K-Nearest Neighbors to determine the performance characteristics of the algorithms.
arXiv Detail & Related papers (2021-09-11T13:26:58Z) - Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that our algorithm achieves for any k an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - MementoML: Performance of selected machine learning algorithm
configurations on OpenML100 datasets [5.802346990263708]
We present a protocol for generating benchmark data describing the performance of different ML algorithms.
Data collected in this way is used to study the factors influencing the algorithm's performance.
arXiv Detail & Related papers (2020-08-30T13:13:52Z) - Weighted Random Search for CNN Hyperparameter Optimization [0.0]
We introduce the weighted Random Search (WRS) method, a combination of Random Search (RS) and a probabilistic greedy heuristic.
The criterion is the classification accuracy achieved within the same number of tested combinations of hyperparameter values.
According to our experiments, the WRS algorithm outperforms the other methods.
arXiv Detail & Related papers (2020-03-30T09:40:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.