Data Classification With Multiprocessing
- URL: http://arxiv.org/abs/2312.15152v1
- Date: Sat, 23 Dec 2023 03:42:13 GMT
- Title: Data Classification With Multiprocessing
- Authors: Anuja Dixit, Shreya Byreddy, Guanqun Song, Ting Zhu
- Abstract summary: Python multiprocessing is used to test this hypothesis (parallel hyperparameter training followed by ensembling) with different classification algorithms.
We conclude that ensembling improves accuracy and multiprocessing reduces execution time for selected algorithms.
- Score: 6.513930657238705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification is one of the most important tasks in Machine Learning (ML),
and with recent advancements in artificial intelligence (AI) it is important to
find efficient ways to implement it. Generally, the choice of classification
algorithm depends on the data being handled, and the accuracy of the algorithm
depends on the hyperparameters it is tuned with. One approach is to evaluate the
algorithm by executing it serially with different hyperparameters and then
selecting the parameters that give the highest accuracy for predicting the final
output. This paper proposes an alternative in which the algorithm is trained in
parallel with different hyperparameters to reduce execution time. Finally, the
results from all trained variations of the algorithm are ensembled to exploit the
parallelism and improve prediction accuracy. Python multiprocessing is used to
test this hypothesis with different classification algorithms, namely K-Nearest
Neighbors (KNN), Support Vector Machines (SVM), random forest, and decision tree,
and factors affecting parallelism are reviewed. The ensembled output considers
the predictions from all processes, and the final class is the one predicted by
the largest number of processes, which increases the reliability of the
predictions. We conclude that ensembling improves accuracy and multiprocessing
reduces execution time for the selected algorithms.
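The sketch below is a minimal illustration (not the authors' released code) of the approach the abstract describes: the same classifier is trained in parallel with different hyperparameter values via Python multiprocessing, and the per-process predictions are then ensembled by majority vote. The use of scikit-learn's KNN on the iris dataset and the particular `n_neighbors` values are illustrative assumptions.

```python
# Minimal sketch: train one classifier variant per process, then
# majority-vote the per-process predictions. Dataset, classifier, and
# hyperparameter grid are illustrative assumptions.
from collections import Counter
from multiprocessing import Pool

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_and_predict(args):
    """Train one KNN variant and return its test-set predictions."""
    n_neighbors, X_train, y_train, X_test = args
    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X_train, y_train)
    return model.predict(X_test)


def majority_vote(predictions_per_process):
    """For each test sample, pick the class predicted by the most processes."""
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_process)
    ]


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    # One process per hyperparameter setting (illustrative values).
    hyperparameters = [1, 3, 5, 7, 9]
    tasks = [(k, X_train, y_train, X_test) for k in hyperparameters]

    with Pool(processes=len(hyperparameters)) as pool:
        predictions = pool.map(train_and_predict, tasks)

    ensembled = majority_vote(predictions)
    accuracy = sum(p == t for p, t in zip(ensembled, y_test)) / len(y_test)
    print(f"Ensembled accuracy: {accuracy:.3f}")
```

With one process per hyperparameter setting, the wall-clock training time is roughly that of the slowest single configuration rather than the sum over all configurations, which is the execution-time reduction the abstract refers to; the majority vote across processes is the ensembling step that increases prediction reliability.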
Related papers
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z) - Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of ways to improve the performance of the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z) - Parallel Instance Filtering for Malware Detection [0.0]
This work presents a new parallel instance selection algorithm called Parallel Instance Filtering (PIF)
The main idea of the algorithm is to split the data set into non-overlapping subsets of instances covering the whole data set and apply a filtering process for each subset.
We compare the PIF algorithm with several state-of-the-art instance selection algorithms on a large data set of 500,000 malicious and benign samples.
arXiv Detail & Related papers (2022-06-28T11:14:20Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Benchmarking Processor Performance by Multi-Threaded Machine Learning
Algorithms [0.0]
In this paper, I will make a performance comparison of multi-threaded machine learning clustering algorithms.
I will be working on Linear Regression, Random Forest, and K-Nearest Neighbors to determine the performance characteristics of the algorithms.
arXiv Detail & Related papers (2021-09-11T13:26:58Z) - Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that our algorithm achieves for any k an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - MementoML: Performance of selected machine learning algorithm
configurations on OpenML100 datasets [5.802346990263708]
We present a protocol for generating benchmark data describing the performance of different ML algorithms.
Data collected in this way is used to study the factors influencing the algorithm's performance.
arXiv Detail & Related papers (2020-08-30T13:13:52Z) - Weighted Random Search for CNN Hyperparameter Optimization [0.0]
We introduce the weighted Random Search (WRS) method, a combination of Random Search (RS) and a probabilistic greedy heuristic.
The criterion is the classification accuracy achieved within the same number of tested combinations of hyperparameter values.
According to our experiments, the WRS algorithm outperforms the other methods.
arXiv Detail & Related papers (2020-03-30T09:40:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.