Support Vector Machine Implementation on MPI-CUDA and Tensorflow Framework
- URL: http://arxiv.org/abs/2311.14908v1
- Date: Sat, 25 Nov 2023 02:52:37 GMT
- Title: Support Vector Machine Implementation on MPI-CUDA and Tensorflow Framework
- Authors: Islam Elgarhy
- Abstract summary: The Support Vector Machine (SVM) algorithm requires a high computational cost to solve a complex quadratic programming (QP) optimization problem.
Parallel multi-architecture computing, available in both multi-core CPUs and highly scalable GPUs, emerges as a promising solution to enhance algorithm performance.
This paper presents a comparative study that implements the SVM algorithm on different parallel architecture frameworks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Support Vector Machine (SVM) algorithm requires a high computational cost
(both in memory and time) to solve a complex quadratic programming (QP)
optimization problem during the training process. Consequently, SVM
necessitates high computing hardware capabilities. The central processing unit
(CPU) clock frequency cannot be increased due to physical limitations in the
miniaturization process. However, the potential of parallel multi-architecture,
available in both multi-core CPUs and highly scalable GPUs, emerges as a
promising solution to enhance algorithm performance. Therefore, there is an
opportunity to reduce the high computational time required by SVM for solving
the QP optimization problem. This paper presents a comparative study that
implements the SVM algorithm on different parallel architecture frameworks. The
experimental results show that the SVM MPI-CUDA implementation achieves a speedup
over the SVM TensorFlow implementation on different datasets. Moreover, the SVM
TensorFlow implementation provides a cross-platform solution that can be
migrated to alternative hardware components, which reduces the development
time.
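The paper does not reproduce source code here, but a minimal sketch can clarify what the TensorFlow side of such an implementation looks like: a linear soft-margin SVM expressed as a differentiable hinge-loss objective and minimized with gradient descent. The function name, hyperparameters, and optimizer choice below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a linear soft-margin SVM trained in
# TensorFlow by minimizing the primal hinge-loss objective. TensorFlow dispatches
# the tensor ops to CPU or GPU, which is the cross-platform property the abstract
# refers to.
import tensorflow as tf

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """X: (n, d) float array, y: (n,) labels in {-1, +1}."""
    X = tf.constant(X, dtype=tf.float32)
    y = tf.constant(y, dtype=tf.float32)
    d = int(X.shape[1])
    w = tf.Variable(tf.zeros([d]))
    b = tf.Variable(0.0)
    opt = tf.keras.optimizers.SGD(learning_rate=lr)
    for _ in range(epochs):
        with tf.GradientTape() as tape:
            margins = y * (tf.linalg.matvec(X, w) + b)
            hinge = tf.reduce_mean(tf.nn.relu(1.0 - margins))
            # primal soft-margin objective: 0.5*||w||^2 + C * mean hinge loss
            loss = 0.5 * tf.reduce_sum(w * w) + C * hinge
        grads = tape.gradient(loss, [w, b])
        opt.apply_gradients(zip(grads, [w, b]))
    return w.numpy(), b.numpy()
```

An MPI-CUDA counterpart would instead partition the training data across MPI ranks and run the kernel and QP computations in CUDA kernels on each node; that side is not sketched here.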
Related papers
- Benchmarking Edge AI Platforms for High-Performance ML Inference [0.0]
Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions.
While current approaches often involve scaling down modern hardware, the performance characteristics of neural network workloads can vary significantly.
We compare the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions.
arXiv Detail & Related papers (2024-09-23T08:27:27Z)
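As a rough illustration of the kind of latency measurement such a comparison involves, the sketch below times a dense matrix multiply on each device TensorFlow can see. The matrix size, repeat count, and device strings are arbitrary assumptions, not the benchmark suite used in the paper.

```python
# Hypothetical micro-benchmark sketch: average matmul latency per device.
import time
import tensorflow as tf

def time_matmul(device, n=2048, repeats=10):
    with tf.device(device):
        a = tf.random.normal([n, n])
        b = tf.random.normal([n, n])
        _ = tf.matmul(a, b)          # warm-up run
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        _ = c.numpy()                # force execution to finish before stopping the clock
        return (time.perf_counter() - start) / repeats

devices = ["/CPU:0"] + (["/GPU:0"] if tf.config.list_physical_devices("GPU") else [])
for dev in devices:
    print(dev, f"{time_matmul(dev) * 1e3:.1f} ms per 2048x2048 matmul")
```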
- Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedups over CPU and GPU baselines, respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
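To make the notion of an nth-order gradient of a computation graph concrete, the following sketch evaluates one with TensorFlow's nested GradientTape; it only illustrates the computation INR-Arch targets, not the dataflow hardware or compiler described in the paper. The example function is an assumption.

```python
# Illustration only: an nth-order gradient expressed with nested GradientTapes.
import tensorflow as tf

def nth_order_gradient(f, x, n):
    """Return d^n f / dx^n evaluated at scalar tensor x."""
    if n == 0:
        return f(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        inner = nth_order_gradient(f, x, n - 1)   # (n-1)th derivative, recorded by the outer tape
    return tape.gradient(inner, x)

x = tf.constant(0.5)
f = lambda t: tf.sin(3.0 * t)          # stand-in for an implicit-neural-style scalar function
print(nth_order_gradient(f, x, 3))     # third derivative: -27 * cos(3 * 0.5)
```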
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around the TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- Recipe for Fast Large-scale SVM Training: Polishing, Parallelism, and more RAM! [0.0]
Support vector machines (SVMs) are a standard method in the machine learning toolbox.
Non-linear kernel SVMs often deliver highly accurate predictors, however, at the cost of long training times.
In this work, we combine both approaches to design an extremely fast dual SVM solver.
arXiv Detail & Related papers (2022-07-03T11:51:41Z)
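As a plain illustration of what a dual SVM solver optimizes, the sketch below runs projected gradient ascent on the kernel dual with a box constraint (a bias-free formulation keeps the constraint simple). It is not the paper's solver; the RBF kernel, step size, and iteration count are assumptions.

```python
# Hedged sketch of a dual SVM solver: maximize sum(a) - 0.5 * a^T Q a
# subject to 0 <= a_i <= C, with Q_ij = y_i * y_j * k(x_i, x_j).
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    sq_a = np.sum(A**2, axis=1)
    sq_b = np.sum(B**2, axis=1)
    return np.exp(-gamma * (sq_a[:, None] + sq_b[None, :] - 2 * A @ B.T))

def dual_svm(X, y, C=1.0, gamma=0.5, lr=1e-3, iters=2000):
    Q = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    for _ in range(iters):
        grad = 1.0 - Q @ alpha                         # gradient of the dual objective
        alpha = np.clip(alpha + lr * grad, 0.0, C)     # ascent step + projection onto the box
    return alpha

def predict(X_train, y_train, alpha, X_test, gamma=0.5):
    # bias-free decision function: sign(sum_i alpha_i * y_i * k(x_i, x))
    K = rbf_kernel(X_test, X_train, gamma)
    return np.sign(K @ (alpha * y_train))
```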
- PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine [68.8204255655161]
Support Vector Machines (SVMs) are widely used in machine learning.
However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware.
PLSSVM can be used as a drop-in replacement for LIBSVM.
arXiv Detail & Related papers (2022-02-25T13:24:23Z)
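A least squares SVM replaces the SVM QP with a single dense linear system, which is what makes it amenable to (multi-)GPU acceleration. The sketch below writes out the classical LS-SVM system in NumPy as an illustration of the model class; it is not PLSSVM's code, and the kernel and regularization values are assumptions.

```python
# Hedged LS-SVM sketch: training reduces to solving one linear system
#   [ 0      y^T        ] [ b     ]   [ 0 ]
#   [ y   Omega + I/C   ] [ alpha ] = [ 1 ]
# with Omega_ij = y_i * y_j * k(x_i, x_j); the decision function is
# sign(sum_i alpha_i * y_i * k(x_i, x) + b).
import numpy as np

def lssvm_train(X, y, C=10.0, gamma=0.5):
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    omega = (y[:, None] * y[None, :]) * K
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, dual coefficients alpha
```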
- Scaling Quantum Approximate Optimization on Near-term Hardware [49.94954584453379]
We quantify scaling of the expected resource requirements by optimized circuits for hardware architectures with varying levels of connectivity.
We show that the number of measurements, and hence the total time to solution, grows exponentially in problem size and problem graph degree.
These problems may be alleviated by increasing hardware connectivity or by recently proposed modifications to the QAOA that achieve higher performance with fewer circuit layers.
arXiv Detail & Related papers (2022-01-06T21:02:30Z)
- AML-SVM: Adaptive Multilevel Learning with Support Vector Machines [0.0]
This paper proposes an adaptive multilevel learning framework for the nonlinear SVM.
It improves the classification quality across the refinement process, and leverages multi-threaded parallel processing for better performance.
arXiv Detail & Related papers (2020-11-05T00:17:02Z)
- A Vertex Cut based Framework for Load Balancing and Parallelism Optimization in Multi-core Systems [15.913119724815733]
High-level applications, such as machine learning, are evolving from simple models based on multilayer perceptrons for simple image recognition to much deeper and more complex neural networks for self-driving vehicle control systems.
Parallel programs running on high-performance computers often suffer from data communication bottlenecks, limited memory bandwidth, and synchronization overhead due to irregular critical sections.
We propose a framework to reduce the data communication and improve the scalability and performance of these applications in multi-core systems.
arXiv Detail & Related papers (2020-10-09T07:54:28Z)
- On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
arXiv Detail & Related papers (2020-02-15T23:25:12Z)
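The coreset workflow itself is easy to sketch: build a small subset and hand it to any off-the-shelf SVM solver. The version below uses uniform sampling as a naive stand-in for the paper's importance-sampling coreset construction; the subset size and solver settings are assumptions.

```python
# Hedged illustration of training an off-the-shelf SVM on a small subset.
import numpy as np
from sklearn.svm import SVC

def train_on_subset(X, y, m=500, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X[idx], y[idx])        # solver cost now depends on m, not on len(X)
    return clf
```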