Support Vector Machine Implementation on MPI-CUDA and Tensorflow Framework
- URL: http://arxiv.org/abs/2311.14908v1
- Date: Sat, 25 Nov 2023 02:52:37 GMT
- Title: Support Vector Machine Implementation on MPI-CUDA and Tensorflow Framework
- Authors: Islam Elgarhy
- Abstract summary: The Support Vector Machine (SVM) algorithm requires a high computational cost to solve a complex quadratic programming (QP) optimization problem.
Parallel multi-architecture computing, available in both multi-core CPUs and highly scalable GPUs, emerges as a promising solution to enhance algorithm performance.
This paper presents a comparative study that implements the SVM algorithm on different parallel architecture frameworks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Support Vector Machine (SVM) algorithm requires a high computational cost
(both in memory and time) to solve a complex quadratic programming (QP)
optimization problem during the training process. Consequently, SVM
necessitates high computing hardware capabilities. The central processing unit
(CPU) clock frequency cannot be increased due to physical limitations in the
miniaturization process. However, the potential of parallel multi-architecture,
available in both multi-core CPUs and highly scalable GPUs, emerges as a
promising solution to enhance algorithm performance. Therefore, there is an
opportunity to reduce the high computational time required by SVM for solving
the QP optimization problem. This paper presents a comparative study that
implements the SVM algorithm on different parallel architecture frameworks. The
experimental results show that the SVM MPI-CUDA implementation achieves a speedup
over the SVM TensorFlow implementation on different datasets. Moreover, the SVM
TensorFlow implementation provides a cross-platform solution that can be
migrated to alternative hardware components, which reduces the development
time.
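The paper does not reproduce source code here, but a minimal sketch can clarify what the TensorFlow side of such an implementation looks like: a linear soft-margin SVM expressed as a differentiable hinge-loss objective and minimized with gradient descent. The function name, hyperparameters, and optimizer choice below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a linear soft-margin SVM trained in
# TensorFlow by minimizing the primal hinge-loss objective. TensorFlow dispatches
# the tensor ops to CPU or GPU, which is the cross-platform property the abstract
# refers to.
import tensorflow as tf

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """X: (n, d) float array, y: (n,) labels in {-1, +1}."""
    X = tf.constant(X, dtype=tf.float32)
    y = tf.constant(y, dtype=tf.float32)
    d = int(X.shape[1])
    w = tf.Variable(tf.zeros([d]))
    b = tf.Variable(0.0)
    opt = tf.keras.optimizers.SGD(learning_rate=lr)
    for _ in range(epochs):
        with tf.GradientTape() as tape:
            margins = y * (tf.linalg.matvec(X, w) + b)
            hinge = tf.reduce_mean(tf.nn.relu(1.0 - margins))
            # primal soft-margin objective: 0.5*||w||^2 + C * mean hinge loss
            loss = 0.5 * tf.reduce_sum(w * w) + C * hinge
        grads = tape.gradient(loss, [w, b])
        opt.apply_gradients(zip(grads, [w, b]))
    return w.numpy(), b.numpy()
```

An MPI-CUDA counterpart would instead partition the training data across MPI ranks and run the kernel and QP computations in CUDA kernels on each node; that side is not sketched here.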
Related papers
- Benchmarking Edge AI Platforms for High-Performance ML Inference [0.0]
Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions.
While current approaches often involve scaling down modern hardware, the performance characteristics of neural network workloads can vary significantly.
We compare the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions.
arXiv Detail & Related papers (2024-09-23T08:27:27Z)
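As a rough illustration of the kind of latency measurement such a comparison involves, the sketch below times a dense matrix multiply on each device TensorFlow can see. The matrix size, repeat count, and device strings are arbitrary assumptions, not the benchmark suite used in the paper.

```python
# Hypothetical micro-benchmark sketch: average matmul latency per device.
import time
import tensorflow as tf

def time_matmul(device, n=2048, repeats=10):
    with tf.device(device):
        a = tf.random.normal([n, n])
        b = tf.random.normal([n, n])
        _ = tf.matmul(a, b)          # warm-up run
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        _ = c.numpy()                # force execution to finish before stopping the clock
        return (time.perf_counter() - start) / repeats

devices = ["/CPU:0"] + (["/GPU:0"] if tf.config.list_physical_devices("GPU") else [])
for dev in devices:
    print(dev, f"{time_matmul(dev) * 1e3:.1f} ms per 2048x2048 matmul")
```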
- Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedups over CPU and GPU baselines, respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
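To make the notion of an nth-order gradient of a computation graph concrete, the following sketch evaluates one with TensorFlow's nested GradientTape; it only illustrates the computation INR-Arch targets, not the dataflow hardware or compiler described in the paper. The example function is an assumption.

```python
# Illustration only: an nth-order gradient expressed with nested GradientTapes.
import tensorflow as tf

def nth_order_gradient(f, x, n):
    """Return d^n f / dx^n evaluated at scalar tensor x."""
    if n == 0:
        return f(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        inner = nth_order_gradient(f, x, n - 1)   # (n-1)th derivative, recorded by the outer tape
    return tape.gradient(inner, x)

x = tf.constant(0.5)
f = lambda t: tf.sin(3.0 * t)          # stand-in for an implicit-neural-style scalar function
print(nth_order_gradient(f, x, 3))     # third derivative: -27 * cos(3 * 0.5)
```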
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around the TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- Recipe for Fast Large-scale SVM Training: Polishing, Parallelism, and more RAM! [0.0]
Support vector machines (SVMs) are a standard method in the machine learning toolbox.
Non-linear kernel SVMs often deliver highly accurate predictors, however, at the cost of long training times.
In this work, we combine both approaches to design an extremely fast dual SVM solver.
arXiv Detail & Related papers (2022-07-03T11:51:41Z)
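As a plain illustration of what a dual SVM solver optimizes, the sketch below runs projected gradient ascent on the kernel dual with a box constraint (a bias-free formulation keeps the constraint simple). It is not the paper's solver; the RBF kernel, step size, and iteration count are assumptions.

```python
# Hedged sketch of a dual SVM solver: maximize sum(a) - 0.5 * a^T Q a
# subject to 0 <= a_i <= C, with Q_ij = y_i * y_j * k(x_i, x_j).
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    sq_a = np.sum(A**2, axis=1)
    sq_b = np.sum(B**2, axis=1)
    return np.exp(-gamma * (sq_a[:, None] + sq_b[None, :] - 2 * A @ B.T))

def dual_svm(X, y, C=1.0, gamma=0.5, lr=1e-3, iters=2000):
    Q = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    for _ in range(iters):
        grad = 1.0 - Q @ alpha                         # gradient of the dual objective
        alpha = np.clip(alpha + lr * grad, 0.0, C)     # ascent step + projection onto the box
    return alpha

def predict(X_train, y_train, alpha, X_test, gamma=0.5):
    # bias-free decision function: sign(sum_i alpha_i * y_i * k(x_i, x))
    K = rbf_kernel(X_test, X_train, gamma)
    return np.sign(K @ (alpha * y_train))
```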
- PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine [68.8204255655161]
Support Vector Machines (SVMs) are widely used in machine learning.
However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware.
PLSSVM can be used as a drop-in replacement for LIBSVM.
arXiv Detail & Related papers (2022-02-25T13:24:23Z)
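A least squares SVM replaces the SVM QP with a single dense linear system, which is what makes it amenable to (multi-)GPU acceleration. The sketch below writes out the classical LS-SVM system in NumPy as an illustration of the model class; it is not PLSSVM's code, and the kernel and regularization values are assumptions.

```python
# Hedged LS-SVM sketch: training reduces to solving one linear system
#   [ 0      y^T        ] [ b     ]   [ 0 ]
#   [ y   Omega + I/C   ] [ alpha ] = [ 1 ]
# with Omega_ij = y_i * y_j * k(x_i, x_j); the decision function is
# sign(sum_i alpha_i * y_i * k(x_i, x) + b).
import numpy as np

def lssvm_train(X, y, C=10.0, gamma=0.5):
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    omega = (y[:, None] * y[None, :]) * K
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, dual coefficients alpha
```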
- Scaling Quantum Approximate Optimization on Near-term Hardware [49.94954584453379]
We quantify scaling of the expected resource requirements by optimized circuits for hardware architectures with varying levels of connectivity.
We show that the number of measurements, and hence the total time to solution, grows exponentially in problem size and problem graph degree.
These problems may be alleviated by increasing hardware connectivity or by recently proposed modifications to the QAOA that achieve higher performance with fewer circuit layers.
arXiv Detail & Related papers (2022-01-06T21:02:30Z)
- AML-SVM: Adaptive Multilevel Learning with Support Vector Machines [0.0]
This paper proposes an adaptive multilevel learning framework for the nonlinear SVM.
It improves the classification quality across the refinement process, and leverages multi-threaded parallel processing for better performance.
arXiv Detail & Related papers (2020-11-05T00:17:02Z)
- A Vertex Cut based Framework for Load Balancing and Parallelism Optimization in Multi-core Systems [15.913119724815733]
High-level applications, such as machine learning, are evolving from simple models based on multilayer perceptrons for simple image recognition to much deeper and more complex neural networks for self-driving vehicle control systems.
Parallel programs running on high-performance computers often suffer from data communication bottlenecks, limited memory bandwidth, and synchronization overhead due to irregular critical sections.
We propose a framework to reduce the data communication and improve the scalability and performance of these applications in multi-core systems.
arXiv Detail & Related papers (2020-10-09T07:54:28Z)
- On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
arXiv Detail & Related papers (2020-02-15T23:25:12Z)
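The coreset workflow itself is easy to sketch: build a small subset and hand it to any off-the-shelf SVM solver. The version below uses uniform sampling as a naive stand-in for the paper's importance-sampling coreset construction; the subset size and solver settings are assumptions.

```python
# Hedged illustration of training an off-the-shelf SVM on a small subset.
import numpy as np
from sklearn.svm import SVC

def train_on_subset(X, y, m=500, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X[idx], y[idx])        # solver cost now depends on m, not on len(X)
    return clf
```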