Improved vectorization of OpenCV algorithms for RISC-V CPUs
- URL: http://arxiv.org/abs/2311.12808v1
- Date: Tue, 19 Sep 2023 12:36:03 GMT
- Title: Improved vectorization of OpenCV algorithms for RISC-V CPUs
- Authors: V. D. Volokitin, E. P. Vasiliev, E. A. Kozinov, V. D. Kustikova, A. V. Liniov, Y. A. Rodimkov, A. V. Sysoyev, and I. B. Meyerov
- Abstract summary: We discuss the possibilities of accelerating computations on available RISC-V processors.
It is shown that improved vectorization speeds up computations on existing prototypes of RISC-V devices by tens of percent.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development of an open and free RISC-V architecture is of great interest
for a wide range of areas, including high-performance computing and numerical
simulation in mathematics, physics, chemistry and other problem domains. In
this paper, we discuss the possibilities of accelerating computations on
available RISC-V processors by improving the vectorization of several computer
vision and machine learning algorithms in the widely used OpenCV library. It is
shown that improved vectorization speeds up computations on existing prototypes
of RISC-V devices by tens of percent.
Related papers
- Accelerating AI and Computer Vision for Satellite Pose Estimation on the Intel Myriad X Embedded SoC [3.829322478948514]
This paper develops a hybrid AI/CV system on Intel's Movidius Myriad X for initializing and tracking the satellite's pose in space missions.
The proposed single-chip, robust-estimation, and real-time solution delivers a throughput of up to 5 FPS for 1-MegaPixel RGB images within a limited power envelope of 2W.
arXiv Detail & Related papers (2024-09-19T17:50:50Z)
- RISC-V RVV efficiency for ANN algorithms [0.5892638927736115]
This study examines the effectiveness of applying RVV to commonly used ANN algorithms.
The algorithms were adapted for RISC-V and optimized using RVV after identifying the primary bottlenecks.
arXiv Detail & Related papers (2024-07-18T09:26:07Z)
- Full-stack evaluation of Machine Learning inference workloads for RISC-V systems [0.2621434923709917]
This study evaluates the performance of a wide array of machine learning workloads on RISC-V architectures using gem5, an open-source architectural simulator.
Leveraging an open-source compilation toolchain based on Multi-Level Intermediate Representation (MLIR), the research presents benchmarking results specifically focused on deep learning inference workloads.
arXiv Detail & Related papers (2024-05-24T09:24:46Z)
- Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures [99.20299078655376]
This paper introduces Vision-RWKV, a model adapted from the RWKV model used in the NLP field.
Our model is designed to efficiently handle sparse inputs and demonstrate robust global processing capabilities.
Our evaluations demonstrate that VRWKV surpasses ViT's performance in image classification and has significantly faster speeds and lower memory usage.
arXiv Detail & Related papers (2024-03-04T18:46:20Z)
- Support Vector Machine Implementation on MPI-CUDA and Tensorflow Framework [0.0]
The Support Vector Machine (SVM) algorithm requires a high computational cost to solve a complex quadratic programming (QP) optimization problem.
Parallel multi-architecture, available on both multi-core CPUs and highly scalable GPUs, emerges as a promising solution to enhance algorithm performance.
This paper presents a comparative study that implements the SVM algorithm on different parallel architecture frameworks.
arXiv Detail & Related papers (2023-11-25T02:52:37Z)
- Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
- Deep Learning Computer Vision Algorithms for Real-time UAVs On-board Camera Image Processing [77.34726150561087]
This paper describes how advanced deep learning based computer vision algorithms are applied to enable real-time on-board sensor processing for small UAVs.
All algorithms have been developed using state-of-the-art image processing methods based on deep neural networks.
arXiv Detail & Related papers (2022-11-02T11:10:42Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- Collaborative Learning over Wireless Networks: An Introductory Overview [84.09366153693361]
We will mainly focus on collaborative training across wireless devices.
Many distributed optimization algorithms have been developed over the last decades.
They provide data locality; that is, a joint model can be trained collaboratively while the data available at each participating device remains local.
arXiv Detail & Related papers (2021-12-07T20:15:39Z)
- Vector Symbolic Architectures as a Computing Framework for Emerging Hardware [8.28931204639352]
This article reviews recent progress in the development of the computing framework vector symbolic architectures (VSA), also known as hyperdimensional computing.
We demonstrate that VSA offers simple but powerful operations on high-dimensional vectors that can support all data structures and manipulations relevant to modern computing.
This article serves as a reference for computer architects by illustrating the philosophy behind VSA, techniques of distributed computing with them, and their relevance to emerging computing hardware.
arXiv Detail & Related papers (2021-06-09T23:38:39Z)
- Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS).
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)