Related papers: Efficient Implementation of LinearUCB through Algorithmic Improvements and Vector Computing Acceleration for Embedded Learning Systems

URL: http://arxiv.org/abs/2501.13139v1
Date: Wed, 22 Jan 2025 13:39:44 GMT
Title: Efficient Implementation of LinearUCB through Algorithmic Improvements and Vector Computing Acceleration for Embedded Learning Systems
Authors: Marco Angioli, Marcello Barbirotta, Abdallah Cheikh, Antonio Mastrandrea, Francesco Menichelli, Mauro Olivieri
Abstract summary: This paper presents algorithmic and hardware techniques to implement two LinearUCB Contextual Bandits algorithms on resource-constrained embedded devices. Results show notable improvements in execution time and energy consumption.
Score: 0.10470286407954035
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As the Internet of Things expands, embedding Artificial Intelligence algorithms in resource-constrained devices has become increasingly important to enable real-time, autonomous decision-making without relying on centralized cloud servers. However, implementing and executing complex algorithms in embedded devices poses significant challenges due to limited computational power, memory, and energy resources. This paper presents algorithmic and hardware techniques to efficiently implement two LinearUCB Contextual Bandits algorithms on resource-constrained embedded devices. Algorithmic modifications based on the Sherman-Morrison-Woodbury formula streamline model complexity, while vector acceleration is harnessed to speed up matrix operations. We analyze the impact of each optimization individually and then combine them in a two-pronged strategy. The results show notable improvements in execution time and energy consumption, demonstrating the effectiveness of combining algorithmic and hardware optimizations to enhance learning models for edge computing environments with low-power and real-time requirements.
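The abstract credits the Sherman-Morrison-Woodbury formula with streamlining model complexity. Its rank-one case, (A + x xᵀ)⁻¹ = A⁻¹ − (A⁻¹x)(A⁻¹x)ᵀ / (1 + xᵀA⁻¹x), lets a LinUCB arm keep A⁻¹ current in O(d²) operations per update instead of re-inverting A in O(d³). The sketch below is a minimal illustration of that idea, assuming the standard disjoint LinUCB formulation (one d×d matrix A and one vector b per arm, with A initialized to the identity). It is not the authors' implementation. The class and function names (DisjointLinUCBArm, select_arm, matvec, dot) are hypothetical, and the abstract summary does not identify which two LinearUCB variants the paper targets.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product; O(d^2) work."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


class DisjointLinUCBArm:
    """One arm of a disjoint LinUCB model that stores A^{-1} directly.

    Hypothetical illustration only; names are not taken from the paper.
    """

    def __init__(self, d: int, alpha: float) -> None:
        self.alpha = alpha  # exploration weight
        # A starts as the d x d identity, so A^{-1} is the identity too.
        self.a_inv: Matrix = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b: Vector = [0.0] * d  # running sum of reward * context

    def ucb(self, x: Vector) -> float:
        """Score theta^T x + alpha * sqrt(x^T A^{-1} x), with theta = A^{-1} b."""
        theta = matvec(self.a_inv, self.b)
        u = matvec(self.a_inv, x)
        return dot(theta, x) + self.alpha * math.sqrt(dot(x, u))

    def update(self, x: Vector, reward: float) -> None:
        """Apply A <- A + x x^T via Sherman-Morrison (no explicit inversion)."""
        u = matvec(self.a_inv, x)  # A^{-1} x; A^{-1} is symmetric, so x^T A^{-1} = u^T
        denom = 1.0 + dot(x, u)
        d = len(x)
        for i in range(d):
            for j in range(d):
                self.a_inv[i][j] -= u[i] * u[j] / denom
        for i in range(d):
            self.b[i] += reward * x[i]


def select_arm(arms: List[DisjointLinUCBArm], contexts: List[Vector]) -> int:
    """Return the index of the arm with the highest UCB score for its context."""
    scores = [arm.ucb(x) for arm, x in zip(arms, contexts)]
    return max(range(len(arms)), key=scores.__getitem__)
```

A typical loop would call select_arm with one context per arm, observe a reward, and call update only on the chosen arm. The matvec calls and the nested loop in update are the dense O(d²) kernels where vector hardware could plausibly help, which is one way to read the abstract's pairing of algorithmic and vector-level optimizations.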

Related papers

Dynamic Range Reduction via Branch-and-Bound [1.533133219129073]
A key strategy for enhancing hardware accelerators is reducing the precision of arithmetic operations. This paper introduces a fully principled Branch-and-Bound algorithm for reducing precision needs in QUBO problems. Experiments validate the algorithm's effectiveness on an actual quantum annealer.
arXiv (2024-09-17T03:07:56Z)
Computation Rate Maximization for Wireless Powered Edge Computing With Multi-User Cooperation [10.268239987867453]
This study considers a wireless-powered mobile edge computing system that includes a hybrid access point equipped with a computing unit and multiple Internet of Things (IoT) devices. We propose a novel multi-user cooperation scheme to improve computation performance, in which collaborative clusters are formed dynamically. Specifically, we aim to maximize the weighted sum computation rate (WSCR) of all the IoT devices in the network.
arXiv (2024-01-22T05:22:19Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances using generative models that mirror the multifaceted structures of real-world problems. We detail the incorporation of state-of-the-art parameter tuning algorithms, which markedly improve solver performance.
arXiv (2024-01-11T15:02:15Z)
Collaborative Learning over Wireless Networks: An Introductory Overview [84.09366153693361]
We will mainly focus on collaborative training across wireless devices. Many distributed optimization algorithms have been developed over the last decades. They provide data locality; that is, a joint model can be trained collaboratively while the data available at each participating device remains local.
arXiv (2021-12-07T20:15:39Z)
ES-Based Jacobian Enables Faster Bilevel Optimization [53.675623215542515]
Bilevel optimization (BO) has emerged as a powerful tool for solving many modern machine learning problems. Existing gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations. We propose a novel BO algorithm that adopts an Evolution Strategies (ES) based method to approximate the response Jacobian matrix in the hypergradient of BO.
arXiv (2021-10-13T19:36:50Z)
Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications. We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS). Specifically, we minimize the learning error of all participating users by jointly optimizing the transmit power of mobile users, the beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv (2020-12-25T07:08:50Z)
Hard-ODT: Hardware-Friendly Online Decision Tree Learning Algorithm and System [17.55491405857204]
In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets. We introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models. We present Hard-ODT, a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array (FPGA) with system-level optimization techniques.
arXiv (2020-12-11T12:06:44Z)
Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA [20.487660974785943]
In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets. We introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models. We present a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array.
arXiv (2020-09-03T03:23:43Z)
Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed. An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed. We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv (2020-06-15T02:57:57Z)
Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles. We present the state of the art of hardware implementations of spiking neural networks. We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv (2020-05-04T13:24:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.