Related papers: Efficient Low-Latency Dynamic Licensing for Deep Neural Network Deployment on Edge Devices

URL: http://arxiv.org/abs/2102.12165v1
Date: Wed, 24 Feb 2021 09:36:39 GMT
Title: Efficient Low-Latency Dynamic Licensing for Deep Neural Network Deployment on Edge Devices
Authors: Toan Pham Van, Ngoc N. Tran, Hoang Pham Minh, Tam Nguyen Minh and Thanh Ta Minh
Abstract summary: We propose an architecture for deploying and running deep neural networks on edge devices. Adopting this architecture allows low-latency model updates on devices.
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Along with the rapid development of artificial intelligence, especially deep learning, deep neural network applications are becoming increasingly common in practice. Deployment techniques that can withstand heavy load from mainstream users are essential for bringing neural network models from research to production. Two popular computing topologies for deploying neural network models in production are cloud computing and edge computing. Recent advances in communication technologies, along with the large increase in the number of mobile devices, have made edge computing an increasingly inevitable trend. In this paper, we propose an architecture for deploying and running deep neural networks on edge devices by leveraging their synergy with the cloud and the access-control mechanisms of the database. Adopting this architecture allows low-latency DNN model updates on devices. At the same time, with only one model deployed, we can easily create different versions of it by setting access permissions on the model weights. This method allows for dynamic model licensing, which benefits commercial applications.
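The abstract does not say how permissions on weights are enforced or how updates reach devices. What follows is a minimal sketch under stated assumptions, not the authors' implementation. The single deployed model is stored as named layers in a hypothetical cloud-side `WeightStore`, with an in-memory dictionary standing in for the database. Each license tier is granted read access to a subset of those layers: a shared backbone plus one or more output heads. Every layer also carries a version number, so a device downloads only the layers that changed since its last sync. `WeightStore`, `EdgeDevice`, the tier names, and the layer layout are all illustrative.

```python
from dataclasses import dataclass, field


def dot(a, b):
    """Plain-Python dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class WeightStore:
    """Hypothetical cloud-side store: one model, weights kept per named layer."""

    weights: dict  # layer name -> weights (list of rows, or a single row)
    versions: dict  # layer name -> int, incremented on every update
    grants: dict = field(default_factory=dict)  # license tier -> set of readable layer names

    def update(self, layer, new_value):
        # Publishing an update replaces only the changed layer and bumps its version.
        self.weights[layer] = new_value
        self.versions[layer] += 1

    def fetch(self, tier, known_versions):
        # Access control: a tier only ever receives layers it has been granted,
        # and only those newer than the copy the device already holds.
        allowed = self.grants.get(tier, set())
        return {
            name: (self.versions[name], self.weights[name])
            for name in allowed
            if known_versions.get(name, -1) < self.versions[name]
        }


class EdgeDevice:
    """Hypothetical device that caches whatever layers its license tier may read."""

    def __init__(self, tier):
        self.tier = tier
        self.cache = {}  # layer name -> (version, weights)

    def sync(self, store):
        known = {name: version for name, (version, _) in self.cache.items()}
        delta = store.fetch(self.tier, known)
        self.cache.update(delta)
        return sorted(delta)  # names of the layers actually transferred

    def predict(self, x):
        # Shared backbone: a single ReLU layer stored as a list of weight rows.
        _, backbone = self.cache["backbone"]
        h = [max(dot(row, x), 0.0) for row in backbone]
        # Each licensed head is one linear output on top of the backbone features.
        return {
            name: dot(w, h)
            for name, (_, w) in self.cache.items()
            if name != "backbone"
        }
```

In this sketch, a `basic` device and a `pro` device share one stored model but receive different heads. After `store.update(...)` on a single head, the next `sync` transfers only that head, which is where the low update latency comes from. The sketch controls only what the cloud sends; it does not cover revoking weights already cached on a device.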

Related papers

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks.
arXiv (2024-07-18T17:59:01Z)
Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks
Spiking Neural Networks (SNNs) promise greater energy efficiency through a reduced, low-power hardware footprint. This paper introduces Spyx, a new, lightweight SNN simulation and optimization library built in JAX.
arXiv (2024-02-29T09:46:44Z)
Mechanistic Neural Networks for Scientific Machine Learning
We present Mechanistic Neural Networks, a neural network design for machine learning applications in the sciences. It incorporates a new Mechanistic Block in standard architectures to explicitly learn governing differential equations as representations. Central to our approach is a novel Relaxed Linear Programming solver (NeuRLP) inspired by a technique that reduces solving linear ODEs to solving linear programs.
arXiv (2024-02-20T15:23:24Z)
SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence
Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency. We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips.
arXiv (2023-10-25T13:15:17Z)
Dynamic Early Exiting Predictive Coding Neural Networks
With the push for smaller and more accurate devices, deep learning models have become too heavy to deploy. We propose a shallow bidirectional network based on predictive coding theory and dynamic early exiting for halting further computations. We achieve accuracy comparable to VGG-16 in image classification on CIFAR-10 with fewer parameters and less computational complexity.
arXiv (2023-09-05T08:00:01Z)
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs
"Smart ecosystems" are being formed in which sensing happens concurrently rather than in isolation. This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge. We propose a novel early-exit scheduling scheme that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv (2022-09-27T15:04:01Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA and uses around 40% of the available hardware resources in total. It reduces the classification time by three orders of magnitude, with only a 4.5% impact on accuracy, compared with its full-precision software counterpart.
arXiv (2022-01-18T13:59:22Z)
Communication-Efficient Separable Neural Network for Distributed Inference on Edge Devices
We propose a novel method of exploiting model parallelism to separate a neural network for distributed inference. With suitable device specifications and model configurations, our experiments show that inference of large neural networks on edge clusters can be distributed and accelerated.
arXiv (2021-11-03T19:30:28Z)
Auto-Split: A General Framework of Collaborative Edge-Cloud AI
This paper describes the techniques and engineering practice behind Auto-Split, an edge-cloud collaborative prototype of Huawei Cloud. To the best of our knowledge, there is no existing industry product that provides the capability of Deep Neural Network (DNN) splitting.
arXiv (2021-08-30T08:03:29Z)
Neuromorphic Processing and Sensing: Evolutionary Progression of AI to Spiking
Spiking Neural Network algorithms hold the promise of implementing advanced artificial intelligence with a fraction of the computation and power requirements. This paper explains the theoretical workings of spike-based neuromorphic technologies and surveys the state of the art in hardware processors, software platforms, and neuromorphic sensing devices. It lays out a progression path for current machine learning specialists to update their skill sets, and for migrating classification or predictive models from the current generation of deep neural networks to SNNs.
arXiv (2020-07-10T20:54:42Z)
Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC. To address the open problems identified in the paper, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv (2020-02-22T14:38:11Z)
Efficient Training of Deep Convolutional Neural Networks by Augmentation in Embedding Space
In applications where data are scarce, transfer learning and data augmentation techniques are commonly used to improve the generalization of deep learning models. Fine-tuning a transfer model with data augmentation in the raw input space is computationally expensive, because the full network must be run for every augmented input. We propose a method that replaces the augmentation in the raw input space with an approximate one that acts purely in the embedding space.
arXiv (2020-02-12T03:26:33Z)
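Several entries above (Auto-Split, and the separable network for distributed inference) concern splitting a DNN between edge devices and the cloud. As a generic illustration only, not the method of any listed paper, a split point can be chosen by estimating end-to-end latency for every possible cut. Each estimate adds three terms: the time to run the first k layers on the edge, the time to upload the tensor crossing the cut, and the time to run the remaining layers in the cloud. The per-layer timings, tensor sizes, and bandwidth passed to the hypothetical `best_split` function are assumed inputs that would come from profiling.

```python
def best_split(edge_ms, cloud_ms, out_bytes, input_bytes, bandwidth_bytes_per_ms):
    """Pick the cut k that minimizes estimated latency (generic sketch, hypothetical inputs).

    Layers [0, k) run on the edge device and layers [k, n) run in the cloud.
    edge_ms[i] / cloud_ms[i]: time for layer i on each side, in milliseconds.
    out_bytes[i]: size of layer i's output tensor; input_bytes: size of the raw input.
    Returns (k, estimated_latency_ms). Sending the final result back is ignored.
    """
    n = len(edge_ms)
    if len(cloud_ms) != n or len(out_bytes) != n:
        raise ValueError("per-layer lists must have the same length")
    best = None
    for k in range(n + 1):
        edge_time = sum(edge_ms[:k])  # layers computed on the device
        cloud_time = sum(cloud_ms[k:])  # layers computed in the cloud
        if k == n:
            upload = 0.0  # fully on-device: nothing to upload
        else:
            # k == 0 uploads the raw input; otherwise upload layer k-1's output.
            sent = input_bytes if k == 0 else out_bytes[k - 1]
            upload = sent / bandwidth_bytes_per_ms
        total = edge_time + upload + cloud_time
        if best is None or total < best[1]:
            best = (k, total)
    return best
```

Take edge times of [5, 5, 20] ms, cloud times of [1, 1, 2] ms, layer outputs of [1000, 100, 10] bytes, a 5000-byte input, and 100 bytes/ms of bandwidth. The estimates for k = 0..3 are then 54, 18, 13, and 30 ms, so cutting after the second layer wins. With much higher bandwidth, the same function moves the cut to k = 0 and sends everything to the cloud. With very low bandwidth, it keeps all layers on the device.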

This list is automatically generated from the titles and abstracts of the papers on this site.