CheapET-3: Cost-Efficient Use of Remote DNN Models
- URL: http://arxiv.org/abs/2208.11552v1
- Date: Wed, 24 Aug 2022 13:54:27 GMT
- Title: CheapET-3: Cost-Efficient Use of Remote DNN Models
- Authors: Michael Weiss
- Abstract summary: We propose a new software architecture for client-side applications, where a small local DNN is used alongside a remote large-scale model.
In a proof of concept we reduce prediction cost by up to 50% without negatively impacting system accuracy.
- Score: 1.0660480034605242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On complex problems, state-of-the-art prediction accuracy of Deep Neural Networks (DNN) can be achieved using very large-scale models, consisting of billions of parameters. Such models can only be run on dedicated servers, typically provided by a third-party service, which leads to a substantial monetary cost for every prediction. We propose a new software architecture for client-side applications, where a small local DNN is used alongside a remote large-scale model, aiming to make easy predictions locally at negligible monetary cost, while still leveraging the benefits of a large model for challenging inputs. In a proof of concept, we reduce prediction cost by up to 50% without negatively impacting system accuracy.
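The abstract does not fix an implementation, but the proposed architecture amounts to a simple routing rule on the client. Below is a minimal Python sketch of that pattern, assuming a softmax-confidence threshold as the routing criterion; `local_model`, `query_remote_model`, and the threshold value are illustrative placeholders, not part of the paper.

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.9  # illustrative value; in practice tuned to trade cost against accuracy

def predict(x, local_model, query_remote_model):
    """Route one input: free local prediction first, paid remote model only as fallback.

    local_model:        callable returning class probabilities from a small on-device DNN
    query_remote_model: hypothetical helper calling the third-party large-scale model
                        (each call incurs a monetary cost)
    """
    probs = local_model(x)                 # negligible-cost local inference
    confidence = float(np.max(probs))

    if confidence >= CONFIDENCE_THRESHOLD:
        # "Easy" input: trust the small local DNN and skip the remote call entirely.
        return int(np.argmax(probs))

    # "Challenging" input: pay for a prediction from the large remote model.
    return query_remote_model(x)
```

With such a split, the fraction of inputs answered locally directly determines the cost reduction, which is how savings of up to 50% can arise without hurting accuracy on the remaining, harder inputs.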
Related papers
- Scale-Dropout: Estimating Uncertainty in Deep Neural Networks Using Stochastic Scale [0.7025445595542577]
Uncertainty estimation in Neural Networks (NNs) is vital in improving reliability and confidence in predictions, particularly in safety-critical applications.
Bayesian NNs (BayNNs) with Dropout as an approximation offer a systematic approach to quantifying uncertainty, but they inherently suffer from high hardware overhead in terms of power, memory, and computation.
We introduce a novel Spintronic memory-based CIM architecture for the proposed BayNN that achieves more than $100\times$ energy savings compared to the state-of-the-art.
arXiv Detail & Related papers (2023-11-27T13:41:20Z)
- Spatial-SpinDrop: Spatial Dropout-based Binary Bayesian Neural Network with Spintronics Implementation [1.3603499630771996]
We introduce MC-SpatialDropout, a spatial dropout-based approximate BayNN built on emerging spintronic devices.
The number of dropout modules per network layer is reduced by a factor of $9\times$ and energy consumption by a factor of $94.11\times$, while still achieving comparable predictive performance and uncertainty estimates.
arXiv Detail & Related papers (2023-06-16T21:38:13Z)
- Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks [4.987581730476023]
Large-scale Deep Neural Networks (DNNs) are too large to be efficiently run on resource-constrained devices.
We propose BiSupervised, in which a system first attempts a prediction on a small-scale local model before querying the large remote DNN.
We evaluate the cost savings and the ability to detect incorrectly predicted inputs on four diverse case studies.
arXiv Detail & Related papers (2023-04-05T04:35:23Z)
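This follow-up to CheapET-3 adds supervision on both sides of the local/remote split. The sketch below is only an assumed illustration of that two-supervisor composition; `local_supervisor`, `remote_supervisor`, and `query_remote_model` are hypothetical callables, not an API from the paper.

```python
def bi_supervised_predict(x, local_model, local_supervisor,
                          query_remote_model, remote_supervisor):
    """Two-supervisor routing sketch: accept the cheap local prediction when its
    supervisor trusts it, otherwise pay for the remote model; a second supervisor
    can still flag inputs for which even the remote prediction looks unreliable."""
    local_pred = local_model(x)
    if local_supervisor(x, local_pred):        # supervisor accepts the local prediction
        return local_pred, "local", True

    remote_pred = query_remote_model(x)        # costly call to the third-party model
    trusted = remote_supervisor(x, remote_pred)
    return remote_pred, "remote", trusted
```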
- A Deep Neural Network Based Approach to Building Budget-Constrained Models for Big Data Analysis [11.562071835482223]
We introduce an approach to eliminating less important features for big data analysis using Deep Neural Networks (DNNs).
We identify the weak links and weak neurons, and remove some input features to bring the model cost within a given budget.
arXiv Detail & Related papers (2023-02-23T00:00:32Z)
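For illustration only: the entry above does not spell out the procedure, but one simple way to realize "remove input features until the model fits a budget" is to rank features by an importance proxy and greedily keep the most important ones whose combined cost stays within the budget. The sketch below uses first-layer weight magnitude as that proxy and assumes per-feature costs; it is a stand-in, not the paper's algorithm.

```python
import numpy as np

def prune_features_to_budget(first_layer_weights, feature_costs, budget):
    """Keep the most important input features whose total cost fits the budget.

    first_layer_weights: (n_features, n_hidden) weights of the DNN's first layer
    feature_costs:       per-feature acquisition/processing cost, shape (n_features,)
    budget:              maximum total cost allowed for the pruned model
    """
    # Features whose outgoing weights are small ("weak links") matter least.
    importance = np.abs(first_layer_weights).sum(axis=1)
    kept, total = [], 0.0
    for idx in np.argsort(-importance):        # strongest features first
        if total + feature_costs[idx] <= budget:
            kept.append(int(idx))
            total += float(feature_costs[idx])
    return sorted(kept)  # indices of features to retain; retrain the model on these
```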
- Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction [63.3021778885906]
3D bounding boxes are a widespread intermediate representation in many computer vision applications.
We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures.
We release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications.
arXiv Detail & Related papers (2022-10-13T23:57:40Z)
- Fault-Aware Design and Training to Enhance DNNs Reliability with Zero-Overhead [67.87678914831477]
Deep Neural Networks (DNNs) enable a wide range of technological advancements.
Recent findings indicate that transient hardware faults may dramatically corrupt the model's predictions.
In this work, we propose to tackle the reliability issue both at training and model design time.
arXiv Detail & Related papers (2022-05-28T13:09:30Z)
- DNNAbacus: Toward Accurate Computational Cost Prediction for Deep Neural Networks [0.9896984829010892]
This paper investigates the computational resource demands of 29 classical deep neural networks and builds accurate models for predicting computational costs.
We propose a lightweight prediction approach DNNAbacus with a novel network structural matrix for network representation.
Our experimental results show that the mean relative error (MRE) is 0.9% with respect to time and 2.8% with respect to memory for the 29 classic models, which is much lower than that of state-of-the-art works.
arXiv Detail & Related papers (2022-05-24T14:21:27Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models, of statistical models combined with the roofline model, and of a refined roofline model.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Making DensePose fast and light [78.49552144907513]
Existing neural network models capable of solving this task are heavily parameterized.
To enable DensePose inference on the end device with current models, one needs to support an expensive server-side infrastructure and have a stable internet connection.
In this work, we target the problem of redesigning the DensePose R-CNN model's architecture so that the final network retains most of its accuracy but becomes more lightweight and fast.
arXiv Detail & Related papers (2020-06-26T19:42:20Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to industry because of their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)