Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep
Neural Networks
- URL: http://arxiv.org/abs/2304.02654v1
- Date: Wed, 5 Apr 2023 04:35:23 GMT
- Title: Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep
Neural Networks
- Authors: Michael Weiss and Paolo Tonella
- Abstract summary: Large-scale Deep Neural Networks (DNNs) are too large to be efficiently run on resource-constrained devices.
We propose BiSupervised, where a system attempts to make a prediction on a small-scale local model.
We evaluate the cost savings, and the ability to detect incorrectly predicted inputs on four diverse case studies.
- Score: 4.987581730476023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent decades have seen the rise of large-scale Deep Neural Networks (DNNs)
to achieve human-competitive performance in a variety of artificial
intelligence tasks. Often consisting of hundreds of millions, if not hundreds
of billions, of parameters, these DNNs are too large to be deployed to, or
efficiently run on resource-constrained devices such as mobile phones or IoT
microcontrollers. Systems relying on large-scale DNNs thus have to call the
corresponding model over the network, leading to substantial costs for hosting
and running the large-scale remote model, costs which are often charged on a
per-use basis. In this paper, we propose BiSupervised, a novel architecture,
where, before relying on a large remote DNN, a system attempts to make a
prediction on a small-scale local model. A DNN supervisor monitors said
prediction process and identifies easy inputs for which the local prediction
can be trusted. For these inputs, the remote model does not have to be invoked,
thus saving costs, while only marginally impacting the overall system accuracy.
Our architecture furthermore foresees a second supervisor to monitor the remote
predictions and identify inputs for which not even these can be trusted,
allowing the system to raise an exception or run a fallback strategy instead. We evaluate
the cost savings and the ability to detect incorrectly predicted inputs on
four diverse case studies: IMDB movie review sentiment classification, Github
issue triaging, Imagenet image classification, and SQuADv2 free-text question
answering.
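The dispatch flow described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the confidence-threshold supervisors, and the toy models are all hypothetical stand-ins for the DNN supervisors the paper describes.

```python
# Illustrative sketch of the two-supervisor (BiSupervised) dispatch flow.
# All names and thresholds here are hypothetical.

def bisupervised_predict(x, local_model, remote_model,
                         trust_local, trust_remote, fallback):
    """Return (prediction, source), invoking the remote model only when
    the local prediction cannot be trusted."""
    label, confidence = local_model(x)
    if trust_local(confidence):          # supervisor 1: easy input
        return label, "local"            # remote call (and its cost) avoided
    label, confidence = remote_model(x)  # pay-per-use remote invocation
    if trust_remote(confidence):         # supervisor 2: remote trusted?
        return label, "remote"
    return fallback(x), "fallback"       # neither trusted: fallback strategy

# Toy usage: simple confidence thresholds stand in for real DNN supervisors.
local = lambda x: ("pos", 0.95) if x == "great" else ("neg", 0.40)
remote = lambda x: ("neg", 0.30) if x == "odd" else ("pos", 0.90)
result = bisupervised_predict("great", local, remote,
                              lambda c: c >= 0.8, lambda c: c >= 0.5,
                              lambda x: "unknown")
# result == ("pos", "local"): the easy input never reaches the remote model
```

Under this scheme, the local supervisor's threshold directly controls the cost/accuracy trade-off: raising it forwards more inputs to the remote model.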
Related papers
- EvSegSNN: Neuromorphic Semantic Segmentation for Event Data [0.6138671548064356]
EvSegSNN is a biologically plausible encoder-decoder U-shaped architecture relying on Parametric Leaky Integrate and Fire neurons.
We introduce an end-to-end biologically inspired semantic segmentation approach by combining Spiking Neural Networks with event cameras.
Experiments conducted on DDD17 demonstrate that EvSegSNN outperforms the closest state-of-the-art model in terms of MIoU.
arXiv Detail & Related papers (2024-06-20T10:36:24Z)
- CheapET-3: Cost-Efficient Use of Remote DNN Models [1.0660480034605242]
We propose a new software architecture for client-side applications, where a small local DNN is used alongside a remote large-scale model.
In a proof of concept we reduce prediction cost by up to 50% without negatively impacting system accuracy.
arXiv Detail & Related papers (2022-08-24T13:54:27Z)
- CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems [0.0]
A Convolutional Neural Network (CNN) is a class of Deep Neural Network (DNN) widely used in the analysis of visual images captured by an image sensor.
In this paper, we propose a neoteric variant of deep convolutional neural network architecture to ameliorate the performance of existing CNN architectures for real-time inference on embedded systems.
arXiv Detail & Related papers (2021-12-01T18:20:52Z)
- Pruning and Slicing Neural Networks using Formal Verification [0.2538209532048866]
Deep neural networks (DNNs) play an increasingly important role in various computer systems.
In order to create these networks, engineers typically specify a desired topology, and then use an automated training algorithm to select the network's weights.
Here, we propose to address this challenge by harnessing recent advances in DNN verification.
arXiv Detail & Related papers (2021-05-28T07:53:50Z)
- S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills binary networks from real-valued ones on the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
arXiv Detail & Related papers (2021-02-17T18:59:28Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS)
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- EagerNet: Early Predictions of Neural Networks for Computationally Efficient Intrusion Detection [2.223733768286313]
We propose a new architecture to detect network attacks with minimal resources.
The architecture can handle both binary and multiclass classification problems and trades off prediction speed against network accuracy.
arXiv Detail & Related papers (2020-07-27T11:31:37Z)
- Boosting Deep Neural Networks with Geometrical Prior Knowledge: A Survey [77.99182201815763]
Deep Neural Networks (DNNs) achieve state-of-the-art results in many different problem settings.
DNNs are often treated as black box systems, which complicates their evaluation and validation.
One promising field, inspired by the success of convolutional neural networks (CNNs) in computer vision tasks, is to incorporate knowledge about symmetric geometrical transformations.
arXiv Detail & Related papers (2020-06-30T14:56:05Z)
- Making DensePose fast and light [78.49552144907513]
Existing neural network models capable of solving this task are heavily parameterized.
To enable Dense Pose inference on the end device with current models, one needs to support an expensive server-side infrastructure and have a stable internet connection.
In this work, we target the problem of redesigning the DensePose R-CNN model's architecture so that the final network retains most of its accuracy but becomes more light-weight and fast.
arXiv Detail & Related papers (2020-06-26T19:42:20Z)
- Neural Networks and Value at Risk [59.85784504799224]
We perform Monte-Carlo simulations of asset returns for Value at Risk threshold estimation.
Using equity markets and long term bonds as test assets, we investigate neural networks.
We find that our networks perform significantly worse when fed with substantially less data.
arXiv Detail & Related papers (2020-05-04T17:41:59Z)
- Firearm Detection and Segmentation Using an Ensemble of Semantic Neural Networks [62.997667081978825]
We present a weapon detection system based on an ensemble of semantic Convolutional Neural Networks.
A set of simpler neural networks dedicated to specific tasks requires less computational resources and can be trained in parallel.
The overall output of the system given by the aggregation of the outputs of individual networks can be tuned by a user to trade-off false positives and false negatives.
arXiv Detail & Related papers (2020-02-11T13:58:16Z)
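The tunable aggregation described in the firearm-detection blurb can be sketched as a user-set threshold over the ensemble's combined score. This is a hypothetical illustration; the function name, mean aggregation, and threshold values are assumptions, not taken from the paper.

```python
# Hypothetical sketch of a tunable ensemble aggregation: each specialized
# network contributes a confidence score, and a user-chosen threshold
# trades off false positives against false negatives.

def aggregate(scores, threshold):
    """Flag a detection when the mean ensemble score meets the threshold.

    A low threshold misses fewer weapons (fewer false negatives) at the
    cost of more false alarms; a high threshold does the opposite.
    """
    return sum(scores) / len(scores) >= threshold

scores = [0.9, 0.7, 0.4]          # per-network confidence for one frame
strict = aggregate(scores, 0.8)   # high threshold: fewer false positives
lenient = aggregate(scores, 0.5)  # low threshold: fewer false negatives
# mean is ~0.667, so strict is False while lenient is True
```

Because the networks are independent, such an ensemble can also be trained and evaluated in parallel, as the blurb notes.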
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.