Jointly Optimizing Preprocessing and Inference for DNN-based Visual
Analytics
- URL: http://arxiv.org/abs/2007.13005v1
- Date: Sat, 25 Jul 2020 20:26:05 GMT
- Title: Jointly Optimizing Preprocessing and Inference for DNN-based Visual
Analytics
- Authors: Daniel Kang, Ankit Mathur, Teja Veeramacheneni, Peter Bailis, Matei
Zaharia
- Abstract summary: In this work, we examine end-to-end DNN execution in visual analytics systems on modern accelerators.
To address the bottleneck of preprocessing, we introduce two optimizations for end-to-end visual analytics systems.
We show that these optimizations can achieve up to 5.9x end-to-end throughput improvements at a fixed accuracy over recent work in visual analytics.
- Score: 24.62486707803304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep neural networks (DNNs) are an increasingly popular way to query
large corpora of data, their significant runtime remains an active area of
research. As a result, researchers have proposed systems and optimizations to
reduce these costs by allowing users to trade off accuracy and speed. In this
work, we examine end-to-end DNN execution in visual analytics systems on modern
accelerators. Through a novel measurement study, we show that the preprocessing
of data (e.g., decoding, resizing) can be the bottleneck in many visual
analytics systems on modern hardware.
To address the bottleneck of preprocessing, we introduce two optimizations
for end-to-end visual analytics systems. First, we introduce novel methods of
achieving accuracy and throughput trade-offs by using natively present,
low-resolution visual data. Second, we develop a runtime engine for efficient
visual DNN inference. This runtime engine a) efficiently pipelines
preprocessing and DNN execution for inference, b) places preprocessing
operations on the CPU or GPU in a hardware- and input-aware manner, and c)
efficiently manages memory and threading for high throughput execution. We
implement these optimizations in a novel system, Smol, and evaluate Smol on
eight visual datasets. We show that its optimizations can achieve up to 5.9x
end-to-end throughput improvements at a fixed accuracy over recent work in
visual analytics.
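The abstract describes the runtime engine's pipelining of preprocessing and DNN execution but does not include Smol's code. As a rough illustration of the idea only, the following is a minimal producer-consumer sketch in Python: preprocessing runs on one thread while inference consumes batches from a bounded queue. The `preprocess` and `infer` functions here are stand-ins, not Smol's actual operators.

```python
# Hypothetical sketch of pipelined preprocessing + inference (not Smol's code).
import queue
import threading

def preprocess(frame):
    # Stand-in for decode/resize; real systems would do JPEG decoding, etc.
    return [x / 255.0 for x in frame]

def infer(batch):
    # Stand-in for DNN execution on an accelerator.
    return [sum(x) for x in batch]

def pipelined_run(frames, batch_size=2):
    q = queue.Queue(maxsize=8)  # bounded queue applies backpressure
    results = []

    def producer():
        for f in frames:
            q.put(preprocess(f))  # CPU-side preprocessing
        q.put(None)               # sentinel: no more data

    t = threading.Thread(target=producer)
    t.start()
    batch = []
    while True:
        item = q.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) == batch_size:
            results.extend(infer(batch))  # overlaps with producer's work
            batch = []
    if batch:                             # flush the final partial batch
        results.extend(infer(batch))
    t.join()
    return results
```

The bounded queue lets preprocessing overlap with inference while capping memory use, loosely corresponding to the engine's pipelining (a) and memory management (c); the hardware- and input-aware CPU/GPU placement (b) would additionally decide per-operation where `preprocess` runs.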
Related papers
- A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network Inference [3.734578883171713]
We propose a data-driven framework for dataflow-aware latency prediction in GNN inference.
Our regressors can predict the optimal dataflow for a given graph with up to 91.28% accuracy and a Mean Absolute Percentage Error (MAPE) of 3.78%.
We introduce an online scheduling algorithm that uses these regressors to enhance scheduling decisions.
arXiv Detail & Related papers (2024-11-25T12:38:59Z) - Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision [6.287928298762318]
We evaluate computer vision inference requests performed on a throughput-optimized serving system.
We quantify the performance impact of server overheads such as data movement, preprocessing, and message brokers.
Our work identifies important performance bottlenecks in different application scenarios, achieves 2.25x better throughput compared to prior work, and paves the way for more holistic deep learning system design.
arXiv Detail & Related papers (2024-03-02T02:35:08Z) - Understand Data Preprocessing for Effective End-to-End Training of Deep
Neural Networks [8.977436072381973]
We run experiments to test the performance implications of the two major data preprocessing methods using either raw data or record files.
We identify the potential causes, exercise a variety of optimization methods, and present their pros and cons.
arXiv Detail & Related papers (2023-04-18T11:57:38Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural
Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning
Preprocessing Pipelines [77.45213180689952]
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy.
We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines.
We obtain an increased throughput of 3x to 13x compared to an untuned system.
arXiv Detail & Related papers (2022-02-17T14:31:58Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data [86.8949732640035]
We propose JUMBO, an MBO algorithm that sidesteps these limitations by querying additional offline data.
We show that it achieves no-regret under conditions analogous to GP-UCB.
Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems.
arXiv Detail & Related papers (2021-06-02T05:03:38Z) - Automated Design Space Exploration for optimised Deployment of DNN on
Arm Cortex-A CPUs [13.628734116014819]
Deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNNs).
There is a lack of research on cross-level optimisation as the space of approaches becomes too large to test and obtain a globally optimised solution.
We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms achieving up to 4x improvement in performance and over 2x reduction in memory.
arXiv Detail & Related papers (2020-06-09T11:00:06Z) - Daydream: Accurately Estimating the Efficacy of Optimizations for DNN
Training [8.157520622932374]
Profiling tools do not aim to answer predictive questions such as "How will optimization X affect the performance of my model?"
We propose a new profiling tool, Daydream, to help programmers efficiently explore the efficacy of DNN optimizations.
We show that Daydream is able to model most mainstream DNN optimization techniques, and accurately predict the efficacy of optimizations that will result in significant performance improvements.
arXiv Detail & Related papers (2020-06-05T09:08:16Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using conventional methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.