An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks
- URL: http://arxiv.org/abs/2102.10423v1
- Date: Sat, 20 Feb 2021 19:25:09 GMT
- Title: An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks
- Authors: Amir Yazdanbakhsh, Kiran Seshadri, Berkin Akin, James Laudon, Ravi
Narayanaswami
- Abstract summary: Edge TPUs are accelerators for low-power, edge devices and are widely used in various Google products such as Coral and Pixel devices.
We extensively evaluate three classes of Edge TPUs, covering different computing ecosystems, that are either currently deployed in Google products or are in the product pipeline.
We present our efforts in developing high-accuracy learned machine learning models to estimate the major performance metrics of accelerators.
- Score: 2.7584363116322863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Edge TPUs are a domain of accelerators for low-power, edge devices and are
widely used in various Google products such as Coral and Pixel devices. In this
paper, we first discuss the major microarchitectural details of Edge TPUs.
Then, we extensively evaluate three classes of Edge TPUs, covering different
computing ecosystems, that are either currently deployed in Google products or
are in the product pipeline, across 423K unique convolutional neural networks.
Building upon this extensive study, we discuss critical and interpretable
microarchitectural insights about the studied classes of Edge TPUs. Mainly, we
discuss how Edge TPU accelerators perform across convolutional neural networks
with different structures. Finally, we present our ongoing efforts in
developing high-accuracy learned machine learning models to estimate the major
performance metrics of accelerators such as latency and energy consumption.
These learned models enable significantly faster (in the order of milliseconds)
evaluations of accelerators as an alternative to time-consuming cycle-accurate
simulators and establish an exciting opportunity for rapid hardware/software
co-design.
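The learned cost models described above map a description of a CNN to predicted accelerator metrics, replacing slow cycle-accurate simulation with a millisecond-scale lookup. As a minimal illustrative sketch (not the paper's actual model), one can regress latency against coarse architecture descriptors such as MAC count and parameter count; the features, synthetic data, and linear form below are assumptions for illustration only.

```python
# Hypothetical sketch of a learned latency estimator: a linear model
# over coarse CNN descriptors, fit by gradient descent on synthetic
# data. The real paper uses high-accuracy learned models; this only
# illustrates the "predict metrics from architecture features" idea.

def fit_linear(X, y, lr=0.1, steps=3000):
    """Fit weights w and bias b minimizing mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += 2 * err * xi[j] / n
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy training set: (normalized MACs, normalized params) -> latency (ms).
X = [[0.1, 0.2], [0.4, 0.3], [0.8, 0.9], [0.5, 0.5], [0.9, 0.1]]
y = [3.0 * m + 1.0 * p + 0.5 for m, p in X]  # synthetic ground truth

w, b = fit_linear(X, y)
# Querying the fitted model is a few arithmetic ops, i.e. effectively
# instantaneous compared with cycle-accurate simulation.
pred = sum(wj * xj for wj, xj in zip(w, [0.6, 0.4])) + b  # ~2.7 ms
```

In practice a model like this would be trained on measurements from a large corpus of networks (the paper evaluates 423K CNNs), with richer per-layer features and a more expressive regressor.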
Related papers
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z) - Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings can be easily changed by better training networks in benchmarks.
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - Exploration of TPUs for AI Applications [0.0]
Tensor Processing Units (TPUs) are specialized hardware accelerators for deep learning developed by Google.
This paper aims to explore TPUs in cloud and edge computing, focusing on their applications in AI.
arXiv Detail & Related papers (2023-09-16T07:58:05Z) - FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems [61.335229621081346]
Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge.
In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities.
arXiv Detail & Related papers (2023-06-08T13:11:20Z) - Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern
Recognition on Neuromorphic Hardware [50.380319968947035]
Recent deep learning approaches have reached high accuracy in such tasks, but their implementation on conventional embedded solutions is still very computationally and energy expensive.
We propose a new benchmark for computing tactile pattern recognition at the edge through letter reading.
We trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then we deployed them on the Intel Loihi neuromorphic chip for efficient inference.
Our results show that the LSTM outperforms the recurrent SNN in terms of accuracy by 14%. However, the recurrent SNN on Loihi is 237 times more energy efficient.
arXiv Detail & Related papers (2022-05-30T14:30:45Z) - Exploring Deep Neural Networks on Edge TPU [2.9573904824595614]
This paper explores the performance of Google's Edge TPU on feed forward neural networks.
We compare the energy efficiency of Edge TPU with that of widely-used embedded CPU ARM Cortex-A53.
arXiv Detail & Related papers (2021-10-17T14:01:26Z) - Deep Learning on Edge TPUs [0.0]
I review the Edge TPU platform, the tasks that have been accomplished using the Edge TPU, and which steps are necessary to deploy a model to the Edge TPU hardware.
The Edge TPU is not only capable of tackling common computer vision tasks, but also surpasses other hardware accelerators.
Co-embedding the Edge TPU in cameras allows a seamless analysis of primary data.
arXiv Detail & Related papers (2021-08-31T10:23:37Z) - Exploring Edge TPU for Network Intrusion Detection in IoT [2.8873930745906957]
This paper explores Google's Edge TPU for implementing a practical network intrusion detection system (NIDS) at the edge of IoT, based on a deep learning approach.
Various scaled model sizes of two major deep neural network architectures are used to investigate these three metrics.
The performance of the Edge TPU-based implementation is compared with that of an energy efficient embedded CPU (ARM Cortex-A53).
arXiv Detail & Related papers (2021-03-30T12:43:57Z) - Accelerator-aware Neural Network Design using AutoML [5.33024001730262]
We present a class of computer vision models designed using hardware-aware neural architecture search and customized to run on the Edge TPU.
For the Edge TPU in Coral devices, these models enable real-time image classification performance while achieving accuracy typically seen only with larger, compute-heavy models running in data centers.
arXiv Detail & Related papers (2020-03-05T21:34:22Z) - Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G
Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.