Ultra-Efficient On-Device Object Detection on AI-Integrated Smart
Glasses with TinyissimoYOLO
- URL: http://arxiv.org/abs/2311.01057v2
- Date: Fri, 3 Nov 2023 15:25:55 GMT
- Title: Ultra-Efficient On-Device Object Detection on AI-Integrated Smart
Glasses with TinyissimoYOLO
- Authors: Julian Moosmann, Pietro Bonazzi, Yawei Li, Sizhen Bian, Philipp Mayer,
Luca Benini, Michele Magno
- Abstract summary: This paper illustrates the design and implementation of tiny machine-learning algorithms exploiting novel low-power processors.
We explore the energy and latency efficiency of smart glasses in the case of real-time object detection.
Evaluations on the prototype of the smart glasses demonstrate TinyissimoYOLO's 17ms inference latency and 1.59mJ energy consumption per inference.
- Score: 20.11222151005929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Smart glasses are rapidly gaining advanced functionality thanks to
cutting-edge computing technologies, accelerated hardware architectures, and
tiny AI algorithms. Integrating AI into smart glasses featuring a small form
factor and limited battery capacity is still challenging when targeting
full-day usage for a satisfactory user experience. This paper illustrates the
design and implementation of tiny machine-learning algorithms exploiting novel
low-power processors to enable prolonged continuous operation in smart glasses.
We explore the energy and latency efficiency of smart glasses in the case of
real-time object detection. To this end, we designed a smart glasses prototype
as a research platform featuring two microcontrollers, including a novel
milliwatt-power RISC-V parallel processor with a hardware accelerator for
visual AI, and a Bluetooth low-power module for communication. The smart
glasses integrate power cycling mechanisms, including image and audio sensing
interfaces. Furthermore, we developed a family of novel tiny deep-learning
models based on YOLO with sub-million parameters customized for
microcontroller-based inference, dubbed TinyissimoYOLO v1.3, v5, and v8, aiming
at benchmarking object detection with smart glasses for energy and latency.
Evaluations on the prototype of the smart glasses demonstrate TinyissimoYOLO's
17ms inference latency and 1.59mJ energy consumption per inference while
ensuring acceptable detection accuracy. Further evaluation reveals an
end-to-end latency from image capture to the algorithm's prediction of 56ms, or
equivalently 18 fps, with a total power consumption of 62.9mW, equivalent to
9.3 hours of continuous runtime on a 154mAh battery. These results
outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image
classification) at just 7.3 fps.
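As a back-of-the-envelope check, the reported figures are mutually consistent; the sketch below recomputes the frame rate and battery life from the quoted latency and power. The nominal cell voltage is an assumption (the abstract does not state it); a typical Li-Po value of about 3.8 V reproduces the 9.3-hour figure.

```python
# Back-of-envelope check of the reported numbers.
END_TO_END_LATENCY_S = 0.056   # 56 ms from image capture to prediction
TOTAL_POWER_W = 0.0629         # 62.9 mW total system power
BATTERY_CAPACITY_AH = 0.154    # 154 mAh battery
NOMINAL_CELL_VOLTAGE_V = 3.8   # assumption: typical Li-Po nominal voltage

fps = 1.0 / END_TO_END_LATENCY_S                       # ~17.9, i.e. "18 fps"
battery_energy_wh = BATTERY_CAPACITY_AH * NOMINAL_CELL_VOLTAGE_V
runtime_h = battery_energy_wh / TOTAL_POWER_W          # ~9.3 h

print(f"frame rate: {fps:.1f} fps")
print(f"runtime:    {runtime_h:.1f} h")
```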
Related papers
- Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices [0.0]
We evaluate state-of-the-art object detection models, including YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet).
We deployed these models on popular edge devices like the Raspberry Pi 3, 4, and 5 with/without TPU accelerators, and the Jetson Orin Nano, collecting key performance metrics such as energy consumption, inference time, and Mean Average Precision (mAP).
Our findings highlight that lower mAP models such as SSD MobileNet V1 are more energy-efficient and faster in inference.
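At its core, such a benchmark times repeated inferences and aggregates the statistics. Below is a minimal sketch of that kind of harness; the `model` callable and `make_input` generator are hypothetical stand-ins, not code from the paper.

```python
# Minimal latency-benchmark sketch (hypothetical harness, not the paper's code).
import statistics
import time

def benchmark(model, make_input, warmup=10, runs=100):
    """Time repeated inferences of `model` on inputs from `make_input()`."""
    for _ in range(warmup):                   # warm-up to stabilize caches/clocks
        model(make_input())
    latencies_ms = []
    for _ in range(runs):
        x = make_input()
        start = time.perf_counter()
        model(x)
        latencies_ms.append((time.perf_counter() - start) * 1e3)
    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "p95_ms": sorted(latencies_ms)[int(0.95 * len(latencies_ms))],
        "fps": 1e3 / mean_ms,
    }

if __name__ == "__main__":
    import random
    dummy_model = sum                         # stand-in for a real detector
    print(benchmark(dummy_model, lambda: [random.random() for _ in range(10_000)]))
```

Energy would be measured separately (e.g. with an external power monitor) and divided by the number of inferences; mAP comes from a standard evaluation on a labeled test set.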
arXiv Detail & Related papers (2024-09-25T10:56:49Z)
- PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search [64.28335667655129]
Multiple object tracking is a critical task in autonomous driving.
As tracking accuracy improves, neural networks become increasingly complex, posing challenges for their practical application in real driving scenarios due to their high latency.
In this paper, we explore the use of the neural architecture search (NAS) methods to search for efficient architectures for tracking, aiming for low real-time latency while maintaining relatively high accuracy.
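The "Pareto" in the title implies keeping only architectures that no other candidate beats on both latency and accuracy simultaneously. A minimal sketch of that selection step, with made-up candidate numbers, might look like this:

```python
# Pareto-front selection over (latency, accuracy) pairs -- an illustrative
# sketch of the selection step, with made-up numbers.

def pareto_front(candidates):
    """Keep candidates not dominated by any other.

    `candidates` maps a name to (latency_ms, accuracy); lower latency
    and higher accuracy are better.
    """
    front = {}
    for name, (lat, acc) in candidates.items():
        dominated = any(
            o_lat <= lat and o_acc >= acc and (o_lat, o_acc) != (lat, acc)
            for o_lat, o_acc in candidates.values()
        )
        if not dominated:
            front[name] = (lat, acc)
    return front

candidates = {
    "arch_a": (12.0, 0.71),  # fast, less accurate
    "arch_b": (25.0, 0.78),  # balanced
    "arch_c": (30.0, 0.77),  # dominated by arch_b
}
print(pareto_front(candidates))  # arch_a and arch_b survive
```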
arXiv Detail & Related papers (2024-03-23T04:18:49Z)
- Lightweight Object Detection: A Study Based on YOLOv7 Integrated with ShuffleNetv2 and Vision Transformer [0.0]
This study zeroes in on optimizing the YOLOv7 algorithm to boost its operational efficiency and speed on mobile platforms.
The experimental outcomes reveal that the refined YOLO model demonstrates exceptional performance.
arXiv Detail & Related papers (2024-03-04T05:29:32Z)
- SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on FPGA Devices [48.47320494918925]
This work tackles the challenges of deploying state-of-the-art object detection models onto FPGA devices for ultra-low latency applications.
We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion.
We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources.
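As a rough intuition for the streaming design: in a deeply pipelined accelerator, steady-state throughput is set by the slowest stage, while the latency of a single input is roughly the sum of all stage delays. The toy calculation below uses illustrative, not measured, stage delays.

```python
# Toy model of a deeply pipelined streaming accelerator.
stage_delays_us = [40, 55, 30, 70]           # illustrative per-stage delays

latency_us = sum(stage_delays_us)            # first result after ~195 us
throughput = 1e6 / max(stage_delays_us)      # then one result every 70 us

print(f"single-input latency: {latency_us} us")
print(f"steady-state rate:    {throughput:.0f} results/s")
```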
arXiv Detail & Related papers (2023-09-04T13:15:01Z)
- Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices [90.30316433184414]
We propose a data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy MOT on HD video streams.
Compared to the state-of-the-art MOT baseline, our tri-design approach can achieve 12.5x latency reduction, 20.9x effective frame rate improvement, 5.83x lower power, and 9.78x better energy efficiency, without much accuracy drop.
arXiv Detail & Related papers (2022-10-16T16:21:40Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- PhiNets: a scalable backbone for low-power AI at the edge [2.7910505923792646]
We present PhiNets, a new scalable backbone optimized for deep-learning-based image processing on resource-constrained platforms.
PhiNets are based on inverted residual blocks specifically designed to decouple the computational cost, working memory, and parameter memory.
We demonstrate our approach on a prototype node based on an STM32H743 microcontroller.
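For context, an inverted residual block expands the channels with a 1x1 convolution, applies a depthwise 3x3 convolution, and projects back down with a second 1x1 convolution. The PyTorch sketch below shows the generic MobileNetV2-style block; it is not the exact PhiNets variant, and the hyperparameters are illustrative.

```python
# Generic inverted residual block (MobileNetV2-style) -- illustrative,
# not the exact PhiNets variant.
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, channels: int, expansion: int = 4, stride: int = 1):
        super().__init__()
        hidden = channels * expansion
        self.use_skip = stride == 1               # residual only if shapes match
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),        # 1x1 expand
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride,
                      padding=1, groups=hidden, bias=False),   # 3x3 depthwise
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),        # 1x1 project
            nn.BatchNorm2d(channels),                          # no activation here
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_skip else out

x = torch.randn(1, 16, 32, 32)
print(InvertedResidual(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```

The depthwise/pointwise split is what lets such blocks trade compute, working memory, and parameter count somewhat independently, which is the axis the PhiNets summary highlights.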
arXiv Detail & Related papers (2021-10-01T12:03:25Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- A Real-time Low-cost Artificial Intelligence System for Autonomous Spraying in Palm Plantations [1.6799377888527687]
In precision crop protection, (target-oriented) object detection in image processing can help navigate Unmanned Aerial Vehicles (UAVs, crop protection drones) to the right place to apply the pesticide.
We propose a solution based on a light deep neural network (DNN), called Ag-YOLO, which gives the crop protection UAV the ability to perform target detection and operate autonomously.
arXiv Detail & Related papers (2021-03-06T15:05:14Z)
- An Ultra Fast Low Power Convolutional Neural Network Image Sensor with Pixel-level Computing [3.41234610095684]
This paper proposes a Processing-In-Pixel (PIP) CMOS sensor architecture, which allows convolution operations before the column readout circuit to significantly improve the image reading speed.
The resulting computational efficiency is 4.75 TOPS/W, about 3.6 times higher than the state of the art.
arXiv Detail & Related papers (2021-01-09T07:10:03Z)
- Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device [53.323878851563414]
We propose a compiler-aware unified framework incorporating network enhancement and pruning search with reinforcement learning techniques.
Specifically, a generator Recurrent Neural Network (RNN) is employed to provide the unified scheme for both network enhancement and pruning search automatically.
The proposed framework achieves real-time 3D object detection on mobile devices with competitive detection performance.
arXiv Detail & Related papers (2020-12-26T19:41:15Z)