Related papers: Facial Expression Recognition at the Edge: CPU vs GPU vs VPU vs TPU

Facial Expression Recognition at the Edge: CPU vs GPU vs VPU vs TPU

URL: http://arxiv.org/abs/2305.15422v1
Date: Wed, 17 May 2023 03:19:06 GMT
Title: Facial Expression Recognition at the Edge: CPU vs GPU vs VPU vs TPU
Authors: Mohammadreza Mohammadi, Heath Smith, Lareb Khan, Ramtin Zand
Abstract summary: We present a hierarchical framework for developing and optimizing hardware-aware CNNs tuned for deployment at the edge. We achieved a peak accuracy of 99.49% when testing on the CK+ facial expression recognition dataset.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Facial Expression Recognition (FER) plays an important role in human-computer interactions and is used in a wide range of applications. Convolutional Neural Networks (CNN) have shown promise in their ability to classify human facial expressions, however, large CNNs are not well-suited to be implemented on resource- and energy-constrained IoT devices. In this work, we present a hierarchical framework for developing and optimizing hardware-aware CNNs tuned for deployment at the edge. We perform a comprehensive analysis across various edge AI accelerators including NVIDIA Jetson Nano, Intel Neural Compute Stick, and Coral TPU. Using the proposed strategy, we achieved a peak accuracy of 99.49% when testing on the CK+ facial expression recognition dataset. Additionally, we achieved a minimum inference latency of 0.39 milliseconds and a minimum power consumption of 0.52 Watts.

Related papers

FPGA-based Acceleration of Neural Network for Image Classification using Vitis AI [0.0]
We accelerate a CNN for image classification with the CIFAR-10 dataset using Vitis-AI on Xilinx Zynq UltraScale+ MPSoC ZCU104 FPGA evaluation board. The work achieves 3.33-5.82x higher throughput and 3.39-6.30x higher energy efficiency than CPU and GPU baselines.
arXiv Detail & Related papers (2024-12-30T14:26:17Z)
Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time [5.05866540830123]
We present ODiMO, a hardware-aware tool that efficiently explores fine-grain mapping of Deep Neural Networks (DNNs) among various on-chip CUs. We show that ODiMO reduces the latency of a DNN executed on the Darkside by up to 8x at iso-accuracy, compared to a manual mappings. When targeting energy, ODiMO produced up to 50.8x more efficient mappings, with minimal accuracy drop.
arXiv Detail & Related papers (2024-09-27T09:10:44Z)
Realtime Facial Expression Recognition: Neuromorphic Hardware vs. Edge AI Accelerators [0.5492530316344587]
The paper focuses on real-time facial expression recognition (FER) systems as an important component in various real-world applications such as social robotics. We investigate two hardware options for the deployment of FER machine learning (ML) models at the edge: neuromorphic hardware versus edge AI accelerators.
arXiv Detail & Related papers (2024-01-30T16:12:20Z)
Optimizing Convolutional Neural Network Architecture [0.0]
Convolutional Neural Networks (CNN) are widely used to face challenging tasks like speech recognition, natural language processing or computer vision. We propose Optimizing Convolutional Neural Network Architecture (OCNNA), a novel CNN optimization and construction method based on pruning and knowledge distillation. Our method has been compared with more than 20 convolutional neural network simplification algorithms obtaining outstanding results.
arXiv Detail & Related papers (2023-12-17T12:23:11Z)
Toward Practical Privacy-Preserving Convolutional Neural Networks Exploiting Fully Homomorphic Encryption [11.706881389387242]
Homomorphic encryption (FHE) is a viable approach for achieving private inference (PI) FHE implementation of a CNN faces significant hurdles, primarily due to FHE's substantial computational and memory overhead. We propose a set of optimizations, which includes GPU/ASIC acceleration, an efficient activation function, and an optimized packing scheme.
arXiv Detail & Related papers (2023-10-25T10:24:35Z)
SqueezerFaceNet: Reducing a Small Face Recognition CNN Even More Via Filter Pruning [55.84746218227712]
We develop SqueezerFaceNet, a light face recognition network which less than 1M parameters. We show that it can be further reduced (up to 40%) without an appreciable loss in performance.
arXiv Detail & Related papers (2023-07-20T08:38:50Z)
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone. This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge. We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern Recognition on Neuromorphic Hardware [50.380319968947035]
Recent deep learning approaches have reached accuracy in such tasks, but their implementation on conventional embedded solutions is still computationally very and energy expensive. We propose a new benchmark for computing tactile pattern recognition at the edge through letters reading. We trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then we deployed them on the Intel Loihimorphic chip for efficient inference. Our results show that the LSTM outperforms the recurrent SNN in terms of accuracy by 14%. However, the recurrent SNN on Loihi is 237 times more energy
arXiv Detail & Related papers (2022-05-30T14:30:45Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources. It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Pixel Difference Networks for Efficient Edge Detection [71.03915957914532]
We propose a lightweight yet effective architecture named Pixel Difference Network (PiDiNet) for efficient edge detection. Extensive experiments on BSDS500, NYUD, and Multicue datasets are provided to demonstrate its effectiveness. A faster version of PiDiNet with less than 0.1M parameters can still achieve comparable performance among state of the arts with 200 FPS.
arXiv Detail & Related papers (2021-08-16T10:42:59Z)
Performance Prediction for Convolutional Neural Networks in Edge Devices [0.8602553195689513]
Convolutional Neural Network (CNN) based applications on edge devices near the source of data can meet the latency and privacy challenges. We present and compare five (5) of the widely used Machine Learning based methods for execution time prediction of CNNs on two (2) edge GPU platforms.
arXiv Detail & Related papers (2020-10-21T20:21:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.