Edge-GPU Based Face Tracking for Face Detection and Recognition Acceleration
- URL: http://arxiv.org/abs/2505.04524v1
- Date: Wed, 07 May 2025 15:57:53 GMT
- Title: Edge-GPU Based Face Tracking for Face Detection and Recognition Acceleration
- Authors: Asma Baobaid, Mahmoud Meribout
- Abstract summary: This paper suggests a combined hardware-software approach to optimize face detection and recognition systems on the NVIDIA Jetson AGX Orin. The results suggest that the simultaneous usage of all the hardware engines available in the Orin GPU, together with tracker integration into the pipeline, yields an impressive throughput of 290 FPS (frames per second) on 1920 x 1080 frames containing an average of 6 faces per frame. This hardware-software co-design approach can pave the way to designing high-performance machine vision systems at the edge, critically needed for video monitoring in public places.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cost-effective machine vision systems dedicated to real-time and accurate face detection and recognition in public places are crucial for many modern applications. However, despite the high performance that can be reached using specialized edge or cloud AI hardware accelerators, there is still room for improvement in throughput and power consumption. This paper suggests a combined hardware-software approach that optimizes face detection and recognition systems on one of the latest edge GPUs, the NVIDIA Jetson AGX Orin. First, it leverages the simultaneous usage of all its hardware engines to improve processing time. This offers an improvement over previous works, where these tasks were mainly allocated automatically and exclusively to the CPU or, to a greater extent, to the GPU core. Additionally, the paper suggests integrating a face tracker module to avoid redundantly running the face recognition algorithm on every frame, invoking it only when a new face appears in the scene. The results of extended experiments suggest that the simultaneous usage of all the hardware engines available in the Orin GPU, together with tracker integration into the pipeline, yields an impressive throughput of 290 FPS (frames per second) on 1920 x 1080 frames containing an average of 6 faces per frame. Additionally, a substantial power saving of around 800 mW was achieved compared to running the task on the CPU/GPU engines alone without a tracker integrated into the Orin GPU's pipeline. This hardware-software co-design approach can pave the way to designing high-performance machine vision systems at the edge, critically needed for video monitoring in public places, where several nearby cameras are usually deployed to cover the same scene.
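To make the tracker idea concrete, here is a minimal Python sketch of the detect-track-recognize loop described above: detection runs on every frame, while the expensive recognition model is invoked only for tracks with no cached identity. The greedy IoU association and the `detect_fn`/`recognize_fn` stubs are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: recognition runs once per new track, not per frame.
# detect_fn and recognize_fn are stand-in stubs, not the paper's models.
import itertools

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)

class FaceTracker:
    """Greedy IoU tracker; recognition fires only when a face is new."""
    def __init__(self, detect_fn, recognize_fn, iou_thresh=0.4):
        self.detect, self.recognize = detect_fn, recognize_fn
        self.iou_thresh = iou_thresh
        self.tracks = {}            # track_id -> (box, identity)
        self._ids = itertools.count()

    def update(self, frame):
        results = []
        unmatched = dict(self.tracks)
        for box in self.detect(frame):
            # Associate the detection with the best-overlapping live track.
            best = max(unmatched.items(),
                       key=lambda kv: iou(kv[1][0], box), default=None)
            if best and iou(best[1][0], box) >= self.iou_thresh:
                tid, (_, identity) = best
                del unmatched[tid]      # keep the track, refresh its box
            else:
                tid = next(self._ids)   # new face: recognize exactly once
                identity = self.recognize(frame, box)
            self.tracks[tid] = (box, identity)
            results.append((tid, box, identity))
        for tid in unmatched:           # drop faces that left the scene
            del self.tracks[tid]
        return results
```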
Related papers
- Faster than Fast: Accelerating Oriented FAST Feature Detection on Low-end Embedded GPUs [11.639825636679454]
This paper presents two methods to accelerate Oriented FAST feature detection on low-end embedded GPUs.
Experiments on a Jetson TX2 embedded GPU demonstrate an average speedup of over 7.3 times compared to the widely used OpenCV with GPU support.
arXiv Detail & Related papers (2025-06-08T14:30:30Z)
- Minute-Long Videos with Dual Parallelisms [57.22737565366549]
Diffusion Transformer (DiT)-based video diffusion models generate high-quality videos at scale but incur prohibitive processing latency and memory costs for long videos.
We propose a novel distributed inference strategy, termed DualParal.
Instead of generating an entire video on a single GPU, we parallelize both temporal frames and model layers across GPUs.
arXiv Detail & Related papers (2025-05-27T11:55:22Z)
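As a rough illustration of the dual parallelism named above, the toy Python sketch below pipelines temporal frame blocks through layer stages, each stage standing in for one GPU. The block/stage split and the schedule are assumptions drawn only from the summary, not the paper's actual scheme.

```python
# Toy dual parallelism: temporal frame blocks flow through layer stages
# assigned to different GPUs. Stages here are plain functions; in a real
# system each would run concurrently on its own device.
def make_stage(stage_id):
    return lambda block: f"{block}->L{stage_id}"

stages = [make_stage(i) for i in range(3)]    # 3 "GPUs", one layer group each
blocks = [f"frames[{i}]" for i in range(4)]   # 4 temporal blocks of the video

# Pipelined schedule: at step t, stage s works on block t - s, so every
# stage stays busy once the pipeline fills.
done = {}
for t in range(len(blocks) + len(stages) - 1):
    for s, stage in enumerate(stages):
        b = t - s
        if 0 <= b < len(blocks):
            x = blocks[b] if s == 0 else done[(b, s - 1)]
            done[(b, s)] = stage(x)

print([done[(b, len(stages) - 1)] for b in range(len(blocks))])
# ['frames[0]->L0->L1->L2', 'frames[1]->L0->L1->L2', ...]
```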
- Leveraging Simultaneous Usage of Edge GPU Hardware Engines for Video Face Detection and Recognition [0.0]
This paper aims to maximize the simultaneous usage of hardware engines available in edge GPUs.
It includes the video decoding task, which is required in most face monitoring applications.
Results suggest that simultaneously using all the hardware engines available in the recent NVIDIA edge Orin GPU yields higher throughput and a slight power saving of around 300 mW, accounting for around 5%.
arXiv Detail & Related papers (2025-05-07T15:22:17Z)
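A minimal sketch of the decode/compute overlap this line of work exploits, assuming the decoder (e.g., the NVDEC engine) and the inference engine can run concurrently; `decode_next` and `run_inference` are hypothetical stand-ins for the real engine calls.

```python
# Sketch: overlap video decoding and inference with a bounded queue, so the
# decode engine and the compute engines work in parallel rather than in
# sequence. decode_next() and run_inference() are hypothetical stubs.
import queue
import threading

def run_pipeline(video, decode_next, run_inference):
    frames = queue.Queue(maxsize=8)   # small buffer between the two engines
    STOP = object()
    results = []

    def decoder():
        while (frame := decode_next(video)) is not None:
            frames.put(frame)         # blocks if inference falls behind
        frames.put(STOP)              # signal end of stream

    t = threading.Thread(target=decoder)
    t.start()
    while (frame := frames.get()) is not STOP:
        results.append(run_inference(frame))
    t.join()
    return results
```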
- VR-Pipe: Streamlining Hardware Graphics Pipeline for Volume Rendering [1.9470707535768061]
We implement the state-of-the-art radiance field method, 3D Gaussian splatting, using graphics APIs and evaluate it across synthetic and real-world scenes on today's graphics hardware.
We present VR-Pipe, which seamlessly integrates two innovations into graphics hardware to streamline the hardware pipeline for volume rendering.
Our evaluation shows that VR-Pipe greatly improves rendering performance, achieving up to a 2.78x speedup over the conventional graphics pipeline with negligible hardware overhead.
arXiv Detail & Related papers (2025-02-24T11:46:36Z)
- Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models [2.1922186455344796]
This work benchmarks the performance of different platforms for object detection in real-time.
The authors used RetinaNet ResNet-50, fine-tuned on the natural Vine dataset.
arXiv Detail & Related papers (2022-11-21T17:02:33Z)
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
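To show what the butterfly sparsity pattern means in practice, here is a NumPy sketch that applies log2(N) butterfly factors to a vector in O(N log N) instead of O(N^2); the factor values are random placeholders, and the accelerator's actual dataflow is of course hardware, not Python.

```python
import numpy as np

def butterfly_apply(factors, x):
    """Apply log2(N) butterfly stages to vector x (N a power of two).
    Each stage mixes element pairs (i, i + stride) with a 2x2 block,
    which is the regular sparsity pattern hardware can exploit."""
    n = len(x)
    y = x.astype(float).copy()
    stride = 1
    for (a, b, c, d) in factors:      # one set of N/2 2x2 blocks per stage
        out = np.empty_like(y)
        k = 0
        for base in range(0, n, 2 * stride):
            for i in range(base, base + stride):
                j = i + stride
                out[i] = a[k] * y[i] + b[k] * y[j]
                out[j] = c[k] * y[i] + d[k] * y[j]
                k += 1
        y = out
        stride *= 2
    return y

n = 8
rng = np.random.default_rng(0)
factors = [tuple(rng.standard_normal(n // 2) for _ in range(4))
           for _ in range(int(np.log2(n)))]
print(butterfly_apply(factors, rng.standard_normal(n)))
```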
- EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction [67.11722682878722]
This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention.
Our multi-scale linear attention achieves the global receptive field and multi-scale learning.
EfficientViT delivers remarkable performance gains over previous state-of-the-art models.
arXiv Detail & Related papers (2022-05-29T20:07:23Z)
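A NumPy sketch of ReLU-based linear attention, the softmax-free form this family of models builds on: attention factors into two O(N) matrix products, so cost grows linearly with sequence length. Shapes and data here are illustrative only.

```python
import numpy as np

def relu_linear_attention(Q, K, V, eps=1e-6):
    """Linear attention: phi(Q) @ (phi(K)^T V), normalized per query,
    costing O(N d^2) instead of softmax attention's O(N^2 d). phi = ReLU."""
    phi_q, phi_k = np.maximum(Q, 0), np.maximum(K, 0)
    kv = phi_k.T @ V                  # (d, d_v): global context summary
    z = phi_q @ phi_k.sum(axis=0)     # (N,): per-query normalizer
    return (phi_q @ kv) / (z[:, None] + eps)

N, d = 16, 8
rng = np.random.default_rng(1)
out = relu_linear_attention(rng.standard_normal((N, d)),
                            rng.standard_normal((N, d)),
                            rng.standard_normal((N, d)))
print(out.shape)                      # (16, 8)
```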
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z)
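As a loose illustration of predicting latency from CPU performance counters, the toy below fits a least-squares model on fabricated counter data; MAPLE-Edge itself is a learned predictor, and the counter names and numbers here are invented for the sketch.

```python
import numpy as np

# Toy stand-in for a counter-based latency predictor: fit latency as a
# linear function of a few CPU performance counters. All values below are
# fabricated for illustration only.
counters = np.array([   # [instructions, cache_misses, branch_misses] per run
    [1.2e9, 3.1e6, 0.9e6],
    [2.4e9, 6.5e6, 1.7e6],
    [0.8e9, 2.0e6, 0.6e6],
    [3.1e9, 8.9e6, 2.3e6],
])
latency_ms = np.array([12.0, 24.5, 8.1, 31.7])

X = np.c_[counters, np.ones(len(counters))]   # add a bias column
coef, *_ = np.linalg.lstsq(X, latency_ms, rcond=None)

new_run = np.array([1.5e9, 4.0e6, 1.1e6, 1.0])
print(f"predicted latency: {new_run @ coef:.1f} ms")
```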
- Efficient Visual Tracking via Hierarchical Cross-Attention Transformer [82.92565582642847]
We present an efficient tracking method via a hierarchical cross-attention transformer named HCAT.
Our model runs at about 195 fps on GPU, 45 fps on CPU, and 55 fps on the NVIDIA Jetson AGX Xavier edge AI platform.
arXiv Detail & Related papers (2022-03-25T09:45:27Z)
- High Performance Hyperspectral Image Classification using Graphics Processing Units [0.0]
Real-time remote sensing applications require onboard real-time processing capabilities.
Lightweight, small-size, and low-power hardware is essential for onboard real-time processing systems.
arXiv Detail & Related papers (2021-05-30T09:26:03Z)
- Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO [46.20949184826173]
This work focuses on the applicability of efficient low-level, GPU hardware-specific instructions to improve on existing computer vision algorithms.
Non-maxima suppression and the subsequent feature selection, in particular, are prominent contributors to the overall image processing latency.
arXiv Detail & Related papers (2020-03-30T14:16:23Z)
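For concreteness, here is a naive CPU implementation of the keypoint non-maxima suppression step flagged above as a latency hotspot; it is this kind of per-pixel neighborhood scan that GPU frontends parallelize.

```python
import numpy as np

def keypoint_nms(scores, radius=1):
    """Keep a pixel only if it is the strict maximum of its (2r+1)^2
    neighborhood -- the NMS step that dominates latency when run naively
    on the CPU, pixel by pixel."""
    h, w = scores.shape
    keep = np.zeros_like(scores, dtype=bool)
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            window = scores[y - radius:y + radius + 1,
                            x - radius:x + radius + 1]
            if scores[y, x] > 0 and scores[y, x] == window.max() \
               and (window == scores[y, x]).sum() == 1:
                keep[y, x] = True
    return np.argwhere(keep)   # (row, col) of surviving keypoints

rng = np.random.default_rng(2)
scores = rng.random((16, 16))
print(len(keypoint_nms(scores, radius=2)), "keypoints survive")
```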
- Efficient Video Semantic Segmentation with Labels Propagation and Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high-definition videos using a hybrid GPU/CPU approach.
We propose an Efficient Video Segmentation (EVS) pipeline that combines, on the CPU, a very fast optical flow method used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next.
On the popular Cityscapes dataset with high-resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)
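A hedged OpenCV sketch of the CPU-side label propagation step just described: estimate dense optical flow between consecutive grayscale frames and warp the previous segmentation forward. Farneback flow is used here for self-containment; the paper uses its own, much faster flow method.

```python
import cv2
import numpy as np

def propagate_labels(prev_gray, curr_gray, prev_labels):
    """Warp the previous frame's segmentation to the current frame using
    dense optical flow, so the expensive segmentation CNN can be skipped."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Backward-warp approximation: a pixel at (x, y) in the current frame
    # came from roughly (x, y) - flow(x, y) in the previous frame.
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_labels, map_x, map_y,
                     interpolation=cv2.INTER_NEAREST)  # keep labels discrete
```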