Related papers: SPLite Hand: Sparsity-Aware Lightweight 3D Hand Pose Estimation

URL: http://arxiv.org/abs/2510.16396v3
Date: Thu, 30 Oct 2025 04:59:32 GMT
Title: SPLite Hand: Sparsity-Aware Lightweight 3D Hand Pose Estimation
Authors: Yeh Keng Hao, Hsu Tzu Wei, Sun Min
Abstract summary: We design a light framework that adopts an encoder-decoder architecture and introduces several key contributions aimed at improving both efficiency and accuracy. We apply sparse convolution on a ResNet-18 backbone to exploit the inherent sparsity in hand pose images, achieving a 42% end-to-end efficiency improvement. This new architecture significantly boosts the decoding process's frame rate by 3.1x on the Raspberry Pi 5, while maintaining accuracy on par.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the increasing ubiquity of AR/VR devices, the deployment of deep learning models on edge devices has become a critical challenge. These devices require real-time inference, low power consumption, and minimal latency. Many framework designers face the conundrum of balancing efficiency and performance. We design a light framework that adopts an encoder-decoder architecture and introduces several key contributions aimed at improving both efficiency and accuracy. We apply sparse convolution on a ResNet-18 backbone to exploit the inherent sparsity in hand pose images, achieving a 42% end-to-end efficiency improvement. Moreover, we propose our SPLite decoder. This new architecture significantly boosts the decoding process's frame rate by 3.1x on the Raspberry Pi 5, while maintaining accuracy on par. To further optimize performance, we apply quantization-aware training, reducing memory usage while preserving accuracy (PA-MPJPE increases only marginally from 9.0 mm to 9.1 mm on FreiHAND). Overall, our system achieves a 2.98x speed-up on a Raspberry Pi 5 CPU (BCM2712 quad-core Arm A76 processor). Our method is also evaluated on compound benchmark datasets, demonstrating comparable accuracy to state-of-the-art approaches while significantly enhancing computational efficiency.
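The abstract does not say which quantization scheme is used. As a minimal sketch, assuming symmetric per-tensor 8-bit quantization, the quantize-then-dequantize step ("fake quantization") that quantization-aware training typically applies in the forward pass can be illustrated as follows. The function name `fake_quantize` and the parameter `num_bits` are hypothetical and are not taken from the paper.

```python
def fake_quantize(values, num_bits=8):
    """Round floats to signed integer levels, then map them back to floats.

    Assumption: symmetric per-tensor quantization with a single scale.
    The paper's actual scheme is not given in the abstract.
    """
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values), 1.0              # nothing to scale
    scale = max_abs / qmax                    # float width of one integer step
    dequantized = []
    for v in values:
        q = round(v / scale)                  # nearest integer level
        q = max(-qmax, min(qmax, q))          # clamp to the representable range
        dequantized.append(q * scale)         # back to float for the forward pass
    return dequantized, scale


weights = [0.5, -1.27, 0.003, 1.0]
approx, scale = fake_quantize(weights)
max_error = max(abs(a - w) for a, w in zip(approx, weights))
```

Training against the rounded values lets the network adapt to the rounding error, which is bounded by half a quantization step per weight in this scheme. That is one plausible reason an accuracy change as small as the reported 9.0 mm to 9.1 mm PA-MPJPE is achievable.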

Related papers

Evolutionary Mapping of Neural Networks to Spatial Accelerators [64.13809409887254]
We introduce the first evolutionary, hardware-in-the-loop mapping framework for neuromorphic accelerators. We evaluate our approach on Intel Loihi 2, a representative spatial accelerator featuring 152 cores in a 2D mesh. Our method achieves up to 35% reduction in total latency compared to default cores on two sparse multi-layer perceptron networks.
arXiv Detail & Related papers (2026-02-04T16:28:08Z)
Lightweight Transformer Architectures for Edge Devices in Real-Time Applications [0.0]
This survey examines lightweight transformer architectures specifically designed for edge deployment. We systematically review prominent lightweight variants including MobileBERT, TinyBERT, DistilBERT, EfficientFormer, EdgeFormer, and MobileViT. Experimental results demonstrate that modern lightweight transformers can achieve 75-96% of full-model accuracy while reducing model size by 4-10x and inference latency by 3-9x.
arXiv Detail & Related papers (2026-01-05T01:04:25Z)
Implementation of high-efficiency, lightweight residual spiking neural network processor based on field-programmable gate arrays [0.49806798459446283]
This work presents an efficient residual SNN accelerator that combines algorithm and hardware co-design to optimize inference energy efficiency. The proposed processor achieves a classification accuracy of 87.11% on the CIFAR-10 dataset, with an inference time of 3.98 ms per image and an energy efficiency of 183.5 FPS/W.
arXiv Detail & Related papers (2025-12-09T02:08:46Z)
PocketSR: The Super-Resolution Expert in Your Pocket Mobiles [69.26751136689533]
Real-world image super-resolution (RealSR) aims to enhance the visual quality of in-the-wild images, such as those captured by mobile phones. Existing methods leveraging large generative models demonstrate impressive results, but the high computational cost and latency make them impractical for edge deployment. We introduce PocketSR, an ultra-lightweight, single-step model that brings generative modeling capabilities to RealSR while maintaining high fidelity.
arXiv Detail & Related papers (2025-10-03T13:56:18Z)
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation [129.45368843861917]
We introduce the Gated Memory Unit (GMU), a simple yet effective mechanism for efficient memory sharing across layers. We apply it to create SambaY, a decoder-hybrid-decoder architecture that incorporates GMUs to share memory readout states from a Samba-based self-decoder.
arXiv Detail & Related papers (2025-07-09T07:27:00Z)
APOLLO: SGD-like Memory, AdamW-level Performance [61.53444035835778]
Large language models (LLMs) are notoriously memory-intensive during training. Various memory-efficient optimizers have been proposed to reduce memory usage. They face critical challenges: (i) costly SVD operations; (ii) significant performance trade-offs compared to AdamW; and (iii) still substantial memory overhead to maintain competitive performance.
arXiv Detail & Related papers (2024-12-06T18:55:34Z)
Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices [90.30316433184414]
We propose a data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy multi-object tracking (MOT) on HD video streams. Compared to the state-of-the-art MOT baseline, our tri-design approach can achieve 12.5x latency reduction, 20.9x effective frame rate improvement, 5.83x lower power, and 9.78x better energy efficiency, without much accuracy drop.
arXiv Detail & Related papers (2022-10-16T16:21:40Z)
Accelerating Deep Learning Model Inference on Arm CPUs with Ultra-Low Bit Quantization and Runtime [57.5143536744084]
High performance of deep learning models comes at the expense of high computational, storage and power requirements. We introduce Deeplite Neutrino for production-ready optimization of the models and Deeplite Runtime for deployment of ultra-low bit quantized models on Arm-based platforms.
arXiv Detail & Related papers (2022-07-18T15:05:17Z)
PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices [13.62426382827205]
The PP-PicoDet family of real-time object detectors achieves superior performance on object detection for mobile devices. The models achieve better trade-offs between accuracy and latency compared to other popular models.
arXiv Detail & Related papers (2021-11-01T12:53:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.