Related papers: Real-Time Crowd Counting for Embedded Systems with Lightweight Architecture

URL: http://arxiv.org/abs/2510.13250v1
Date: Wed, 15 Oct 2025 07:58:46 GMT
Title: Real-Time Crowd Counting for Embedded Systems with Lightweight Architecture
Authors: Zhiyuan Zhao, Yubin Wen, Siyu Yang, Lichen Ning, Yuandong Liu, Junyu Gao
Abstract summary: We design a super real-time model with a stem-encoder-decoder structure for crowd counting tasks. The proposed network achieves 381.7 FPS on NVIDIA GTX 1080Ti and 71.9 FPS on NVIDIA Jetson TX1.
Score: 19.86251721232166
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Crowd counting is the task of estimating the number of people in a crowd from images, which is extremely valuable in fields such as intelligent security, urban planning, and public safety management. However, existing counting methods have problems in practical application on embedded systems for these fields, such as excessive model parameters and abundant complex calculations. Practical application on embedded systems requires the model to be real-time, which means that the model must be fast enough. Considering these problems, we design a super real-time model with a stem-encoder-decoder structure for crowd counting tasks, which achieves the fastest inference compared with state-of-the-art methods. Firstly, large convolution kernels in the stem network are used to enlarge the receptive field, which effectively extracts detailed head information. Then, in the encoder part, we use conditional channel weighting and a multi-branch local fusion block to merge multi-scale features with low computational consumption. This part is crucial to the super real-time performance of the model. Finally, feature pyramid networks are added to the top of the encoder to alleviate its incomplete fusion problems. Experiments on three benchmarks show that our network is suitable for super real-time crowd counting on embedded systems while ensuring competitive accuracy. At the same time, the proposed network's inference speed is the fastest. Specifically, the proposed network achieves 381.7 FPS on NVIDIA GTX 1080Ti and 71.9 FPS on NVIDIA Jetson TX1.
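
To put the reported throughput in per-frame terms, the following is a minimal Python sketch that converts FPS to mean latency and times an arbitrary callable. It is not the authors' benchmarking code: the function names `fps_to_latency_ms` and `measure_fps`, the warm-up count, and the iteration count are all assumptions made for illustration, and the abstract does not state the input resolution or batch size behind its figures.

```python
import time
from typing import Callable


def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput in frames per second to mean per-frame latency in ms."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def measure_fps(infer: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Time `iters` calls of a hypothetical inference callable; return frames per second."""
    # Warm-up calls are excluded from timing (caches, lazy initialisation).
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return iters / elapsed


if __name__ == "__main__":
    # FPS figures taken from the abstract above.
    for device, fps in [("GTX 1080Ti", 381.7), ("Jetson TX1", 71.9)]:
        print(f"{device}: {fps} FPS = {fps_to_latency_ms(fps):.2f} ms per frame")
```

Under these assumptions, 381.7 FPS corresponds to roughly 2.62 ms per frame and 71.9 FPS to roughly 13.91 ms per frame. When timing a real model on a GPU, the callable would also need to wait for the device to finish its work before returning, since GPU kernels are typically launched asynchronously.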

Related papers

HARP-NeXt: High-Speed and Accurate Range-Point Fusion Network for 3D LiDAR Semantic Segmentation [39.58684038370709]
LiDAR semantic segmentation is crucial for autonomous vehicles and mobile robots. Previous state-of-the-art methods often face a trade-off between accuracy and speed. We introduce HARP-NeXt, a high-speed and accurate LiDAR semantic segmentation network.
arXiv Detail & Related papers (2025-10-08T10:46:07Z)
Parallel Neural Computing for Scene Understanding from LiDAR Perception in Autonomous Racing [0.0]
Traditional sequential network approaches may struggle to meet the real-time knowledge and decision-making demands of an autonomous agent. This paper proposes a novel baseline architecture for developing sophisticated models capable of true hardware-enabled parallelism. The proposed model takes raw 3D point cloud data from the LiDAR sensor as input and converts it into a 2D Bird's Eye View Map on both devices.
arXiv Detail & Related papers (2024-12-24T04:56:32Z)
EasyNet: An Easy Network for 3D Industrial Anomaly Detection [49.26348455493123]
3D anomaly detection is an emerging and vital computer vision task in industrial manufacturing. We propose an easy and deployment-friendly network (called EasyNet) without using pre-trained models and memory banks. Experiments show that EasyNet achieves an anomaly detection AUROC of 92.6% without using pre-trained models and memory banks.
arXiv Detail & Related papers (2023-07-26T02:46:50Z)
ReBotNet: Fast Real-time Video Enhancement [59.08038313427057]
Most restoration networks are slow, have a high computational bottleneck, and can't be used for real-time video enhancement. In this work, we design an efficient and fast framework to perform real-time enhancement for practical use-cases like live video calls and video streams. To evaluate our method, we emulate two new datasets that simulate real-world video call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computations, reduced memory requirements, and faster inference time.
arXiv Detail & Related papers (2023-03-23T17:58:05Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations. Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks. Current networks often have a large number of parameters and require heavy computation costs. Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs. For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)
A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation. T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high processing speed. Our method achieves clear improvements on the UCF101 action recognition benchmark against state-of-the-art real-time methods by 5.4% in terms of accuracy and 2 times faster in terms of inference speed with a less than 5MB storage model.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
A Real-Time Deep Network for Crowd Counting [12.615660025855604]
We propose a compact convolutional neural network for crowd counting. With three parallel filters executing the convolutional operation on the input image simultaneously at the front of the network, our model could achieve nearly real-time speed.
arXiv Detail & Related papers (2020-02-16T06:09:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.