A Real-Time Deep Network for Crowd Counting
- URL: http://arxiv.org/abs/2002.06515v1
- Date: Sun, 16 Feb 2020 06:09:22 GMT
- Title: A Real-Time Deep Network for Crowd Counting
- Authors: Xiaowen Shi, Xin Li, Caili Wu, Shuchen Kong, Jing Yang, Liang He
- Abstract summary: We propose a compact convolutional neural network for crowd counting.
With three parallel filters executing the convolutional operation on the input image simultaneously at the front of the network, our model achieves near-real-time speed.
- Score: 12.615660025855604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic analysis of highly crowded people has attracted extensive attention
from computer vision research. Previous approaches for crowd counting have
already achieved promising performance across various benchmarks. However, to
handle real-world situations, we want the model to run as fast as possible while
maintaining accuracy. In this paper, we propose a compact convolutional neural
network for crowd counting which learns a more efficient model with a small
number of parameters. With three parallel filters executing the convolutional
operation on the input image simultaneously at the front of the network, our
model achieves near-real-time speed while conserving computing resources.
Experiments on two benchmarks show that our proposed method not only strikes a
balance between performance and efficiency, making it better suited to real-world
scenes, but also surpasses existing lightweight models in speed.
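To make the architecture concrete, the following is a minimal PyTorch sketch of a compact counting network whose stem applies three convolutional filters to the input image in parallel and regresses a density map whose sum gives the crowd count. The kernel sizes (3, 5, 7), channel widths, and trunk layout are illustrative assumptions; the abstract does not specify them.

```python
import torch
import torch.nn as nn

class ParallelStem(nn.Module):
    """Three parallel convolutions applied to the input image simultaneously.

    Kernel sizes and channel counts are illustrative assumptions; the
    abstract only states that three parallel filters process the input.
    """
    def __init__(self, out_per_branch: int = 8):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(3, out_per_branch, k, padding=k // 2)
            for k in (3, 5, 7)  # one branch per receptive-field size
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees the same image; outputs are fused by concatenation.
        return torch.cat([b(x) for b in self.branches], dim=1)

class CompactCounter(nn.Module):
    """A small trunk regressing a density map; crowd count = density sum."""
    def __init__(self):
        super().__init__()
        self.stem = ParallelStem()
        self.trunk = nn.Sequential(
            nn.Conv2d(24, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1),  # 1-channel density map
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.trunk(self.stem(x))

if __name__ == "__main__":
    model = CompactCounter()
    img = torch.randn(1, 3, 384, 512)   # a dummy RGB frame
    density = model(img)                # (1, 1, 192, 256) density map
    print("estimated count:", density.sum().item())
```

Fusing the parallel branches by concatenation keeps the stem cheap while covering several receptive-field sizes at once, which is one plausible reading of the three-parallel-filter front end.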
Related papers
- ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models [21.645510959114326]
A prevalent solution is a dual-system architecture, employing a small model for rapid, reactive decisions and a larger model for slower but more informative analyses.
Existing dual-system designs often implement parallel architectures where inference is either directly conducted using the large model at each current frame or retrieved from previously stored inference results.
Our key insight is to shift intensive computations of the current frame to previous time steps and perform a batch inference of multiple time steps to make large models respond promptly to each time step.
ETA advances state-of-the-art performance by 8% with a driving score of 69.53 while maintaining a near-real
arXiv Detail & Related papers (2025-06-09T13:11:02Z)
- A Stable Whitening Optimizer for Efficient Neural Network Training [101.89246340672246]
Building on the Shampoo family of algorithms, we identify and alleviate three key issues, resulting in the proposed SPlus method.
First, we find that naive Shampoo is prone to divergence when matrix-inverses are cached for long periods.
Second, we adapt a shape-aware scaling to enable learning rate transfer across network width.
Third, we find that high learning rates result in large parameter noise, and propose a simple iterate-averaging scheme which unblocks faster learning (a minimal sketch of iterate averaging follows this entry).
arXiv Detail & Related papers (2025-06-08T18:43:31Z)
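As context for the iterate-averaging point above, the sketch below shows the generic technique: an exponential moving average of the weights maintained alongside training. The decay value and toy model are illustrative assumptions; this is not the SPlus-specific scheme, whose details this summary does not give.

```python
import copy
import torch

@torch.no_grad()
def update_average(avg_model: torch.nn.Module,
                   model: torch.nn.Module,
                   decay: float = 0.999) -> None:
    """Exponential moving average of parameters (generic iterate averaging).

    The decay value is an illustrative assumption; SPlus's actual
    scheme may differ.
    """
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.mul_(decay).add_(p, alpha=1.0 - decay)

# Usage: keep a frozen copy updated alongside the optimizer steps.
model = torch.nn.Linear(16, 4)
avg_model = copy.deepcopy(model)
opt = torch.optim.SGD(model.parameters(), lr=0.5)  # deliberately high lr

for _ in range(100):
    loss = model(torch.randn(8, 16)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    update_average(avg_model, model)  # averaged iterates absorb the noise
```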
- An Efficient 3D Convolutional Neural Network with Channel-wise, Spatial-grouped, and Temporal Convolutions [3.798710743290466]
We introduce a simple and very efficient 3D convolutional neural network for video action recognition.
We evaluate the performance and efficiency of our proposed network on several video action recognition datasets.
arXiv Detail & Related papers (2025-03-02T08:47:06Z)
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first identify the computationally redundant parts of the network.
We then prune the redundant blocks of the model while maintaining network performance.
Third, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks [4.130528857196844]
We introduce the Sparsity Roofline, a visual performance model for evaluating sparsity in neural networks.
We show how machine learning researchers can predict the performance of unimplemented or unoptimized block-structured sparsity patterns.
We show how hardware designers can predict the performance implications of new sparsity patterns and sparse data formats in hardware.
arXiv Detail & Related papers (2023-09-30T21:29:31Z)
- Rethinking Pareto Frontier for Performance Evaluation of Deep Neural Networks [2.167843405313757]
We re-define the efficiency measure as a multi-objective optimization.
We combine the competing variables with their stochastic nature simultaneously in a single relative efficiency measure.
This allows ranking deep models that run efficiently on different computing hardware, and objectively combines inference efficiency with training efficiency (a minimal Pareto-dominance sketch follows this entry).
arXiv Detail & Related papers (2022-02-18T15:58:17Z)
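To make the ranking idea concrete, here is a minimal sketch of Pareto dominance over competing efficiency variables. The metric names, the lower-is-better convention, and the toy numbers are assumptions for illustration; the paper's actual relative efficiency measure is not specified in this summary.

```python
from typing import Dict, List

Metrics = Dict[str, float]  # lower is better for every key (assumption)

def dominates(a: Metrics, b: Metrics) -> bool:
    """True if model `a` is at least as good as `b` everywhere and
    strictly better somewhere (standard Pareto dominance)."""
    keys = a.keys()
    return all(a[k] <= b[k] for k in keys) and any(a[k] < b[k] for k in keys)

def pareto_front(models: Dict[str, Metrics]) -> List[str]:
    """Names of the models not dominated by any other model."""
    return [
        name for name, m in models.items()
        if not any(dominates(other, m)
                   for o_name, other in models.items() if o_name != name)
    ]

# Hypothetical measurements: latency (ms), energy (J), top-1 error (%).
models = {
    "tiny":  {"latency": 4.0,  "energy": 0.2, "error": 9.1},
    "base":  {"latency": 9.0,  "energy": 0.5, "error": 6.3},
    "large": {"latency": 30.0, "energy": 2.1, "error": 6.2},
    "bloat": {"latency": 35.0, "energy": 2.5, "error": 6.4},  # dominated by "large"
}
print(pareto_front(models))  # ['tiny', 'base', 'large']
```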
- Real-time Human Detection Model for Edge Devices [0.0]
Convolutional Neural Networks (CNNs) have replaced traditional feature extraction and machine learning models in detection and classification tasks.
Lightweight CNN models have recently been introduced for real-time tasks.
This paper suggests a CNN-based lightweight model that can fit on a limited edge device such as a Raspberry Pi.
arXiv Detail & Related papers (2021-11-20T18:42:17Z)
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the approach of independently mapping text and vision into a joint embedding space, a.k.a. dual encoders, is attractive because retrieval scales well.
An alternative approach, using vision-text transformers with cross-attention, gives considerable improvements in accuracy over the joint embeddings (a minimal dual-encoder sketch follows this entry).
arXiv Detail & Related papers (2021-03-30T17:57:08Z)
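As background for the trade-off above, a dual encoder embeds text and images independently, so the gallery can be indexed offline and queried with a single matrix product. The toy encoders and dimensions below are assumptions, not the paper's models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a real text or vision backbone (assumption)."""
    def __init__(self, in_dim: int, embed_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so that a dot product equals cosine similarity.
        return F.normalize(self.proj(x), dim=-1)

text_enc, image_enc = ToyEncoder(in_dim=128), ToyEncoder(in_dim=512)

# Offline: embed the whole gallery once; retrieval is then a matrix product,
# which is why dual encoders scale better than cross-attention scoring.
gallery = image_enc(torch.randn(10_000, 512))   # (N, 64), precomputed
query = text_enc(torch.randn(1, 128))           # (1, 64), per query
scores = query @ gallery.T                      # (1, N) cosine similarities
topk = scores.topk(k=5).indices                 # indices of the best matches
print(topk)
```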
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations (a minimal coarse-to-fine sketch follows this entry).
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
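The coarse-to-fine scheme mentioned above predicts flow at a low resolution, then repeatedly upsamples it, warps the second image toward the first, and predicts a residual correction at the finer scale. The sketch below shows only this generic scaffolding with a toy predictor; FastFlowNet's actual modules differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFlowHead(nn.Module):
    """Stand-in residual flow predictor at one pyramid level (assumption)."""
    def __init__(self):
        super().__init__()
        # Inputs: img1 (3ch), warped img2 (3ch), upsampled flow (2ch).
        self.conv = nn.Conv2d(2 * 3 + 2, 2, 3, padding=1)

    def forward(self, f1, f2_warped, up_flow):
        return self.conv(torch.cat([f1, f2_warped, up_flow], dim=1))

def warp(img: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `img` by `flow` (in pixels) using grid_sample."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().unsqueeze(0) + flow
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(img, torch.stack([grid_x, grid_y], dim=-1),
                         align_corners=True)

head = ToyFlowHead()
img1 = torch.randn(1, 3, 64, 64)
img2 = torch.randn(1, 3, 64, 64)
flow = torch.zeros(1, 2, 16, 16)  # start from zero flow at the coarsest level

for scale in (16, 32, 64):  # coarse -> fine
    f1 = F.interpolate(img1, size=(scale, scale), mode="bilinear", align_corners=False)
    f2 = F.interpolate(img2, size=(scale, scale), mode="bilinear", align_corners=False)
    # Upsample the previous flow and scale its magnitude with resolution.
    flow = F.interpolate(flow, size=(scale, scale), mode="bilinear",
                         align_corners=False) * (scale / flow.shape[-1])
    flow = flow + head(f1, warp(f2, flow), flow)  # residual refinement

print(flow.shape)  # torch.Size([1, 2, 64, 64])
```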
- A Compact Deep Architecture for Real-time Saliency Prediction [42.58396452892243]
Saliency models aim to imitate the attention mechanism in the human visual system.
Deep models have a high number of parameters, which makes them less suitable for real-time applications.
Here we propose a compact yet fast model for real-time saliency prediction.
arXiv Detail & Related papers (2020-08-30T17:47:16Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while maintaining a high processing speed.
Our method improves on state-of-the-art real-time methods on the UCF101 action recognition benchmark by 5.4% in accuracy while running twice as fast at inference, with a model that requires less than 5 MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
- Dynamic Inference: A New Approach Toward Efficient Video Action Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos (a minimal early-exit sketch follows this entry).
arXiv Detail & Related papers (2020-02-09T11:09:56Z)
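One common realization of dynamic inference is early exiting, where confident predictions leave the network at a shallow classifier and only hard inputs pay for the full depth. The sketch below is a generic version under that assumption, not the paper's specific mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    """Cascade of stages, each with its own classifier head.

    The stage sizes and the confidence threshold are illustrative
    assumptions; the paper's routing policy may differ.
    """
    def __init__(self, dim: int = 64, num_classes: int = 10, threshold: float = 0.9):
        super().__init__()
        self.stages = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(3)]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(3)]
        )
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x: torch.Tensor):
        # Run stages in order; stop as soon as one head is confident enough.
        for i, (stage, head) in enumerate(zip(self.stages, self.heads)):
            x = stage(x)
            probs = F.softmax(head(x), dim=-1)
            if probs.max().item() >= self.threshold or i == len(self.stages) - 1:
                return probs, i  # prediction plus the exit stage index

net = EarlyExitNet()
probs, exit_stage = net(torch.randn(1, 64))
print(f"exited at stage {exit_stage} with confidence {probs.max():.2f}")
```

Easy inputs exit early and cost roughly one third of the full forward pass, which is the efficiency lever this line of work exploits.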
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.