Application-Driven AI Paradigm for Person Counting in Various Scenarios
- URL: http://arxiv.org/abs/2303.13788v1
- Date: Fri, 24 Mar 2023 03:57:21 GMT
- Title: Application-Driven AI Paradigm for Person Counting in Various Scenarios
- Authors: Minjie Hua, Yibing Nan, Shiguo Lian
- Abstract summary: We propose a person counting paradigm that utilizes a scenario classifier to automatically select a suitable person counting model for each captured frame.
We present five augmentation datasets collected from different scenarios, including side-view, long-shot, top-view, customized and crowd.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person counting is considered a fundamental task in video surveillance.
However, the scenario diversity in practical applications makes it difficult to
exploit a single person counting model for general use. Consequently, engineers
must preview the video stream and manually specify an appropriate person
counting model based on the scenario of camera shot, which is time-consuming,
especially for large-scale deployments. In this paper, we propose a person
counting paradigm that utilizes a scenario classifier to automatically select a
suitable person counting model for each captured frame. First, the input image
is passed through the scenario classifier to obtain a scenario label, which is
then used to allocate the frame to one of five fine-tuned models for person
counting. Additionally, we present five augmentation datasets collected from
different scenarios, including side-view, long-shot, top-view, customized and
crowd, which are also integrated to form a scenario classification dataset
containing 26323 samples. In our comparative experiments, the proposed paradigm
achieves a better balance than any single model on the integrated dataset,
demonstrating its generalization across various scenarios.
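The dispatch logic described in the abstract (classify the frame's scenario, then route it to one of five fine-tuned counting models) can be sketched as below. This is a minimal illustration, not the authors' implementation: `classify_scenario` and `make_counter` are hypothetical stand-ins for the scenario classifier and the fine-tuned counting models.

```python
# Sketch of the scenario-driven counting paradigm: a scenario classifier
# selects one of five fine-tuned person-counting models per captured frame.
# The classifier and counters below are dummy stand-ins for illustration.

SCENARIOS = ["side-view", "long-shot", "top-view", "customized", "crowd"]

def classify_scenario(frame):
    # Stand-in: a real system would run a CNN classifier on the image.
    # Here the frame dict carries a precomputed scenario label.
    return frame["scenario"]

def make_counter(scenario):
    # Stand-in for a person-counting model fine-tuned on one scenario.
    def count_people(frame):
        # e.g. detection-based counting: one count per detected person box
        return len(frame["detections"])
    return count_people

COUNTERS = {s: make_counter(s) for s in SCENARIOS}

def count_persons(frame):
    """Route a captured frame to the counter for its predicted scenario."""
    label = classify_scenario(frame)
    if label not in COUNTERS:
        raise ValueError(f"unknown scenario: {label}")
    return label, COUNTERS[label](frame)

frame = {"scenario": "crowd", "detections": [(10, 20), (30, 40), (55, 60)]}
label, n = count_persons(frame)
```

The key design point is that model selection happens per frame, so a large deployment needs no manual previewing of each camera's video stream.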
Related papers
- Localizing Events in Videos with Multimodal Queries [71.40602125623668]
We introduce a new benchmark, ICQ, for localizing events in videos with multimodal queries.
We include 4 styles of reference images and 5 types of refinement texts, allowing us to explore model performance across different domains.
arXiv Detail & Related papers (2024-06-14T14:35:58Z) - Ada-Retrieval: An Adaptive Multi-Round Retrieval Paradigm for Sequential Recommendations [50.03560306423678]
We propose Ada-Retrieval, an adaptive multi-round retrieval paradigm for recommender systems.
Ada-Retrieval iteratively refines user representations to better capture potential candidates in the full item space.
arXiv Detail & Related papers (2024-01-12T15:26:40Z) - A Control-Centric Benchmark for Video Prediction [69.22614362800692]
We propose a benchmark for action-conditioned video prediction in the form of a control benchmark.
Our benchmark includes simulated environments with 11 task categories and 310 task instance definitions.
We then leverage our benchmark to study the effects of scaling model size, quantity of training data, and model ensembling.
arXiv Detail & Related papers (2023-04-26T17:59:45Z) - PAMI: partition input and aggregate outputs for model interpretation [69.42924964776766]
In this study, a simple yet effective visualization framework called PAMI is proposed based on the observation that deep learning models often aggregate features from local regions for model predictions.
The basic idea is to mask majority of the input and use the corresponding model output as the relative contribution of the preserved input part to the original model prediction.
Extensive experiments on multiple tasks confirm the proposed method performs better than existing visualization approaches in more precisely finding class-specific input regions.
arXiv Detail & Related papers (2023-02-07T08:48:34Z) - CounTR: Transformer-based Generalised Visual Counting [94.54725247039441]
We develop a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars".
We conduct thorough ablation studies on the large-scale counting benchmark FSC-147 and demonstrate state-of-the-art performance in both zero-shot and few-shot settings.
arXiv Detail & Related papers (2022-08-29T17:02:45Z) - Scenario-Adaptive and Self-Supervised Model for Multi-Scenario Personalized Recommendation [35.4495536683099]
We propose a Scenario-Adaptive and Self-Supervised (SASS) model to solve the three challenges mentioned above.
The model is built symmetrically on both the user side and the item side, yielding distinguishing representations of items in different scenarios.
This model also achieves more than 8.0% improvement on Average Watching Time Per User in online A/B tests.
arXiv Detail & Related papers (2022-08-24T11:44:00Z) - Cross-View Cross-Scene Multi-View Crowd Counting [56.83882084112913]
Multi-view crowd counting has been previously proposed to utilize multi-cameras to extend the field-of-view of a single camera.
We propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts.
arXiv Detail & Related papers (2022-05-03T15:03:44Z) - Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling [25.525371500391568]
In certain applications, such as image retrieval platforms and photo album management apps, it is often required to execute a collection of models to obtain sufficient labels.
We propose an Adaptive Model Scheduling framework, consisting of 1) a deep reinforcement learning-based approach to predict the value of nontrivial models by mining semantic relationships among diverse models, and 2) two algorithms to adaptively schedule the model execution order under a deadline constraint or joint deadline-memory constraints, respectively.
Our design could save around 53% of execution time without losing any valuable labels.
arXiv Detail & Related papers (2020-02-08T03:54:39Z)