Cross-View Cross-Scene Multi-View Crowd Counting
- URL: http://arxiv.org/abs/2205.01551v1
- Date: Tue, 3 May 2022 15:03:44 GMT
- Title: Cross-View Cross-Scene Multi-View Crowd Counting
- Authors: Qi Zhang, Wei Lin, Antoni B. Chan
- Abstract summary: Multi-view crowd counting has been previously proposed to utilize multiple cameras to extend the field-of-view of a single camera.
We propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts.
- Score: 56.83882084112913
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-view crowd counting has been previously proposed to utilize multiple cameras to extend the field-of-view of a single camera, capture more people in the scene, and improve counting performance for people who are occluded or appear at low resolution. However, the current multi-view paradigm trains and tests on the same single scene and camera views, which limits its practical application. In this paper, we propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where training and testing occur on different scenes with arbitrary camera layouts. To dynamically handle the challenge of optimal view fusion under scene and camera-layout changes, as well as non-correspondence noise due to camera calibration errors or erroneous features, we propose a CVCS model that attentively selects and fuses multiple views using camera layout geometry, together with a noise-view regularization method that trains the model to handle non-correspondence errors. We also generate a large synthetic multi-camera crowd counting dataset with a large number of scenes and camera views to capture many possible variations, which avoids the difficulty of collecting and annotating such a large real dataset. We then test our trained CVCS model on real multi-view counting datasets via unsupervised domain transfer. The proposed CVCS model trained on synthetic data outperforms the same model trained only on real data, and achieves promising performance compared to fully supervised methods that train and test on the same single scene.
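The abstract describes attentively selecting and fusing views using camera layout geometry, with a noise-view regularization to suppress non-corresponding views. The paper's actual architecture is not given here; the following is a minimal NumPy sketch of one plausible ingredient, attention-weighted fusion of per-view ground-plane maps, where `fuse_views` and the scalar per-view scores are hypothetical simplifications:

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_views(ground_plane_feats, view_scores):
    """Attention-weighted fusion of per-view ground-plane feature maps.

    ground_plane_feats: (V, H, W) per-view features already projected to a
                        common ground plane (the projection step is omitted).
    view_scores:        (V,) per-view confidence logits, imagined here as
                        derived from camera-layout geometry (an assumption).
    Returns the fused (H, W) map.
    """
    w = softmax(view_scores)                            # attention over views
    return np.tensordot(w, ground_plane_feats, axes=1)  # weighted sum -> (H, W)

# Toy example: three views, the last one a "noise view" with erroneous
# features that the attention weights should suppress.
feats = np.stack([np.full((4, 4), 1.0),
                  np.full((4, 4), 2.0),
                  np.full((4, 4), 100.0)])
scores = np.array([2.0, 2.0, -8.0])  # geometry deems the third view unreliable
fused = fuse_views(feats, scores)    # close to the mean of the two good views
```

Down-weighting rather than hard-selecting views keeps the fusion differentiable, which is what makes joint training with a regularization term on injected noise views possible.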
Related papers
- Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
- Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting [44.48514301889318]
This paper focuses on improving multi-view people detection by developing a supervised view-wise contribution weighting approach.
A large synthetic dataset is adopted to enhance the model's generalization ability.
Experimental results demonstrate the effectiveness of our approach in achieving promising cross-scene multi-view people detection performance.
arXiv Detail & Related papers (2024-05-30T11:03:27Z)
- Learning to Select Camera Views: Efficient Multiview Understanding at Few Glances [59.34619548026885]
We propose a view selection approach that analyzes the target object or scenario from given views and selects the next best view for processing.
Our approach features a reinforcement learning based camera selection module, MVSelect, that not only selects views but also facilitates joint training with the task network.
arXiv Detail & Related papers (2023-03-10T18:59:10Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time varying surface details without the need of using pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
- Wide-Baseline Multi-Camera Calibration using Person Re-Identification [27.965850489928457]
We address the problem of estimating the 3D pose of a network of cameras for large-environment wide-baseline scenarios.
Treating people in the scene as "keypoints" and associating them across different camera views can be an alternative method for obtaining correspondences.
Our method first employs a re-ID method to associate human bounding boxes across cameras, then converts bounding box correspondences to point correspondences.
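The bounding-box-to-point conversion mentioned above can be illustrated with a small sketch; reducing each person box to its bottom-center "foot point" is a common choice, assumed here for illustration and not necessarily the paper's exact reduction:

```python
def box_to_point(box):
    """Reduce a person bounding box to a single correspondence point.

    box: (x_min, y_min, x_max, y_max) in image coordinates, y increasing
         downward. The bottom-center approximates the ground-contact point
    """
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, y_max)

# Boxes of the same re-identified person in two camera views yield one
# point correspondence, usable for wide-baseline calibration (e.g. as
# input to fundamental-matrix estimation).
corr = (box_to_point((10, 20, 30, 80)), box_to_point((100, 40, 130, 120)))
```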
arXiv Detail & Related papers (2021-04-17T15:09:18Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
- Wide-Area Crowd Counting: Multi-View Fusion Networks for Counting in Large Scenes [50.744452135300115]
We propose a deep neural network framework for multi-view crowd counting.
Our methods achieve state-of-the-art results compared to other multi-view counting baselines.
arXiv Detail & Related papers (2020-12-02T03:20:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.