Related papers: Semi-Supervised Multi-View Crowd Counting by Ranking Multi-View Fusion Models

URL: http://arxiv.org/abs/2512.16243v1
Date: Thu, 18 Dec 2025 06:49:55 GMT
Title: Semi-Supervised Multi-View Crowd Counting by Ranking Multi-View Fusion Models
Authors: Qi Zhang, Yunfei Gong, Zhidan Xie, Zhizi Wang, Antoni B. Chan, Hui Huang
Abstract summary: We propose two semi-supervised multi-view crowd counting frameworks. We rank the multi-view fusion models with different numbers of input views. Experiments demonstrate the advantages of the proposed multi-view model ranking methods.
Score: 46.12213690696149
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Multi-view crowd counting has been proposed to deal with the severe occlusion that affects crowd counting in large, wide scenes. However, because multi-view images are difficult to collect and annotate, multi-view counting datasets contain a limited number of multi-view frames and scenes. To address this data scarcity, one approach is to collect synthetic data and so bypass the annotation step; another is to develop semi-supervised, weakly supervised, or unsupervised methods that need less multi-view data. In this paper, we propose two semi-supervised multi-view crowd counting frameworks that rank multi-view fusion models with different numbers of input views, in terms of either the model predictions or the model uncertainties. Specifically, in the first method (the vanilla model), we rank the predictions of multi-view fusion models that take different numbers of camera views as input: the model's prediction from fewer camera views should not be larger than its prediction from more camera views. In the second method, we rank the estimated uncertainties of multi-view fusion models with a variable number of input views, guided by the models' prediction errors: the uncertainty with more camera views should not be larger than the uncertainty with fewer camera views. These constraints are incorporated into model training in a semi-supervised fashion for multi-view counting with limited labeled data. Experiments demonstrate the advantages of the proposed multi-view model ranking methods over other semi-supervised counting methods.
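The two ranking constraints described in the abstract can be expressed as hinge penalties on ordered pairs of models. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each fusion model's output has already been reduced to a scalar count (e.g. the sum of a predicted density map) and a scalar uncertainty, that the view sets are nested (each smaller set is contained in the next larger one), and that only adjacent pairs in that ordering are compared. The helper names (`nested_view_subsets`, `hinge`, `prediction_ranking_loss`, `uncertainty_ranking_loss`, `semi_supervised_loss`), the `margin` and `weight` parameters, and the squared-error supervised term are hypothetical. The error-guided training of the uncertainty estimates mentioned in the abstract is not modeled here.

```python
import random
from typing import List, Sequence


def nested_view_subsets(view_ids: Sequence[int], rng: random.Random) -> List[List[int]]:
    """Shuffle the camera views and return nested prefixes of sizes 1..N, so each
    subset is contained in the next one (an assumed sampling scheme)."""
    order = list(view_ids)
    rng.shuffle(order)
    return [order[:k] for k in range(1, len(order) + 1)]


def hinge(smaller: float, larger: float, margin: float = 0.0) -> float:
    """Zero when smaller + margin <= larger; otherwise the size of the violation."""
    return max(0.0, smaller - larger + margin)


def prediction_ranking_loss(counts: Sequence[float], margin: float = 0.0) -> float:
    """Vanilla constraint. counts[k] is the predicted total count from the model fed
    the (k+1)-view subset; a count from fewer views should not exceed one from more."""
    return sum(hinge(counts[k], counts[k + 1], margin) for k in range(len(counts) - 1))


def uncertainty_ranking_loss(uncertainties: Sequence[float], margin: float = 0.0) -> float:
    """Second constraint. uncertainties[k] belongs to the model fed the (k+1)-view
    subset; uncertainty with more views should not exceed uncertainty with fewer."""
    return sum(hinge(uncertainties[k + 1], uncertainties[k], margin)
               for k in range(len(uncertainties) - 1))


def semi_supervised_loss(labeled_pred: Sequence[float], labeled_gt: Sequence[float],
                         unlabeled_counts: Sequence[Sequence[float]],
                         weight: float = 0.1) -> float:
    """Hypothetical combination: mean squared count error on labeled frames plus a
    weighted prediction-ranking penalty on unlabeled frames, which need no labels."""
    supervised = sum((p - g) ** 2 for p, g in zip(labeled_pred, labeled_gt)) / max(len(labeled_gt), 1)
    ranking = sum(prediction_ranking_loss(c) for c in unlabeled_counts) / max(len(unlabeled_counts), 1)
    return supervised + weight * ranking


if __name__ == "__main__":
    print(nested_view_subsets([0, 1, 2], random.Random(0)))
    # The 3-view model predicts fewer people than the 2-view model: penalty 3.0.
    print(prediction_ranking_loss([40.0, 55.0, 52.0]))  # 3.0
    # The 3-view model is more uncertain than the 2-view model: penalty about 0.3.
    print(round(uncertainty_ranking_loss([2.0, 1.5, 1.8]), 3))  # 0.3
```

Because only the ranking term touches the unlabeled frames, those frames still contribute a training signal without ground-truth counts. The abstract does not state whether adjacent pairs, all pairs, or a margin are used, so those choices here are illustrative.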

Related papers

MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention [83.56588173102594]
We introduce a solution called mesh attention to enable training at 1024x1024 resolution. This approach significantly reduces the complexity of multiview attention while maintaining cross-view consistency. Building on this foundation, we devise a mesh attention block and combine it with keypoint conditioning to create our human-specific multiview diffusion model, MEAT.
arXiv (2025-03-11T17:50:59Z)
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos [66.1935609072708]
LangView is a framework that uses the relative accuracy of view-dependent caption predictions as a proxy for best-view pseudo-labels. During inference, our model takes as input only a multi-view video, with no language or camera poses, and returns the best viewpoint to watch at each timestep.
arXiv (2024-11-13T16:31:08Z)
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation [71.24909962718128]
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various and mixed modalities.
arXiv (2024-08-22T16:32:32Z)
Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting [44.48514301889318]
This paper focuses on improving multi-view people detection by developing a supervised view-wise contribution weighting approach. A large synthetic dataset is adopted to enhance the model's generalization ability. Experimental results demonstrate the effectiveness of our approach in achieving promising cross-scene multi-view people detection performance.
arXiv (2024-05-30T11:03:27Z)
Multi-View Conformal Learning for Heterogeneous Sensor Fusion [0.12086712057375555]
We build and test multi-view and single-view conformal models for heterogeneous sensor fusion. Our models provide theoretical marginal confidence guarantees since they are based on the conformal prediction framework. Our results also showed that multi-view models generate prediction sets with less uncertainty compared to single-view models.
arXiv (2024-02-19T17:30:09Z)
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs [48.269363759989915]
The research focuses on two aspects: first, image-to-image matching, and second, multi-image-to-text matching. We conduct evaluations on a range of both open-source and closed-source large models, including GPT-4V, Gemini, OpenFlamingo, and MMICL.
arXiv (2024-01-05T00:26:07Z)
Cross-View Cross-Scene Multi-View Crowd Counting [56.83882084112913]
Multi-view crowd counting has been proposed to use multiple cameras to extend the field of view of a single camera. We propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, in which training and testing take place on different scenes with arbitrary camera layouts.
arXiv (2022-05-03T15:03:44Z)
TSK Fuzzy System Towards Few Labeled Incomplete Multi-View Data Classification [24.01191516774655]
A transductive semi-supervised incomplete multi-view TSK fuzzy system modeling method (SSIMV_TSK) is proposed to address the challenges of scarce labels and incomplete views. The proposed method integrates missing-view imputation, pseudo-label learning of unlabeled data, and fuzzy system modeling into a single process to yield a model with interpretable fuzzy rules. Experimental results on real datasets show that the proposed method significantly outperforms the state-of-the-art methods.
arXiv (2021-10-08T11:41:06Z)
Wide-Area Crowd Counting: Multi-View Fusion Networks for Counting in Large Scenes [50.744452135300115]
We propose a deep neural network framework for multi-view crowd counting. Our methods achieve state-of-the-art results compared to other multi-view counting baselines.
arXiv (2020-12-02T03:20:30Z)
Generalized Multi-view Shared Subspace Learning using View Bootstrapping [43.027427742165095]
A key objective in multi-view learning is to model the information common to multiple parallel views of a class of objects or events to improve downstream learning tasks. We present a neural method based on multi-view correlation that captures the information shared across a large number of views by subsampling them in a view-agnostic manner during training. Experiments on spoken word recognition, 3D object classification, and pose-invariant face recognition demonstrate the robustness of view bootstrapping in modeling a large number of views.
arXiv (2020-05-12T20:35:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.