Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting
- URL: http://arxiv.org/abs/2505.21943v1
- Date: Wed, 28 May 2025 03:53:08 GMT
- Title: Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting
- Authors: Wei Lin, Chenyang Zhao, Antoni B. Chan
- Abstract summary: Point detection has been developed to locate pedestrians in crowded scenes by training a counter through a point-to-point (P2P) supervision scheme. We integrate point-based methods into a semi-supervised counting framework based on pseudo-labeling. During implementation, the confidence for pseudo-labels fails to be propagated to background pixels via the P2P loss. We propose a point-to-region (P2R) scheme to substitute P2P, which segments out a local region, rather than detecting a single point, for each pedestrian during supervision.
- Score: 49.165960263166966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point detection has been developed to locate pedestrians in crowded scenes by training a counter through a point-to-point (P2P) supervision scheme. Despite its excellent localization and counting performance, training a point-based counter still faces challenges concerning annotation labor: hundreds to thousands of points are required to annotate a single sample capturing a dense crowd. In this paper, we integrate point-based methods into a semi-supervised counting framework based on pseudo-labeling, enabling the training of a counter with only a few annotated samples supplemented by a large volume of pseudo-labeled data. However, during implementation, the training encounters issues as the confidence for pseudo-labels fails to be propagated to background pixels via the P2P loss. To tackle this challenge, we devise a point-specific activation map (PSAM) to visually interpret the phenomena occurring during the ill-posed training. Observations from the PSAM suggest that the feature map is excessively activated by the loss for unlabeled data, causing the decoder to misinterpret these over-activations as pedestrians. To mitigate this issue, we propose a point-to-region (P2R) scheme to substitute P2P, which segments out a local region, rather than detecting a single point, for each pedestrian during supervision. Consequently, pixels in the local region can share the same confidence with the corresponding pseudo points. Experimental results in both semi-supervised counting and unsupervised domain adaptation highlight the advantages of our method, illustrating that P2R resolves the issues identified via the PSAM. The code is available at https://github.com/Elin24/P2RLoss.
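The key mechanism described above is that each pseudo point supervises a local region rather than a single matched pixel, so a pseudo label's confidence can be shared by every pixel in that region. Below is a minimal PyTorch sketch of this region-sharing idea under simplifying assumptions: the function name `p2r_loss`, the fixed `radius` parameter, and the confidence-weighted binary cross-entropy are illustrative choices, not the authors' released implementation (see the linked repository for that).

```python
# Sketch only: each pseudo point supervises the pixels within `radius` of it,
# and those pixels inherit the point's pseudo-label confidence as a loss weight,
# instead of matching a single pixel per point (P2P).
import torch
import torch.nn.functional as F

def p2r_loss(pred_logits, points, point_conf, radius=4.0):
    """pred_logits: (H, W) per-pixel foreground logits from the decoder.
    points: (N, 2) pseudo-point coordinates as (y, x), same device as pred_logits.
    point_conf: (N,) confidence of each pseudo label in [0, 1].
    radius: pixels within this distance of a point form its region (assumed fixed here).
    """
    h, w = pred_logits.shape
    ys = torch.arange(h, dtype=torch.float32, device=pred_logits.device)
    xs = torch.arange(w, dtype=torch.float32, device=pred_logits.device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")

    target = torch.zeros(h, w, device=pred_logits.device)
    weight = torch.ones(h, w, device=pred_logits.device)  # background kept at weight 1 for simplicity

    if points.numel() > 0:
        # Distance of every pixel to every pseudo point: (N, H, W).
        dist = torch.sqrt(
            (grid_y[None] - points[:, 0, None, None]) ** 2
            + (grid_x[None] - points[:, 1, None, None]) ** 2
        )
        near, idx = dist.min(dim=0)      # nearest pseudo point per pixel
        in_region = near <= radius       # local region around each point
        target[in_region] = 1.0
        # Every pixel in a region inherits the confidence of its pseudo point.
        weight[in_region] = point_conf[idx[in_region]]

    return F.binary_cross_entropy_with_logits(
        pred_logits, target, weight=weight, reduction="mean"
    )
```

In a semi-supervised loop, `points` and `point_conf` would typically come from a teacher model's detections on unlabeled images, while labeled images would use ground-truth points with confidence 1.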
Related papers
- Spatial regularisation for improved accuracy and interpretability in keypoint-based registration [5.286949071316761]
Recent approaches based on unsupervised keypoint detection stand out as very promising for interpretability. Here, we propose a three-fold loss to regularise the spatial distribution of the features. Our loss considerably improves the interpretability of the features, which now correspond to precise and anatomically meaningful landmarks.
arXiv Detail & Related papers (2025-03-06T14:48:25Z) - Quantity-Aware Coarse-to-Fine Correspondence for Image-to-Point Cloud Registration [4.954184310509112]
Image-to-point cloud registration aims to determine the relative camera pose between an RGB image and a reference point cloud.
Matching individual points with pixels can be inherently ambiguous due to modality gaps.
We propose a framework to capture quantity-aware correspondences between local point sets and pixel patches.
arXiv Detail & Related papers (2023-07-14T03:55:54Z) - Focus for Free in Density-Based Counting [56.961229110268036]
We introduce two methods that repurpose the available point annotations to enhance counting performance.
The first is a counting-specific augmentation that leverages point annotations to simulate occluded objects in both input and density images.
The second method, foreground distillation, generates foreground masks from the point annotations, from which we train an auxiliary network on images with blacked-out backgrounds.
arXiv Detail & Related papers (2023-06-08T11:54:37Z) - Point-Teaching: Weakly Semi-Supervised Object Detection with Point Annotations [81.02347863372364]
We present Point-Teaching, a weakly semi-supervised object detection framework.
Specifically, we propose a Hungarian-based point matching method to generate pseudo labels for point annotated images.
We propose a simple-yet-effective data augmentation, termed point-guided copy-paste, to reduce the impact of the unmatched points.
arXiv Detail & Related papers (2022-06-01T07:04:38Z) - Weakly-Supervised Salient Object Detection Using Point Supervision [17.88596733603456]
Current state-of-the-art saliency detection models rely heavily on large datasets of accurate pixel-wise annotations.
We propose a novel weakly-supervised salient object detection method using point supervision.
Our method outperforms previous state-of-the-art methods trained with stronger supervision.
arXiv Detail & Related papers (2022-03-22T12:16:05Z) - Object Localization under Single Coarse Point Supervision [107.46800858130658]
We propose a POL method using coarse point annotations, relaxing the supervision signals from accurate key points to freely spotted points.
The proposed coarse point refinement (CPR) constructs point bags, selects semantically correlated points, and produces semantic center points through multiple instance learning (MIL).
In this way, CPR defines a weakly supervised evolution procedure, which ensures training high-performance object localizer under coarse point supervision.
arXiv Detail & Related papers (2022-03-17T14:14:11Z) - Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework [59.578339075658995]
We propose a purely point-based framework for joint crowd counting and individual localization.
We design an intuitive solution under this framework, which is called Point to Point Network (P2PNet).
arXiv Detail & Related papers (2021-07-27T11:41:50Z) - BRULÈ: Barycenter-Regularized Unsupervised Landmark Extraction [2.2758845733923687]
Unsupervised retrieval of image features is vital for many computer vision tasks where the annotation is missing or scarce.
We propose a new unsupervised approach to detect the landmarks in images, validating it on the popular task of human face key-points extraction.
The method is based on the idea of auto-encoding the wanted landmarks in the latent space while discarding the non-essential information.
arXiv Detail & Related papers (2020-06-20T20:04:00Z) - Pose-guided Visible Part Matching for Occluded Person ReID [80.81748252960843]
We propose a Pose-guided Visible Part Matching (PVPM) method that jointly learns the discriminative features with pose-guided attention and self-mines the part visibility.
Experimental results on three occluded person re-ID benchmarks show that the proposed method achieves competitive performance with state-of-the-art methods.
arXiv Detail & Related papers (2020-04-01T04:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.