A Self-Training Approach for Point-Supervised Object Detection and
Counting in Crowds
- URL: http://arxiv.org/abs/2007.12831v3
- Date: Thu, 18 Feb 2021 07:00:06 GMT
- Title: A Self-Training Approach for Point-Supervised Object Detection and
Counting in Crowds
- Authors: Yi Wang, Junhui Hou, Xinyu Hou, and Lap-Pui Chau
- Abstract summary: We propose a novel self-training approach that enables a typical object detector trained only with point-level annotations to estimate both the center points and sizes of crowded objects.
During training, we utilize the available point annotations to supervise the estimation of the center points of objects.
Experimental results show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks.
- Score: 54.73161039445703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel self-training approach named Crowd-SDNet
that enables a typical object detector trained only with point-level
annotations (i.e., objects are labeled with points) to estimate both the center
points and sizes of crowded objects. Specifically, during training, we utilize
the available point annotations to supervise the estimation of the center
points of objects directly. Based on a locally-uniform distribution assumption,
we initialize pseudo object sizes from the point-level supervisory information,
which are then leveraged to guide the regression of object sizes via a
crowdedness-aware loss. Meanwhile, we propose a confidence and order-aware
refinement scheme to continuously refine the initial pseudo object sizes such
that the ability of the detector is increasingly boosted to detect and count
objects in crowds simultaneously. Moreover, to address extremely crowded
scenes, we propose an effective decoding method to improve the detector's
representation ability. Experimental results on the WiderFace benchmark show
that our approach significantly outperforms state-of-the-art point-supervised
methods under both detection and counting tasks, i.e., our method improves the
average precision by more than 10% and reduces the counting error by 31.2%.
Besides, our method obtains the best results on the crowd counting and
localization datasets (i.e., ShanghaiTech and NWPU-Crowd) and vehicle counting
datasets (i.e., CARPK and PUCPR+) compared with state-of-the-art
counting-by-detection methods. The code will be publicly available at
https://github.com/WangyiNTU/Point-supervised-crowd-detection.
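The pseudo-size initialization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so this example assumes one common reading of a locally-uniform distribution: neighboring objects have similar sizes, so the distance from a point to its nearest annotated neighbor serves as an initial estimate of the object's side length. The names `init_pseudo_sizes` and `points_to_boxes`, the `scale` factor, and the `default_size` fallback are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def init_pseudo_sizes(points, scale=1.0, default_size=32.0):
    """Estimate one pseudo side length per annotated point.

    Assumption (not from the paper): size = scale * nearest-neighbor distance.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        # No neighbor to measure against, so fall back to a fixed guess.
        return np.full(n, default_size)
    # Pairwise Euclidean distances via broadcasting, shape (n, n).
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Ignore each point's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    return scale * dist.min(axis=1)

def points_to_boxes(points, sizes):
    """Build square pseudo boxes (x1, y1, x2, y2) centered on each point."""
    pts = np.asarray(points, dtype=float)
    half = np.asarray(sizes, dtype=float)[:, None] / 2.0
    return np.hstack([pts - half, pts + half])

if __name__ == "__main__":
    heads = [(0, 0), (3, 4), (10, 0)]
    sizes = init_pseudo_sizes(heads)
    print(sizes)
    print(points_to_boxes(heads, sizes))
```

Under this assumption, points in dense regions receive small pseudo boxes and isolated points receive large ones. Such boxes are only a starting point; the abstract describes refining them during training.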
Related papers
- Dense Center-Direction Regression for Object Counting and Localization with Point Supervision [1.9526430269580954]
We propose a novel approach termed CeDiRNet for point-supervised learning.
It uses a dense regression of directions pointing towards the nearest object centers.
We show that it outperforms the existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-26T17:49:27Z)
- SeMoLi: What Moves Together Belongs Together [51.72754014130369]
We tackle semi-supervised object detection based on motion cues.
Recent results suggest that motion-based clustering methods can be used to pseudo-label instances of moving objects.
We rethink this approach and suggest that both object detection and motion-inspired pseudo-labeling can be tackled in a data-driven manner.
arXiv Detail & Related papers (2024-02-29T18:54:53Z)
- Improving Online Lane Graph Extraction by Object-Lane Clustering [106.71926896061686]
We propose an architecture and loss formulation to improve the accuracy of local lane graph estimates.
The proposed method learns to assign the objects to centerlines by considering the centerlines as cluster centers.
We show that our method can achieve significant performance improvements by using the outputs of existing 3D object detection methods.
arXiv Detail & Related papers (2023-07-20T15:21:28Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA proves effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- Few-Shot Keypoint Detection as Task Adaptation via Latent Embeddings [17.04471874483516]
Existing approaches either compute dense keypoint embeddings in a single forward pass, or allocate their full capacity to a sparse set of points.
In this paper, we explore a middle ground based on the observation that the number of relevant points at a given time is typically small.
Our main contribution is a novel architecture, inspired by few-shot task adaptation, which allows a sparse-style network to condition on a keypoint embedding.
arXiv Detail & Related papers (2021-12-09T13:25:42Z)
- Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)
- Learning Object Scale With Click Supervision for Object Detection [29.421113887739413]
We propose a simple yet effective method which incorporates CNN visualization with click supervision to generate the pseudo ground-truths.
These pseudo ground-truths can be used to train a fully-supervised detector.
Experimental results on the PASCAL VOC 2007 and VOC 2012 datasets show that the proposed method can obtain much higher accuracy for estimating the object scale.
arXiv Detail & Related papers (2020-02-20T03:59:46Z)