Weakly Supervised Video Individual Counting
- URL: http://arxiv.org/abs/2312.05923v1
- Date: Sun, 10 Dec 2023 16:12:13 GMT
- Title: Weakly Supervised Video Individual Counting
- Authors: Xinyan Liu and Guorong Li and Yuankai Qi and Ziheng Yan and Zhenjun
Han and Anton van den Hengel and Ming-Hsuan Yang and Qingming Huang
- Abstract summary: Video Individual Counting aims to predict the number of unique individuals in a single video.
We introduce a weakly supervised VIC task, wherein trajectory labels are not provided.
In doing so, we devise an end-to-end trainable soft contrastive loss to drive the network to distinguish inflow, outflow, and the remaining individuals.
- Score: 126.75545291243142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video Individual Counting (VIC) aims to predict the number of unique
individuals in a single video. Existing methods learn representations based on
per-individual trajectory labels, which are expensive to annotate. To provide a
more realistic reflection of the underlying practical challenge, we introduce a
weakly supervised VIC task, wherein trajectory labels are not provided.
Instead, two types of labels are provided to indicate traffic entering the
field of view (inflow) and leaving the field of view (outflow). We also propose
the first solution as a baseline, which formulates the task as a weakly
supervised contrastive learning problem under group-level matching. In doing
so, we devise an end-to-end trainable soft contrastive loss that drives the
network to distinguish inflow, outflow, and the remaining individuals. To
facilitate future study in this direction, we generate annotations from the
existing VIC datasets SenseCrowd and CroHD, and also build a new dataset,
UAVVIC. Extensive results show that our baseline weakly supervised method
outperforms supervised methods, and thus little information is lost in the
transition to the more practically relevant weakly supervised task. The code
and trained model will be public at https://github.com/streamer-AP/CGNet
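The abstract does not give the loss in closed form, but the idea of a soft contrastive loss under group-level matching can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function name `soft_contrastive_loss`, the temperature `tau`, and the use of assignment entropy as the training signal are hypothetical stand-ins, not the authors' implementation (see the CGNet repository for that).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_contrastive_loss(feat_prev, feat_curr, tau=0.1):
    """Illustrative group-level soft contrastive loss.

    feat_prev: (M, D) features of individuals detected in frame t
    feat_curr: (N, D) features of individuals detected in frame t+1

    Without trajectory labels there are no hard identity matches,
    so each frame-(t+1) individual is softly assigned over frame-t
    candidates. Confident assignments correspond to "remaining"
    individuals; individuals with no good match are inflow (or,
    from the other direction, outflow).
    """
    # Cosine similarity between every cross-frame pair
    a = feat_prev / np.linalg.norm(feat_prev, axis=1, keepdims=True)
    b = feat_curr / np.linalg.norm(feat_curr, axis=1, keepdims=True)
    sim = a @ b.T  # (M, N)

    # Soft assignment of each current individual over previous ones
    p = softmax(sim / tau, axis=0)

    # Penalize ambiguous assignments: low entropy means the network
    # cleanly separates remaining individuals from inflow/outflow
    entropy = -(p * np.log(p + 1e-8)).sum(axis=0)
    return entropy.mean()
```

As a sanity check, features that align across frames (same individuals re-observed) yield a lower loss than unrelated random features, which is the qualitative behavior a loss of this kind should have.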
Related papers
- Label-Agnostic Forgetting: A Supervision-Free Unlearning in Deep Models [7.742594744641462]
Machine unlearning aims to remove information derived from forgotten data while preserving that of the remaining dataset in a well-trained model.
We propose a supervision-free unlearning approach that operates without the need for labels during the unlearning process.
arXiv Detail & Related papers (2024-03-31T00:29:00Z) - Weakly Supervised Video Anomaly Detection Based on Cross-Batch Clustering Guidance [39.43891080713327]
Weakly supervised video anomaly detection (WSVAD) is a challenging task since only video-level labels are available for training.
We propose a novel WSVAD method based on cross-batch clustering guidance.
arXiv Detail & Related papers (2022-12-16T14:38:30Z) - Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS)
Unlike previous work, our formulation allows to learn dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z) - Video Annotation for Visual Tracking via Selection and Refinement [74.08109740917122]
We present a new framework to facilitate bounding box annotations for video sequences.
A temporal assessment network is proposed which is able to capture the temporal coherence of target locations.
A visual-geometry refinement network is also designed to further enhance the selected tracking results.
arXiv Detail & Related papers (2021-08-09T05:56:47Z) - Learning to Track Instances without Video Annotations [85.9865889886669]
We introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences.
We show that even when only trained with images, the learned feature representation is robust to instance appearance variations.
In addition, we integrate this module into single-stage instance segmentation and pose estimation frameworks.
arXiv Detail & Related papers (2021-04-01T06:47:41Z) - Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking)
We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iii) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube.
arXiv Detail & Related papers (2020-06-22T17:55:59Z) - Evolving Losses for Unsupervised Video Representation Learning [91.2683362199263]
We present a new method to learn video representations from large-scale unlabeled video data.
The proposed unsupervised representation learning results in a single RGB network and outperforms previous methods.
arXiv Detail & Related papers (2020-02-26T16:56:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.