Counting People by Estimating People Flows
- URL: http://arxiv.org/abs/2012.00452v1
- Date: Tue, 1 Dec 2020 12:59:24 GMT
- Title: Counting People by Estimating People Flows
- Authors: Weizhe Liu, Mathieu Salzmann, Pascal Fua
- Abstract summary: We advocate estimating people flows across image locations between consecutive images and inferring people densities from these flows, instead of directly regressing the densities.
This significantly boosts performance without requiring a more complex architecture.
We also show that leveraging people conservation constraints in both a spatial and temporal manner makes it possible to train a deep crowd counting model in an active learning setting with much fewer annotations.
- Score: 135.85747920798897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern methods for counting people in crowded scenes rely on deep networks to
estimate people densities in individual images. As such, only very few take
advantage of temporal consistency in video sequences, and those that do only
impose weak smoothness constraints across consecutive frames. In this paper, we
advocate estimating people flows across image locations between consecutive
images and inferring the people densities from these flows instead of directly
regressing them. This enables us to impose much stronger constraints encoding
the conservation of the number of people. As a result, it significantly boosts
performance without requiring a more complex architecture. Furthermore, it
allows us to exploit the correlation between people flow and optical flow to
further improve the results. We also show that leveraging people conservation
constraints in both a spatial and temporal manner makes it possible to train a
deep crowd counting model in an active learning setting with much fewer
annotations. This significantly reduces the annotation cost while still leading
to similar performance to the full supervision case.
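The people-conservation idea in the abstract can be sketched numerically. Below is a toy NumPy illustration of our own devising (not the authors' architecture): if a model predicts per-direction flows between grid cells, the densities at times t and t+1 both follow from the same flow tensor, so the total count is conserved by construction up to people crossing the image border.

```python
import numpy as np

# Toy grid of image locations; flow[y, x, k] = number of people in
# cell (y, x) moving in direction k between frames t and t+1.
# Directions (an illustrative choice): 0=stay, 1=up, 2=down, 3=left, 4=right.
rng = np.random.default_rng(0)
H, W = 4, 4
flow = rng.random((H, W, 5))

# Density at time t is by definition the sum of all outgoing flows.
density_t = flow.sum(axis=-1)

# Density at time t+1 is the sum of all incoming flows.
density_t1 = np.zeros((H, W))
density_t1 += flow[:, :, 0]                # people who stay put
density_t1[:-1, :] += flow[1:, :, 1]       # arrivals from below (moving up)
density_t1[1:, :] += flow[:-1, :, 2]       # arrivals from above (moving down)
density_t1[:, :-1] += flow[:, 1:, 3]       # arrivals from the right (moving left)
density_t1[:, 1:] += flow[:, :-1, 4]       # arrivals from the left (moving right)

# People who exit through the image border are not counted at t+1.
leaked = (flow[0, :, 1].sum() + flow[-1, :, 2].sum()
          + flow[:, 0, 3].sum() + flow[:, -1, 4].sum())

# Conservation holds exactly: count(t) == count(t+1) + people who left.
assert np.isclose(density_t.sum(), density_t1.sum() + leaked)
```

Because both densities are derived from one flow tensor, the conservation constraint is hard-wired rather than imposed as a soft smoothness penalty, which is the key contrast the abstract draws with frame-by-frame density regression.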
Related papers
- Time Does Tell: Self-Supervised Time-Tuning of Dense Image
Representations [79.87044240860466]
We propose a novel approach that incorporates temporal consistency in dense self-supervised learning.
Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos.
Time-tuning improves the state-of-the-art by 8-10% for unsupervised semantic segmentation on videos and matches it for images.
arXiv Detail & Related papers (2023-08-22T21:28:58Z) - DistractFlow: Improving Optical Flow Estimation via Realistic
Distractions and Pseudo-Labeling [49.46842536813477]
We propose a novel data augmentation approach, DistractFlow, for training optical flow estimation models.
We combine one of the frames in the pair with a distractor image depicting a similar domain, which allows for inducing visual perturbations congruent with natural objects and scenes.
Our approach allows increasing the number of available training pairs significantly without requiring additional annotations.
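The augmentation described above can be sketched as a simple convex blend. This is a hypothetical minimal version (function and parameter names are ours, not DistractFlow's API): the second frame of a training pair is mixed with a distractor image from a similar domain, while the optical-flow label of the clean pair is reused unchanged, so no extra annotation is needed.

```python
import numpy as np

def distract(frame2: np.ndarray, distractor: np.ndarray,
             alpha: float = 0.3) -> np.ndarray:
    """Blend a frame with a distractor image (convex combination)."""
    assert frame2.shape == distractor.shape
    return (1.0 - alpha) * frame2 + alpha * distractor

rng = np.random.default_rng(0)
frame1 = rng.random((64, 64, 3))       # stand-ins for real video frames
frame2 = rng.random((64, 64, 3))
distractor = rng.random((64, 64, 3))   # image from a similar domain

# The augmented pair (frame1, frame2_aug) keeps the flow label of the
# clean pair (frame1, frame2), effectively enlarging the training set.
frame2_aug = distract(frame2, distractor, alpha=0.3)
```

Blending with a semantically similar image, rather than random noise, is what makes the perturbation "congruent with natural objects and scenes" as the abstract puts it.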
arXiv Detail & Related papers (2023-03-24T15:42:54Z) - A Spatio-Temporal Attentive Network for Video-Based Crowd Counting [5.556665316806146]
Current computer vision techniques rely on deep learning-based algorithms that estimate pedestrian densities in still, individual images.
By taking advantage of the temporal correlation between consecutive frames, we lowered the state-of-the-art counting error by 5% and the localization error by 7.5% on the widely used FDST benchmark.
arXiv Detail & Related papers (2022-08-24T07:40:34Z) - Contrastive Language-Action Pre-training for Temporal Localization [64.34349213254312]
Long-form video understanding requires approaches that can temporally localize activities or language.
Pre-training on large datasets of temporally trimmed videos supervised by class annotations can help address the limitations of existing approaches.
We introduce a masked contrastive learning loss to capture visio-linguistic relations between activities, background video clips and language in the form of captions.
arXiv Detail & Related papers (2022-04-26T13:17:50Z) - CrowdFormer: Weakly-supervised Crowd counting with Improved
Generalizability [2.8174125805742416]
We propose a weakly-supervised method for crowd counting using a pyramid vision transformer.
Our method is comparable to the state-of-the-art on the benchmark crowd datasets.
arXiv Detail & Related papers (2022-03-07T23:10:40Z) - Leveraging Self-Supervision for Cross-Domain Crowd Counting [71.75102529797549]
State-of-the-art methods for counting people in crowded scenes rely on deep networks to estimate crowd density.
We train our network to distinguish upside-down real images from regular ones and incorporate into it the ability to predict its own uncertainty.
This yields an algorithm that consistently outperforms state-of-the-art cross-domain crowd counting ones without any extra computation at inference time.
arXiv Detail & Related papers (2021-03-30T12:37:55Z) - Completely Self-Supervised Crowd Counting via Distribution Matching [92.09218454377395]
We propose a complete self-supervision approach to training models for dense crowd counting.
The only input required to train, apart from a large set of unlabeled crowd images, is the approximate upper limit of the crowd count.
Our method dwells on the idea that natural crowds follow a power law distribution, which could be leveraged to yield error signals for backpropagation.
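The power-law idea can be made concrete with a toy distribution-matching loss. This is our own illustrative formulation, not the paper's exact objective: given only the approximate upper limit of the crowd count, sorted predicted counts are compared against the counts a power law would imply, yielding an error signal without any labels.

```python
import numpy as np

def power_law_prior(n: int, upper_limit: float,
                    exponent: float = 1.0) -> np.ndarray:
    """Expected sorted counts under a power law, largest first,
    with the top count capped at the given upper limit."""
    ranks = np.arange(1, n + 1, dtype=float)
    return upper_limit / ranks ** exponent

def matching_loss(pred_counts: np.ndarray, upper_limit: float) -> float:
    """Mean L1 distance between sorted predictions and the prior."""
    pred_sorted = np.sort(pred_counts)[::-1]       # descending order
    prior = power_law_prior(len(pred_counts), upper_limit)
    return float(np.abs(pred_sorted - prior).mean())

# Predictions that follow the prior give zero loss; perturbed ones don't,
# so the mismatch can be backpropagated as a self-supervised error signal.
prior = power_law_prior(10, upper_limit=1000.0)
assert matching_loss(prior, 1000.0) == 0.0
assert matching_loss(prior + 5.0, 1000.0) > 0.0
```

Sorting before comparing means the loss constrains only the distribution of counts, not which image receives which count, which is what allows training from unlabeled images plus a single scalar upper bound.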
arXiv Detail & Related papers (2020-09-14T13:20:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.