Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
- URL: http://arxiv.org/abs/2503.13739v1
- Date: Mon, 17 Mar 2025 21:48:56 GMT
- Title: Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
- Authors: Keqi Chen, Vinkle Srivastav, Didier Mutter, Nicolas Padoy
- Abstract summary: We propose a self-supervised uncalibrated multi-view person association approach, Self-MVA, without using any annotations. Specifically, we propose a self-supervised learning framework, consisting of an encoder-decoder model and a self-supervised pretext task. Our approach achieves state-of-the-art results, surpassing existing unsupervised and fully-supervised approaches.
- Score: 3.2416801263793285
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-view person association is a fundamental step towards multi-view analysis of human activities. Although person re-identification features have proven effective, they become unreliable in challenging scenes where persons share similar appearances. Therefore, cross-view geometric constraints are required for a more robust association. However, most existing approaches are either fully-supervised, using ground-truth identity labels, or require calibrated camera parameters that are hard to obtain. In this work, we investigate the potential of learning from synchronization and propose a self-supervised uncalibrated multi-view person association approach, Self-MVA, without using any annotations. Specifically, we propose a self-supervised learning framework consisting of an encoder-decoder model and a self-supervised pretext task, cross-view image synchronization, which aims to distinguish whether two images from different views are captured at the same time. The model encodes each person's unified geometric and appearance features, and we train it by utilizing synchronization labels for supervision after applying Hungarian matching to bridge the gap between instance-wise and image-wise distances. To further reduce the solution space, we propose two types of self-supervised linear constraints: multi-view re-projection and pairwise edge association. Extensive experiments on three challenging public benchmark datasets (WILDTRACK, MVOR, and SOLDIERS) show that our approach achieves state-of-the-art results, surpassing existing unsupervised and fully-supervised approaches. Code is available at https://github.com/CAMMA-public/Self-MVA.
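The Hungarian-matching step described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, illustrative example (not the authors' released code) of how an instance-wise distance matrix between per-person embeddings from two views could be reduced to a single image-wise distance via Hungarian matching; the function name, feature dimensions, and dummy inputs are assumptions for illustration only.

```python
# Illustrative sketch (not the Self-MVA implementation): given per-person embeddings
# from two views, Hungarian matching converts the instance-wise distance matrix into
# a single image-wise distance that a synchronization label could supervise.
import numpy as np
from scipy.optimize import linear_sum_assignment


def image_wise_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """feats_a: (N, D) embeddings from view A; feats_b: (M, D) embeddings from view B."""
    # Pairwise (instance-wise) Euclidean distances between all person embeddings.
    cost = np.linalg.norm(feats_a[:, None, :] - feats_b[None, :, :], axis=-1)  # (N, M)
    # Hungarian matching finds the minimum-cost one-to-one assignment.
    rows, cols = linear_sum_assignment(cost)
    # The mean matched cost serves as the image-wise distance between the two views.
    return float(cost[rows, cols].mean())


# Synchronized image pairs (captured at the same time) should yield a smaller
# image-wise distance than unsynchronized ones, which provides the pretext signal.
feats_view_a = np.random.rand(5, 128)  # 5 detected persons, 128-d features (dummy data)
feats_view_b = np.random.rand(4, 128)  # 4 detected persons in the other view
print(image_wise_distance(feats_view_a, feats_view_b))
```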
Related papers
- Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking [22.243799150495487]
Multi-view multi-human association and tracking (MvMHAT) is a new but important problem for multi-person scene video surveillance.
We tackle this problem with a self-supervised-learning-aware end-to-end network.
We build two new large-scale benchmarks for training the network and testing different algorithms.
arXiv Detail & Related papers (2024-01-31T06:12:28Z)
- DealMVC: Dual Contrastive Calibration for Multi-view Clustering [78.54355167448614]
We propose a novel Dual contrastive calibration network for Multi-View Clustering (DealMVC).
We first design a fusion mechanism to obtain a global cross-view feature. Then, a global contrastive calibration loss is proposed by aligning the view feature similarity graph and the high-confidence pseudo-label graph.
During the training procedure, the interacted cross-view feature is jointly optimized at both local and global levels.
arXiv Detail & Related papers (2023-08-17T14:14:28Z)
- HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning [19.432034725468217]
HaMuCo is a self-supervised learning framework that learns a single-view hand pose estimator from multi-view pseudo 2D labels.
We introduce a cross-view interaction network that distills the single-view estimator by utilizing the cross-view correlated features.
Our method achieves state-of-the-art performance on multi-view self-supervised hand pose estimation.
arXiv Detail & Related papers (2023-02-02T10:13:04Z)
- Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) approach for face parsing.
Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.
Our method achieves new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z)
- Unsupervised Person Re-identification via Simultaneous Clustering and Consistency Learning [22.008371113710137]
We design a pretext task for unsupervised re-ID by learning visual consistency from still images and temporal consistency during the training process.
We optimize the model by grouping the two encoded views into the same cluster, thus enhancing the visual consistency between views.
arXiv Detail & Related papers (2021-04-01T02:10:42Z)
- Unsupervised Person Re-Identification with Multi-Label Learning Guided Self-Paced Clustering [48.31017226618255]
Unsupervised person re-identification (Re-ID) has drawn increasing research attention recently.
In this paper, we address unsupervised person Re-ID with a conceptually novel yet simple framework, termed Multi-label Learning guided self-paced Clustering (MLC).
MLC mainly learns discriminative features with three crucial modules, namely a multi-scale network, a multi-label learning module, and a self-paced clustering module.
arXiv Detail & Related papers (2021-03-08T07:30:13Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
- CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions [61.724894233252414]
This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem.
Existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering.
We introduce a different unsupervised method that allows us to learn pedestrian embeddings from raw videos, without resorting to pseudo labels.
arXiv Detail & Related papers (2020-07-15T09:52:35Z)
- Self-Supervised Viewpoint Learning From Image Collections [116.56304441362994]
We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint-aware manner.
We show that our approach performs competitively with fully-supervised approaches for several object categories such as human faces, cars, buses, and trains.
arXiv Detail & Related papers (2020-04-03T22:01:41Z)
- Intra-Camera Supervised Person Re-Identification [87.88852321309433]
We propose a novel person re-identification paradigm based on the idea of independent per-camera identity annotation.
This eliminates the most time-consuming and tedious inter-camera identity labelling process.
We formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method for Intra-Camera Supervised (ICS) person re-id.
arXiv Detail & Related papers (2020-02-12T15:26:33Z)