Related papers: Self-Supervised Partial Cycle-Consistency for Multi-View Matching

URL: http://arxiv.org/abs/2501.06000v1
Date: Fri, 10 Jan 2025 14:32:20 GMT
Title: Self-Supervised Partial Cycle-Consistency for Multi-View Matching
Authors: Fedor Taggenbrock, Gertjan Burghouts, Ronald Poppe
Abstract summary: We train a view-invariant feature extraction network with cycle-consistency to handle partial overlap. We present several new cycle variants that complement each other and present a time-divergent scene sampling scheme. Compared to the self-supervised state-of-the-art, we achieve a 4.3 percentage point higher F1 score with our combined contributions.
Score: 5.984724082624813
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Matching objects across partially overlapping camera views is crucial in multi-camera systems and requires a view-invariant feature extraction network. Training such a network with cycle-consistency circumvents the need for labor-intensive labeling. In this paper, we extend the mathematical formulation of cycle-consistency to handle partial overlap. We then introduce a pseudo-mask which directs the training loss to take partial overlap into account. We additionally present several new cycle variants that complement each other and present a time-divergent scene sampling scheme that improves the data input for this self-supervised setting. Cross-camera matching experiments on the challenging DIVOTrack dataset show the merits of our approach. Compared to the self-supervised state-of-the-art, we achieve a 4.3 percentage point higher F1 score with our combined contributions. Our improvements are robust to reduced overlap in the training data, with substantial improvements in challenging scenes that need to make few matches between many people. Self-supervised feature networks trained with our method are effective at matching objects in a range of multi-camera settings, providing opportunities for complex tasks like large-scale multi-camera scene understanding.
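Illustrative sketch (not from the paper): the abstract describes training with cycle-consistency under partial overlap, using a pseudo-mask to direct the loss. The snippet below shows the general idea for two views only. Soft matches are composed along the cycle A -> B -> A, and a detection should land back on itself. Everything here is an assumption made for illustration: the function names, the temperature, and especially the `sim_threshold` heuristic, which merely stands in for the paper's pseudo-mask (the abstract does not specify how that mask is built). Features are assumed to be nonzero vectors.

```python
import math

def cosine(u, v):
    # Cosine similarity between two nonzero feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def softmax(row, temperature):
    # Numerically stable softmax with a temperature.
    m = max(row)
    exps = [math.exp((x - m) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def soft_match(src, dst, temperature):
    # Row-stochastic soft assignment from each src detection to the dst detections.
    return [softmax([cosine(s, d) for d in dst], temperature) for s in src]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def masked_cycle_loss(feats_a, feats_b, sim_threshold=0.5, temperature=0.1):
    # Compose the soft matches along the cycle A -> B -> A.
    cycle = matmul(soft_match(feats_a, feats_b, temperature),
                   soft_match(feats_b, feats_a, temperature))
    # Hypothetical pseudo-mask: only detections in A that have a sufficiently
    # similar detection in B are assumed to lie in the overlap region.
    mask = [max(cosine(a, b) for b in feats_b) >= sim_threshold for a in feats_a]
    # Cross-entropy against the identity, restricted to the masked rows.
    losses = [-math.log(cycle[i][i]) for i, keep in enumerate(mask) if keep]
    loss = sum(losses) / len(losses) if losses else 0.0
    return loss, mask

# Three people in view A; only the first two are visible in view B.
view_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
view_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
loss, mask = masked_cycle_loss(view_a, view_b)
print(mask, loss)
```

Without the mask, the third detection would be penalized for failing to return to itself even though it has no counterpart in view B. That is the failure mode a partial-overlap formulation is meant to avoid.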

Related papers

MVUDA: Unsupervised Domain Adaptation for Multi-view Pedestrian Detection [4.506083131558207]
We address multi-view pedestrian detection in a setting where labeled data is collected using a multi-camera setup different from the one used for testing. We propose an unsupervised domain adaptation (UDA) method that adapts the model to new rigs without requiring additional labeled data.
arXiv Detail & Related papers (2024-12-05T12:36:12Z)
One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding. It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps. OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z)
DVPE: Divided View Position Embedding for Multi-View 3D Object Detection [7.791229698270439]
Current research faces challenges in balancing between receptive fields and reducing interference when aggregating multi-view features. This paper proposes a divided view method, in which features are modeled globally via the visibility cross-attention mechanism, but interact only with partial features in a divided local virtual space. Our framework, named DVPE, achieves state-of-the-art performance (57.2% mAP and 64.5% NDS) on the nuScenes test set.
arXiv Detail & Related papers (2024-07-24T02:44:41Z)
Reconstructing Close Human Interactions from Multiple Views [38.924950289788804]
This paper addresses the challenging task of reconstructing the poses of multiple individuals engaged in close interactions, captured by multiple calibrated cameras. We introduce a novel system to address these challenges. Our system integrates a learning-based pose estimation component and its corresponding training and inference strategies.
arXiv Detail & Related papers (2024-01-29T14:08:02Z)
An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently. Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
Cross-View Cross-Scene Multi-View Crowd Counting [56.83882084112913]
Multi-view crowd counting has been previously proposed to utilize multi-cameras to extend the field-of-view of a single camera. We propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts.
arXiv Detail & Related papers (2022-05-03T15:03:44Z)
A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Humans tend to mine objects by learning from a group of images or several frames of video since we live in a dynamic world. Previous approaches design different networks on similar tasks separately, and they are difficult to apply to each other. We introduce a unified framework to tackle these issues, termed UFO (Unified Framework for Co-Object Segmentation).
arXiv Detail & Related papers (2022-03-09T13:35:19Z)
Learning to Track Instances without Video Annotations [85.9865889886669]
We introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences. We show that even when only trained with images, the learned feature representation is robust to instance appearance variations. In addition, we integrate this module into single-stage instance segmentation and pose estimation frameworks.
arXiv Detail & Related papers (2021-04-01T06:47:41Z)
Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training. We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.