Multi-Camera Multi-Person Association using Transformer-Based Dense Pixel Correspondence Estimation and Detection-Based Masking
- URL: http://arxiv.org/abs/2408.09295v1
- Date: Sat, 17 Aug 2024 20:58:16 GMT
- Title: Multi-Camera Multi-Person Association using Transformer-Based Dense Pixel Correspondence Estimation and Detection-Based Masking
- Authors: Daniel Kathein, Byron Hernandez, Henry Medeiros
- Abstract summary: Multi-camera Association (MCA) is the task of identifying objects and individuals across camera views.
We investigate a novel multi-camera multi-target association algorithm based on dense pixel correspondence estimation.
Our results show that the algorithm performs exceptionally well at associating pedestrians on camera pairs that are positioned close to each other.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-camera Association (MCA) is the task of identifying objects and individuals across camera views and is an active research topic, given its numerous applications across robotics, surveillance, and agriculture. We investigate a novel multi-camera multi-target association algorithm based on dense pixel correspondence estimation with a Transformer-based architecture and underlying detection-based masking. After the algorithm generates a set of corresponding keypoints and computes their respective confidence levels for every pair of detections in the camera views, an affinity matrix containing the probabilities of matches between each pair is determined. Finally, the Hungarian algorithm is applied to generate an optimal assignment matrix with all the predicted associations between the camera views. Our method is evaluated on the WILDTRACK Seven-Camera HD Dataset, a high-resolution dataset containing footage of walking pedestrians as well as precise annotations and camera calibrations. Our results show that the algorithm performs exceptionally well at associating pedestrians on camera pairs that are positioned close to each other and observe the scene from similar perspectives. On camera pairs whose orientations differ drastically in distance or angle, there is still significant room for improvement.
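The final association step in the abstract (affinity matrix of match probabilities, then an optimal one-to-one assignment) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the affinity values are hypothetical placeholders standing in for aggregated keypoint-correspondence confidences, and the exhaustive search gives the same optimum the Hungarian algorithm would for small matrices (in practice one would use `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations

# affinity[i][j] = hypothetical probability that detection i in camera A
# matches detection j in camera B (placeholder values for illustration)
affinity = [
    [0.90, 0.10, 0.05],
    [0.20, 0.75, 0.15],
    [0.05, 0.30, 0.85],
]

def best_assignment(aff):
    """Exhaustive optimal assignment: choose the column permutation that
    maximizes total affinity. For small n this matches the Hungarian
    algorithm's result; the Hungarian algorithm does the same in O(n^3)."""
    n = len(aff)
    best = max(permutations(range(n)),
               key=lambda p: sum(aff[i][p[i]] for i in range(n)))
    return list(enumerate(best))  # predicted (camera A, camera B) pairs

print(best_assignment(affinity))  # → [(0, 0), (1, 1), (2, 2)]
```

Here each detection in camera A is paired with its highest-affinity counterpart in camera B, subject to the one-to-one constraint that the assignment matrix enforces.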
Related papers
- Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
- Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment [22.531044994763487]
We propose a novel multi-camera multiple-people tracking method that uses anchor-guided clustering for cross-camera ID re-assignment.
Our approach aims to improve tracking accuracy by identifying key features that are unique to every individual.
The method has demonstrated robustness and effectiveness in handling both synthetic and real-world data.
arXiv Detail & Related papers (2023-04-19T07:38:15Z)
- Tracking Passengers and Baggage Items using Multiple Overhead Cameras at Security Checkpoints [2.021502591596062]
We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios.
We propose a Self-Supervised Learning (SSL) technique to provide the model information about instance segmentation uncertainty from overhead images.
Our results show that self-supervision improves object detection accuracy by up to 42% without increasing the inference time of the model.
arXiv Detail & Related papers (2022-12-31T12:57:09Z)
- CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation [2.861848675707602]
We present a new single-stage architecture called CASAPose.
It determines 2D-3D correspondences for pose estimation of multiple different objects in RGB images in one pass.
It is fast and memory efficient, and achieves high accuracy for multiple objects.
arXiv Detail & Related papers (2022-10-11T10:20:01Z)
- Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on DDAD and NuScenes datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2022-10-05T03:44:34Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Graph Neural Networks for Cross-Camera Data Association [3.490148531239259]
Cross-camera image data association is essential for many multi-camera computer vision tasks.
This paper proposes an efficient approach for cross-camera data association focused on a global solution.
arXiv Detail & Related papers (2022-01-17T09:52:39Z)
- Cross-Camera Feature Prediction for Intra-Camera Supervised Person Re-identification across Distant Scenes [70.30052164401178]
Person re-identification (Re-ID) aims to match person images across non-overlapping camera views.
ICS-DS Re-ID uses cross-camera unpaired data with intra-camera identity labels for training.
A cross-camera feature prediction method is proposed to mine cross-camera self-supervision information.
Joint learning of global-level and local-level features forms a global-local cross-camera feature prediction scheme.
arXiv Detail & Related papers (2021-07-29T11:27:50Z)
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation [2.9972063833424216]
We present a dataset of 32 scenes that have been captured by 7 different 3D cameras, totaling 49,294 frames.
This allows evaluating the sensitivity of pose estimation algorithms to the specifics of the used camera.
arXiv Detail & Related papers (2020-04-24T11:14:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.