Wide-Baseline Multi-Camera Calibration using Person Re-Identification
- URL: http://arxiv.org/abs/2104.08568v1
- Date: Sat, 17 Apr 2021 15:09:18 GMT
- Title: Wide-Baseline Multi-Camera Calibration using Person Re-Identification
- Authors: Yan Xu, Yu-Jhe Li, Xinshuo Weng, Kris Kitani
- Abstract summary: We address the problem of estimating the 3D pose of a network of cameras for large-environment wide-baseline scenarios.
Treating people in the scene as "keypoints" and associating them across different camera views can be an alternative method for obtaining correspondences.
Our method first employs a re-ID method to associate human bounding boxes across cameras, then converts bounding box correspondences to point correspondences.
- Score: 27.965850489928457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of estimating the 3D pose of a network of cameras for
large-environment wide-baseline scenarios, e.g., cameras for construction
sites, sports stadiums, and public spaces. This task is challenging since
detecting and matching the same 3D keypoint observed from two very different
camera views is difficult, making standard structure-from-motion (SfM)
pipelines inapplicable. In such circumstances, treating people in the scene as
"keypoints" and associating them across different camera views can be an
alternative method for obtaining correspondences. Based on this intuition, we
propose a method that uses ideas from person re-identification (re-ID) for
wide-baseline camera calibration. Our method first employs a re-ID method to
associate human bounding boxes across cameras, then converts bounding box
correspondences to point correspondences, and finally solves for camera pose
using multi-view geometry and bundle adjustment. Since our method does not
require specialized calibration targets except for visible people, it applies
to situations where frequent calibration updates are required. We perform
extensive experiments on datasets captured from scenes of different sizes,
camera settings (indoor and outdoor), and human activities (walking, playing
basketball, construction). Experiment results show that our method achieves
similar performance to standard SfM methods relying on manually labeled point
correspondences.
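The first two stages described in the abstract (associating boxes across cameras, then turning box pairs into point pairs) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how re-ID features are compared or how a box is reduced to a point, so cosine similarity with mutual nearest-neighbour matching and the box centre are assumed here, and every name (`mutual_matches`, `box_center`, `point_correspondences`, `min_sim`) is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mutual_matches(feats_a, feats_b, min_sim=0.5):
    """Return index pairs (i, j) that are each other's best match.

    Assumption: mutual nearest neighbours plus a similarity threshold
    stand in for whatever association rule the paper actually uses.
    """
    if not feats_a or not feats_b:
        return []
    sim = [[cosine(fa, fb) for fb in feats_b] for fa in feats_a]
    best_b = [max(range(len(feats_b)), key=lambda j: sim[i][j])
              for i in range(len(feats_a))]
    best_a = [max(range(len(feats_a)), key=lambda i: sim[i][j])
              for j in range(len(feats_b))]
    return [(i, j) for i, j in enumerate(best_b)
            if best_a[j] == i and sim[i][j] >= min_sim]


def box_center(box):
    """Reduce an (x1, y1, x2, y2) box to its centre point (an assumption)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def point_correspondences(dets_a, dets_b, min_sim=0.5):
    """Turn matched detections, given as (box, embedding), into point pairs."""
    pairs = mutual_matches([e for _, e in dets_a],
                           [e for _, e in dets_b], min_sim)
    return [(box_center(dets_a[i][0]), box_center(dets_b[j][0]))
            for i, j in pairs]


# Toy detections: two people seen by two cameras, with made-up embeddings.
cam_a = [((100, 50, 140, 170), [1.0, 0.0, 0.1]),
         ((300, 60, 350, 200), [0.0, 1.0, 0.0])]
cam_b = [((400, 80, 430, 160), [0.1, 0.9, 0.0]),
         ((20, 40, 70, 190), [0.9, 0.1, 0.1])]

matches = point_correspondences(cam_a, cam_b)
print(matches)
```

Point pairs like these would then go to a standard two-view geometry solver and to bundle adjustment, which the sketch does not cover.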
Related papers
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- Multi-View Person Matching and 3D Pose Estimation with Arbitrary Uncalibrated Camera Networks [36.49915280876899]
Cross-view person matching and 3D human pose estimation in multi-camera networks are difficult when the cameras are extrinsically uncalibrated.
Existing efforts require large amounts of 3D data for training neural networks or known camera poses for geometric constraints to solve the problem.
We present a method, PME, that solves the two tasks without requiring either information.
arXiv Detail & Related papers (2023-12-04T01:28:38Z)
- Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
arXiv Detail & Related papers (2023-10-17T15:31:28Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- Online Marker-free Extrinsic Camera Calibration using Person Keypoint Detections [25.393382192511716]
We propose a marker-free online method for the extrinsic calibration of multiple smart edge sensors.
Our method assumes the intrinsic camera parameters to be known and requires priming with a rough initial estimate of the camera poses.
We show that the calibration with our method achieves lower reprojection errors compared to a reference calibration generated by an offline method.
arXiv Detail & Related papers (2022-09-15T15:54:21Z)
- Cross-View Cross-Scene Multi-View Crowd Counting [56.83882084112913]
Multi-view crowd counting has been previously proposed to utilize multi-cameras to extend the field-of-view of a single camera.
We propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts.
arXiv Detail & Related papers (2022-05-03T15:03:44Z)
- MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation [55.96577490779591]
We show that more data does not automatically guarantee better performance; rather, methods need a degree of 'camera independence' in order to benefit from large and heterogeneous training data.
arXiv Detail & Related papers (2021-10-01T14:56:37Z)
- Cross-Camera Feature Prediction for Intra-Camera Supervised Person Re-identification across Distant Scenes [70.30052164401178]
Person re-identification (Re-ID) aims to match person images across non-overlapping camera views.
ICS-DS Re-ID uses cross-camera unpaired data with intra-camera identity labels for training.
A cross-camera feature prediction method is used to mine cross-camera self-supervision information.
Joint learning of global-level and local-level features forms a global-local cross-camera feature prediction scheme.
arXiv Detail & Related papers (2021-07-29T11:27:50Z)
- SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras [30.480562747903186]
A 360° perception of scene geometry is essential for automated driving, notably for parking and urban driving scenarios.
We present novel camera-geometry adaptive multi-scale convolutions which utilize the camera parameters as a conditional input.
We evaluate our approach on the Fisheye WoodScape surround-view dataset, significantly improving over previous approaches.
arXiv Detail & Related papers (2021-04-09T15:20:20Z)
- Infrastructure-based Multi-Camera Calibration using Radial Projections [117.22654577367246]
Pattern-based calibration techniques can be used to calibrate the intrinsics of the cameras individually.
Infrastructure-based calibration techniques are able to estimate the extrinsics using 3D maps pre-built via SLAM or Structure-from-Motion.
We propose to fully calibrate a multi-camera system from scratch using an infrastructure-based approach.
arXiv Detail & Related papers (2020-07-30T09:21:04Z)
- Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras [13.24490469380487]
We present an effective multi-view approach to end-to-end learning of precise manipulation tasks that are 3D in nature.
Our method learns to accomplish these tasks using multiple statically placed but uncalibrated RGB camera views without building an explicit 3D representation such as a pointcloud or voxel grid.
arXiv Detail & Related papers (2020-02-21T03:28:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.