FLEX: Parameter-free Multi-view 3D Human Motion Reconstruction
- URL: http://arxiv.org/abs/2105.01937v1
- Date: Wed, 5 May 2021 09:08:12 GMT
- Title: FLEX: Parameter-free Multi-view 3D Human Motion Reconstruction
- Authors: Brian Gordon, Sigal Raab, Guy Azov, Raja Giryes, Daniel Cohen-Or
- Abstract summary: Multi-view algorithms strongly depend on camera parameters, in particular, the relative positions among the cameras.
We introduce FLEX, an end-to-end parameter-free multi-view model.
We demonstrate results on the Human3.6M and KTH Multi-view Football II datasets.
- Score: 70.09086274139504
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The increasing availability of video recordings made by multiple cameras has
offered new means for mitigating occlusion and depth ambiguities in pose and
motion reconstruction methods. Yet, multi-view algorithms strongly depend on
camera parameters, in particular, the relative positions among the cameras.
Such dependency becomes a hurdle once shifting to dynamic capture in
uncontrolled settings. We introduce FLEX (Free muLti-view rEconstruXion), an
end-to-end parameter-free multi-view model. FLEX is parameter-free in the sense
that it does not require any camera parameters, intrinsic or extrinsic. Our key
idea is that the 3D angles between skeletal parts, as well
as bone lengths, are invariant to the camera position. Hence, learning 3D
rotations and bone lengths rather than locations allows predicting common
values for all camera views. Our network takes multiple video streams, learns
fused deep features through a novel multi-view fusion layer, and reconstructs a
single consistent skeleton with temporally coherent joint rotations. We
demonstrate quantitative and qualitative results on the Human3.6M and KTH
Multi-view Football II datasets. We compare our model to state-of-the-art
methods that are not parameter-free and show that in the absence of camera
parameters, we outperform them by a large margin while obtaining comparable
results when camera parameters are available. Code, trained models, video
demonstration, and additional materials will be available on our project page.
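To make the key idea concrete, here is a minimal sketch (NumPy, not the authors' released code) of forward kinematics: from per-joint rotations and bone lengths, the camera-invariant quantities FLEX learns, a single skeleton is rebuilt in a root-relative frame that every view can agree on. The joint tree, rest-pose bone direction, and function names are illustrative assumptions.

```python
# Hedged sketch: rotations + bone lengths -> joint positions, with no camera
# anywhere in the computation, which is why all views can share the targets.
import numpy as np

# Hypothetical 5-joint chain (parents listed before children): a pelvis root,
# a spine -> neck -> head chain, and one hip hanging off the pelvis.
PARENTS = [-1, 0, 1, 2, 0]            # parent index per joint; -1 marks the root
BONE_DIR = np.array([0.0, 1.0, 0.0])  # assumed rest-pose bone direction (unit vector)

def forward_kinematics(rotations, bone_lengths):
    """rotations: (J, 3, 3) local rotation per joint; bone_lengths: (J,).
    Returns (J, 3) joint positions in a root-relative, camera-free frame."""
    J = len(PARENTS)
    global_rot = [np.eye(3)] * J
    positions = np.zeros((J, 3))
    for j, p in enumerate(PARENTS):
        if p < 0:                      # root: carries orientation, stays at origin
            global_rot[j] = rotations[j]
            continue
        global_rot[j] = global_rot[p] @ rotations[j]
        positions[j] = positions[p] + global_rot[p] @ (bone_lengths[j] * BONE_DIR)
    return positions

# Identity rotations and unit bones stack the chain straight up the y-axis,
# regardless of where any camera stands.
joints = forward_kinematics(np.stack([np.eye(3)] * 5), np.ones(5))
```

Since the output never references a camera frame, rotation and bone-length predictions fused from any number of uncalibrated views can be supervised toward one shared skeleton.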
Related papers
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first framework of its kind, letting the user specify distinct camera motions while still obtaining consistent object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
- MC-NeRF: Multi-Camera Neural Radiance Fields for Multi-Camera Image Acquisition Systems [22.494866649536018]
Neural Radiance Fields (NeRF) use multi-view images for 3D scene representation, demonstrating remarkable performance.
Most previous NeRF-based methods assume a unique camera and rarely consider multi-camera scenarios.
We propose MC-NeRF, a method that enables joint optimization of both intrinsic and extrinsic parameters alongside NeRF.
arXiv Detail & Related papers (2023-09-14T16:40:44Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Multi-task Learning for Camera Calibration [3.274290296343038]
We present a unique method for predicting intrinsic (principal point offset and focal length) and extrinsic (baseline, pitch, and translation) properties from a pair of images.
The camera projection loss (CPL) reconstructs the 3D points through a neural network that embeds the camera model, then uses the reconstruction loss to recover the camera specifications, allowing the desired parameters to be estimated.
arXiv Detail & Related papers (2022-11-22T17:39:31Z)
- Adaptively Multi-view and Temporal Fusing Transformer for 3D Human Pose Estimation [10.625664582408687]
3D Human Pose Estimation (HPE) must handle several variable elements: the number of views, the length of the video sequence, and whether camera calibration is available.
We propose a unified framework named Multi-view and Temporal Fusing Transformer (MTF-Transformer) to adaptively handle varying view numbers and video lengths without calibration (a calibration-free fusion sketch appears after this list).
arXiv Detail & Related papers (2021-10-11T08:57:43Z)
- Camera Calibration through Camera Projection Loss [4.36572039512405]
We propose a novel method to predict intrinsic (focal length and principal point offset) parameters using an image pair.
Unlike existing methods, we propose a new representation that incorporates the camera model equations as a neural network within a multi-task learning framework (a minimal sketch of this projection-loss idea follows this list).
Our approach outperforms both deep learning-based and traditional methods on 7 of the 10 parameters evaluated.
arXiv Detail & Related papers (2021-10-07T14:03:10Z)
- MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation [55.96577490779591]
We show that more data does not automatically guarantee better performance; rather, methods need a degree of 'camera independence' in order to benefit from large and heterogeneous training data.
arXiv Detail & Related papers (2021-10-01T14:56:37Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
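Both camera projection loss entries above describe the same core mechanism: embed the pinhole camera equations in the network's forward pass so that the 2D reprojection error can be driven back onto the camera parameters themselves. Below is a minimal, hypothetical sketch of that idea; the function names, the toy focal-length recovery, and the numerical gradient are illustrative assumptions, not the papers' implementation.

```python
# Hedged sketch of a camera projection loss (CPL): a differentiable pinhole
# projection whose reprojection error is minimized over the camera parameters.
import numpy as np

def project(points_3d, focal, principal_point):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    x = points_3d[:, 0] / points_3d[:, 2]
    y = points_3d[:, 1] / points_3d[:, 2]
    return np.stack([focal * x + principal_point[0],
                     focal * y + principal_point[1]], axis=1)

def cpl_loss(focal, principal_point, points_3d, observed_2d):
    """Mean squared reprojection error; minimizing it over the camera
    parameters is what recovers them."""
    return np.mean((project(points_3d, focal, principal_point) - observed_2d) ** 2)

# Toy check: recover an unknown focal length by gradient descent on the loss.
rng = np.random.default_rng(0)
pts = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 6.0], size=(64, 3))
target = project(pts, focal=800.0, principal_point=(320.0, 240.0))

f, lr, eps = 500.0, 10.0, 1e-3   # wrong initial guess for the focal length
for _ in range(200):
    # central-difference gradient keeps the sketch dependency-free
    grad = (cpl_loss(f + eps, (320.0, 240.0), pts, target)
            - cpl_loss(f - eps, (320.0, 240.0), pts, target)) / (2 * eps)
    f -= lr * grad
print(round(f))                  # -> 800
```

In the papers this error serves as a training loss for a network that predicts the parameters; the hand-rolled descent above only demonstrates that the loss surface indeed pins them down.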
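The MTF-Transformer entry, like FLEX's multi-view fusion layer, must fuse features from a varying number of uncalibrated views. One standard way to do that is attention-style weighting over per-view features, sketched below; every name, shape, and the random toy weights are assumptions for illustration, not either paper's architecture.

```python
# Hedged sketch of calibration-free multi-view fusion: score each view's
# feature against a learned query and blend with softmax weights, so the
# computation never touches camera placement and accepts any view count.
import numpy as np

def fuse_views(view_features, query, key_proj, value_proj):
    """view_features: (V, D), one feature vector per camera view (V may vary).
    Returns a single (D,) fused feature, independent of view order and count."""
    keys = view_features @ key_proj                 # (V, D)
    values = view_features @ value_proj             # (V, D)
    scores = keys @ query / np.sqrt(len(query))     # (V,) attention logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over the views
    return weights @ values                         # confidence-weighted blend

rng = np.random.default_rng(0)
D = 16
query, Wk, Wv = rng.normal(size=D), rng.normal(size=(D, D)), rng.normal(size=(D, D))
fused_a = fuse_views(rng.normal(size=(2, D)), query, Wk, Wv)  # two views...
fused_b = fuse_views(rng.normal(size=(5, D)), query, Wk, Wv)  # ...or five, same code path
```

Because the softmax normalizes over however many views arrive, the same learned weights serve two cameras or five, which is the kind of flexibility both papers aim for without calibration.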