Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
- URL: http://arxiv.org/abs/2309.08588v1
- Date: Fri, 15 Sep 2023 17:44:07 GMT
- Title: Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
- Authors: Fabien Delattre, David Dirnfeld, Phat Nguyen, Stephen Scarano, Michael
J. Jones, Pedro Miraldo, Erik Learned-Miller
- Abstract summary: We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video.
We provide a new dataset and benchmark, with high-accuracy, rigorously verified ground truth, on 17 video sequences.
This represents a strong new performance point for crowded scenes, an important setting for computer vision.
- Score: 8.061773364318313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an approach to estimating camera rotation in crowded, real-world
scenes from handheld monocular video. While camera rotation estimation is a
well-studied problem, no previous methods exhibit both high accuracy and
acceptable speed in this setting. Because the setting is not addressed well by
other datasets, we provide a new dataset and benchmark, with high-accuracy,
rigorously verified ground truth, on 17 video sequences. Methods developed for
wide baseline stereo (e.g., 5-point methods) perform poorly on monocular video.
On the other hand, methods used in autonomous driving (e.g., SLAM) leverage
specific sensor setups, specific motion models, or local optimization
strategies (lagging batch processing) and do not generalize well to handheld
video. Finally, for dynamic scenes, commonly used robustification techniques
like RANSAC require large numbers of iterations, and become prohibitively slow.
We introduce a novel generalization of the Hough transform on SO(3) to
efficiently and robustly find the camera rotation most compatible with optical
flow. Among comparably fast methods, ours reduces error by almost 50% over the
next best, and is more accurate than any other method, irrespective of speed. This
represents a strong new performance point for crowded scenes, an important
setting for computer vision. The code and the dataset are available at
https://fabiendelattre.com/robust-rotation-estimation.
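The abstract does not spell out the voting procedure, but the core idea of a Hough-style estimator (accumulate votes over a discretized set of candidate rotations and keep the candidate consistent with the most optical-flow vectors) can be illustrated with a toy sketch. The small-angle rotational flow model, the regular angle grid, and the inlier threshold below are assumptions for illustration only, not the paper's actual parameterization of SO(3) or its voting scheme.

```python
# Toy sketch of Hough-style voting over a discretized space of candidate rotations,
# assuming calibrated (normalized) image coordinates and a small-angle rotation
# model; illustration only, not the paper's parameterization of SO(3).
import numpy as np

def rotational_flow(points, omega):
    """Rotation-induced flow at normalized points (x, y) for a small rotation omega
    (radians/frame), using one common sign convention of the motion-field equations."""
    x, y = points[:, 0], points[:, 1]
    wx, wy, wz = omega
    u = -x * y * wx + (1.0 + x ** 2) * wy - y * wz
    v = -(1.0 + y ** 2) * wx + x * y * wy + x * wz
    return np.stack([u, v], axis=1)

def vote_rotation(points, flow, max_angle=0.05, steps=21, tol=2e-3):
    """Score every candidate rotation by its number of inlier flow vectors ("votes")
    and return the best-voted candidate."""
    axis = np.linspace(-max_angle, max_angle, steps)
    best_omega, best_votes = np.zeros(3), -1
    for wx in axis:
        for wy in axis:
            for wz in axis:
                omega = np.array([wx, wy, wz])
                err = np.linalg.norm(flow - rotational_flow(points, omega), axis=1)
                votes = int((err < tol).sum())  # flow vectors consistent with this rotation
                if votes > best_votes:
                    best_omega, best_votes = omega, votes
    return best_omega, best_votes

# Synthetic check: corrupt 40% of the flow (a stand-in for independently moving people)
# and verify that voting still recovers the rotation used to generate the rest.
rng = np.random.default_rng(0)
pts = rng.uniform(-0.5, 0.5, size=(500, 2))
true_omega = np.array([0.01, -0.02, 0.005])
flow = rotational_flow(pts, true_omega)
outliers = rng.random(500) < 0.4
flow[outliers] += rng.normal(scale=0.02, size=(int(outliers.sum()), 2))
estimate, votes = vote_rotation(pts, flow)
print("estimated rotation:", estimate, "votes:", votes)
```

Because the score is an inlier count rather than a least-squares fit, flow on independently moving people simply fails to vote for the true rotation instead of biasing the estimate, which is the behaviour needed in crowded scenes; the paper's method presumably achieves this far more efficiently than the brute-force grid enumeration above.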
Related papers
- Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization [11.418632671254564]
3D Gaussian Splatting has emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images.
We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals.
We show results on real-world scenes and complex trajectories through simulated environments.
arXiv Detail & Related papers (2024-10-11T12:01:15Z)
- Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection (a generic sketch of this kind of greedy selection appears after the related-papers list).
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
- Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting [14.759265492381509]
We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters.
The approach extracts 2D point features that robustly represent 3D structure.
Results show significant improvements over state-of-the-art methods for 4D novel view synthesis.
arXiv Detail & Related papers (2024-06-03T06:52:35Z)
- U-ARE-ME: Uncertainty-Aware Rotation Estimation in Manhattan Environments [18.534567960292403]
We present U-ARE-ME, an algorithm that estimates camera rotation along with uncertainty from uncalibrated RGB images.
Our experiments demonstrate that U-ARE-ME performs comparably to RGB-D methods and is more robust than sparse feature-based SLAM methods.
arXiv Detail & Related papers (2024-03-22T19:14:28Z)
- Joint 3D Shape and Motion Estimation from Rolling Shutter Light-Field Images [2.0277446818410994]
We propose an approach to 3D scene reconstruction from a single image captured by a light-field camera equipped with a rolling shutter sensor.
Our method leverages the 3D information cues present in the light-field and the motion information provided by the rolling shutter effect.
We present a generic model for the imaging process of this sensor and a two-stage algorithm that minimizes the re-projection error.
arXiv Detail & Related papers (2023-11-02T15:08:18Z)
- Tracking Everything Everywhere All at Once [111.00807055441028]
We present a new test-time optimization method for estimating dense and long-range motion from a video sequence.
We propose a complete and globally consistent motion representation, dubbed OmniMotion.
Our approach outperforms prior state-of-the-art methods by a large margin both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-06-08T17:59:29Z)
- Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction [54.00007868515432]
Existing methods struggle to estimate an accurate correction field because they assume uniform velocity.
We propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels.
Our method surpasses the state of the art by +4.98, +0.77, and +4.33 dB PSNR on the Carla-RS, Fastec-RS, and BS-RSC datasets, respectively.
arXiv Detail & Related papers (2023-03-31T15:09:18Z)
- Fast and Lightweight Scene Regressor for Camera Relocalization [1.6708069984516967]
Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications.
This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates.
The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image.
arXiv Detail & Related papers (2022-12-04T14:41:20Z)
- ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on the MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras report brightness changes as a stream of asynchronous events instead of intensity frames.
Learning-based approaches have recently been applied to event data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
- Towards Fast, Accurate and Stable 3D Dense Face Alignment [73.01620081047336]
We propose a novel regression framework named 3DDFA-V2 that strikes a balance among speed, accuracy, and stability.
We present a virtual synthesis method that transforms a still image into a short video incorporating in-plane and out-of-plane face movement.
arXiv Detail & Related papers (2020-09-21T15:37:37Z)
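As referenced in the redundancy-aware camera selection entry above, that work builds a similarity matrix over cameras and applies diversity-based sampling. One simple, generic way to do such selection is a greedy rule that repeatedly adds the camera least similar to those already chosen; the sketch below is an illustration under that assumption, not the cited paper's algorithm, and the similarity construction and seeding rule are placeholders.

```python
# Generic greedy diversity-based selection over a precomputed camera similarity
# matrix; illustrative only, not the algorithm of the cited paper.
import numpy as np

def select_cameras(similarity, k):
    """Pick k camera indices from an (N, N) similarity matrix, favoring diversity."""
    selected = [int(np.argmax(similarity.sum(axis=1)))]  # seed with the most representative camera
    while len(selected) < k:
        # A candidate's redundancy is its highest similarity to any already-selected camera.
        redundancy = similarity[:, selected].max(axis=1).astype(float)
        redundancy[selected] = np.inf  # never re-pick an already-selected camera
        selected.append(int(np.argmin(redundancy)))
    return selected

# Example with a random stand-in for a combined spatial + semantic similarity matrix.
rng = np.random.default_rng(1)
feats = rng.normal(size=(50, 16))
similarity = feats @ feats.T
print(select_cameras(similarity, 8))
```

The same greedy structure works with any notion of similarity; swapping in a matrix that mixes camera-pose distance with image-feature similarity recovers the spirit of the entry above.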