Related papers: Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes

Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes

URL: http://arxiv.org/abs/2309.08588v1
Date: Fri, 15 Sep 2023 17:44:07 GMT
Title: Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
Authors: Fabien Delattre, David Dirnfeld, Phat Nguyen, Stephen Scarano, Michael J. Jones, Pedro Miraldo, Erik Learned-Miller
Abstract summary: We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video. We provide a new dataset and benchmark, with high-accuracy, rigorously verified ground truth, on 17 video sequences. This represents a strong new performance point for crowded scenes, an important setting for computer vision.
Score: 8.061773364318313
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video. While camera rotation estimation is a well-studied problem, no previous methods exhibit both high accuracy and acceptable speed in this setting. Because the setting is not addressed well by other datasets, we provide a new dataset and benchmark, with high-accuracy, rigorously verified ground truth, on 17 video sequences. Methods developed for wide baseline stereo (e.g., 5-point methods) perform poorly on monocular video. On the other hand, methods used in autonomous driving (e.g., SLAM) leverage specific sensor setups, specific motion models, or local optimization strategies (lagging batch processing) and do not generalize well to handheld video. Finally, for dynamic scenes, commonly used robustification techniques like RANSAC require large numbers of iterations, and become prohibitively slow. We introduce a novel generalization of the Hough transform on SO(3) to efficiently and robustly find the camera rotation most compatible with optical flow. Among comparably fast methods, ours reduces error by almost 50\% over the next best, and is more accurate than any method, irrespective of speed. This represents a strong new performance point for crowded scenes, an important setting for computer vision. The code and the dataset are available at https://fabiendelattre.com/robust-rotation-estimation.

Related papers

Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video [22.760823792026056]
We propose a novel method that eliminates prior dependencies by modeling continuous camera motions as time-dependent angular velocity and velocity. Our approach achieves superior camera pose and depth estimation and comparable novel-view synthesis performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-04-28T14:22:04Z)
AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos [52.726585508669686]
We propose AnyCam, a fast transformer model that directly estimates camera poses and intrinsics from a dynamic video sequence. We test AnyCam on established datasets, where it delivers accurate camera poses and intrinsics both qualitatively and quantitatively. By combining camera information, uncertainty, and depth, our model can produce high-quality 4D pointclouds.
arXiv Detail & Related papers (2025-03-30T02:22:11Z)
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image [14.485182089870928]
We propose a novel framework that leverages motion blur as a rich cue for motion estimation. Our approach works by predicting a dense motion flow field and a monocular depth map directly from a single motion-blurred image. Our method produces an IMU-like measurement that robustly captures fast and aggressive camera movements.
arXiv Detail & Related papers (2025-03-21T17:58:56Z)
MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos [104.1338295060383]
We present a system that allows for accurate, fast, and robust estimation of camera parameters and depth maps from casual monocular videos of dynamic scenes. Our system is significantly more accurate and robust at camera pose and depth estimation when compared with prior and concurrent work.
arXiv Detail & Related papers (2024-12-05T18:59:42Z)
Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization [11.418632671254564]
3D Gaussian Splatting has emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images. We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals. We show results on real-world scenes and complex trajectories through simulated environments.
arXiv Detail & Related papers (2024-10-11T12:01:15Z)
Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images. We apply a diversity-based sampling algorithm to optimize the camera selection. We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting [14.759265492381509]
We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters. It includes the extraction of 2D point features that robustly represent 3D structure. Results show significant improvements over state-of-the-art methods for 4D novel view synthesis.
arXiv Detail & Related papers (2024-06-03T06:52:35Z)
U-ARE-ME: Uncertainty-Aware Rotation Estimation in Manhattan Environments [18.534567960292403]
We present U-ARE-ME, an algorithm that estimates camera rotation along with uncertainty from uncalibrated RGB images. Our experiments demonstrate that U-ARE-ME performs comparably to RGB-D methods and is more robust than sparse feature-based SLAM methods.
arXiv Detail & Related papers (2024-03-22T19:14:28Z)
Joint 3D Shape and Motion Estimation from Rolling Shutter Light-Field Images [2.0277446818410994]
We propose an approach to address the problem of 3D reconstruction of scenes from a single image captured by a light-field camera equipped with a rolling shutter sensor. Our method leverages the 3D information cues present in the light-field and the motion information provided by the rolling shutter effect. We present a generic model for the imaging process of this sensor and a two-stage algorithm that minimizes the re-projection error.
arXiv Detail & Related papers (2023-11-02T15:08:18Z)
Tracking Everything Everywhere All at Once [111.00807055441028]
We present a new test-time optimization method for estimating dense and long-range motion from a video sequence. We propose a complete and globally consistent motion representation, dubbed OmniMotion. Our approach outperforms prior state-of-the-art methods by a large margin both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-06-08T17:59:29Z)
Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction [54.00007868515432]
Existing methods face challenges in estimating the accurate correction field due to the uniform velocity assumption. We propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels. Our method surpasses the state-of-the-art by +4.98, +0.77, and +4.33 of PSNR on Carla-RS, Fastec-RS, and BS-RSC datasets, respectively.
arXiv Detail & Related papers (2023-03-31T15:09:18Z)
Fast and Lightweight Scene Regressor for Camera Relocalization [1.6708069984516967]
Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications. This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates. The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image.
arXiv Detail & Related papers (2022-12-04T14:41:20Z)
ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow. A novel neural network architecture is proposed for processing irregular point trajectory data. Experiments on MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras produce brightness changes in the form of a stream of asynchronous events instead of intensity frames. Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction. We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
Towards Fast, Accurate and Stable 3D Dense Face Alignment [73.01620081047336]
We propose a novel regression framework named 3DDFA-V2 which makes a balance among speed, accuracy and stability. We present a virtual synthesis method to transform one still image to a short-video which incorporates in-plane and out-of-plane face moving.
arXiv Detail & Related papers (2020-09-21T15:37:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.