ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video
- URL: http://arxiv.org/abs/2409.12202v2
- Date: Mon, 14 Oct 2024 12:35:44 GMT
- Title: ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video
- Authors: Han Ling, Yinghui Sun, Quansen Sun, Yuhui Zheng
- Abstract summary: This paper proposes a 3D motion perception method called ScaleFlow++ that generalizes well.
With just a pair of RGB images, ScaleFlow++ can robustly estimate optical flow and motion-in-depth (MID).
On KITTI, ScaleFlow++ achieved the best monocular scene flow estimation performance, reducing SF-all from 6.21 to 5.79.
- Score: 26.01796507893086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Perceiving and understanding 3D motion is a core technology in fields such as autonomous driving, robotics, and motion prediction. This paper proposes a 3D motion perception method called ScaleFlow++ that generalizes well. With just a pair of RGB images, ScaleFlow++ can robustly estimate optical flow and motion-in-depth (MID). Most existing methods directly regress MID from two RGB frames or from optical flow, resulting in inaccurate and unstable results. Our key insight is cross-scale matching, which extracts deep motion cues by matching objects across image pairs at different scales. Unlike previous methods, ScaleFlow++ integrates optical flow and MID estimation into a unified architecture, estimating both end-to-end from feature matching. Moreover, we also propose a global initialization network, a global iterative optimizer, and a hybrid training pipeline to integrate global motion information, reduce the number of iterations, and prevent overfitting during training. On KITTI, ScaleFlow++ achieved the best monocular scene flow estimation performance, reducing SF-all from 6.21 to 5.79. Its MID estimation even surpasses that of RGBD-based methods. In addition, ScaleFlow++ achieves strong zero-shot generalization in both rigid and nonrigid scenes. Code is available at https://github.com/HanLingsgjk/CSCV.
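The cross-scale idea above is concrete enough to sketch. Below is a rough, purely illustrative Python snippet (not the authors' implementation; the function name, candidate scale set, and correlation scheme are all assumptions) that correlates a frame-1 feature patch against rescaled frame-2 features and reads a motion-in-depth proxy off the best-matching scale. In a full pipeline the comparison would happen at the flow-displaced location; here the flow is assumed to be zero for brevity.

```python
import numpy as np
from scipy.ndimage import zoom

def cross_scale_mid(feat1, feat2, y, x, scales=(0.8, 0.9, 1.0, 1.1, 1.25), win=4):
    """Toy cross-scale matching: find the rescaling of frame-2 features that
    best matches a frame-1 patch. A hypothetical stand-in for ScaleFlow++'s
    learned cross-scale correlation, not the authors' code."""
    patch = feat1[y - win:y + win + 1, x - win:x + win + 1]
    patch = patch / (np.linalg.norm(patch) + 1e-8)
    best_scale, best_score = 1.0, -np.inf
    for s in scales:
        f2s = zoom(feat2, (s, s, 1), order=1)      # rescale H and W, keep channels
        cy, cx = int(round(y * s)), int(round(x * s))
        if min(cy, cx) < win or cy + win + 1 > f2s.shape[0] or cx + win + 1 > f2s.shape[1]:
            continue                               # window falls off the feature map
        cand = f2s[cy - win:cy + win + 1, cx - win:cx + win + 1]
        cand = cand / (np.linalg.norm(cand) + 1e-8)
        score = float(np.sum(patch * cand))        # normalized cross-correlation
        if score > best_score:
            best_scale, best_score = s, score
    # An object that grows between frames (best_scale > 1) is approaching; one
    # that shrinks (best_scale < 1) is receding, so the best-matching scale
    # ratio is a direct cue for motion-in-depth.
    return best_scale
```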
Related papers
- Gravity-aligned Rotation Averaging with Circular Regression [53.81374943525774]
We introduce a principled approach that integrates gravity direction into the rotation averaging phase of global pipelines.
We achieve state-of-the-art accuracy on four large-scale datasets.
arXiv Detail & Related papers (2024-10-16T17:37:43Z)
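Because a known gravity direction fixes two of a rotation's three degrees of freedom, averaging the remaining yaw angles in the entry above is a one-dimensional circular problem. The toy snippet below shows why circular statistics are needed at all; it is a stand-in for the paper's regression formulation, not its solver.

```python
import numpy as np

def circular_mean(angles_rad):
    """Average angles via their unit vectors, then read the angle back with
    atan2. A naive np.mean would wrap incorrectly near +/- pi."""
    return np.arctan2(np.mean(np.sin(angles_rad)), np.mean(np.cos(angles_rad)))

# 179 deg and -179 deg should average to ~180 deg, not 0 deg.
yaws = np.deg2rad([179.0, -179.0])
print(np.rad2deg(circular_mean(yaws)))  # ~ +/-180.0
```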
- Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction [14.866463843514156]
Let Occ Flow is the first self-supervised work for joint 3D occupancy and occupancy flow prediction using only camera inputs.
Our approach incorporates a novel attention-based temporal fusion module to capture dynamic object dependencies.
Our method extends differentiable rendering to 3D volumetric flow fields.
arXiv Detail & Related papers (2024-07-10T12:20:11Z)
- Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring [71.60457491155451]
Eliminating image blur produced by various kinds of motion has been a challenging problem.
We propose a novel filtering model for real-world deblurring called the Motion-adaptive Separable Collaborative Filter.
Our method provides an effective solution for real-world motion blur removal and achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-04-19T19:44:24Z)
- MemFlow: Optical Flow Estimation and Prediction with Memory [54.22820729477756]
We present MemFlow, a real-time method for optical flow estimation and prediction with memory.
Our method uses memory read-out and update modules to aggregate historical motion information in real time.
Our approach seamlessly extends to the future prediction of optical flow based on past observations.
arXiv Detail & Related papers (2024-04-07T04:56:58Z)
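The memory read-out and update in the MemFlow entry above can be pictured as attention over features retained from past frames. A minimal sketch under that assumption follows (single query vector, FIFO buffer; hypothetical names, not MemFlow's actual modules):

```python
import numpy as np

def memory_readout(query, mem_keys, mem_vals):
    """Attend over keys stored from the last T frames and aggregate their
    motion features: query (C,), mem_keys (T, C), mem_vals (T, C)."""
    logits = mem_keys @ query / np.sqrt(query.shape[0])  # scaled dot-product
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                             # softmax over time
    return weights @ mem_vals            # (C,) aggregated historical motion feature

def memory_update(mem_keys, mem_vals, new_key, new_val, max_len=4):
    """FIFO update: append the newest frame, drop everything beyond max_len."""
    mem_keys = np.vstack([mem_keys, new_key])[-max_len:]
    mem_vals = np.vstack([mem_vals, new_val])[-max_len:]
    return mem_keys, mem_vals
```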
- RoHM: Robust Human Motion Reconstruction via Diffusion [58.63706638272891]
RoHM is an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos.
Conditioned on noisy and occluded input data, it reconstructs complete, plausible motions in consistent global coordinates.
Our method outperforms state-of-the-art approaches qualitatively and quantitatively, while being faster at test time.
arXiv Detail & Related papers (2024-01-16T18:57:50Z)
- Rethinking Optical Flow from Geometric Matching Consistent Perspective [38.014569953980754]
We propose a rethinking of previous optical flow estimation.
We use geometric image matching (GIM) as a pre-training task for optical flow estimation (MatchFlow), yielding better feature representations.
Our method achieves 11.5% and 10.1% error reductions relative to GMA on the Sintel clean pass and the KITTI test set, respectively.
arXiv Detail & Related papers (2023-03-15T06:00:38Z)
- What Matters for 3D Scene Flow Network [44.02710380584977]
3D scene flow estimation from point clouds is a low-level 3D motion perception task in computer vision.
We propose a novel all-to-all flow embedding layer with backward reliability validation during the initial scene flow estimation.
Our proposed model surpasses all existing methods by at least 38.2% on the FlyingThings3D dataset and 24.7% on the KITTI Scene Flow dataset on the EPE3D metric.
arXiv Detail & Related papers (2022-07-19T09:27:05Z)
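The "backward reliability validation" in the entry above reads like a forward-backward consistency check on correspondences. The generic pattern is sketched below (an assumed interpretation, not the paper's embedding layer; the nearest-neighbor matcher and threshold are illustrative):

```python
import numpy as np

def nn_flow(src, dst):
    """All-to-all matching: flow from each src point to its nearest dst point."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)  # (N, M) sq. distances
    idx = d2.argmin(axis=1)
    return dst[idx] - src, idx

def forward_backward_reliability(pc1, pc2, thresh=0.1):
    """A forward match pc1 -> pc2 is trusted only if the backward match from
    its target lands back near the starting point (cycle consistency)."""
    flow_fwd, idx_fwd = nn_flow(pc1, pc2)
    flow_bwd, _ = nn_flow(pc2, pc1)
    roundtrip = pc1 + flow_fwd + flow_bwd[idx_fwd]   # back in frame-1 space
    reliable = np.linalg.norm(roundtrip - pc1, axis=1) < thresh
    return flow_fwd, reliable
```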
- CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation [15.98323974821097]
We study the problem of jointly estimating the optical flow and scene flow from synchronized 2D and 3D data.
To address the problem, we propose a novel end-to-end framework, called CamLiFlow.
Our method ranks 1st on the KITTI Scene Flow benchmark, outperforming the previous art with 1/7 of the parameters.
arXiv Detail & Related papers (2021-11-20T02:58:38Z)
- Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z)
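The mask-plus-rigid-transform parameterization in the entry above composes directly into dense 3D motion: each point inherits the SE(3) transform of its segment. A toy sketch with hypothetical names, not the paper's network:

```python
import numpy as np

def rigid_scene_flow(points, masks, rotations, translations):
    """Compose segment-wise rigid motions into per-point 3D flow.

    points:       (N, 3) 3D points in frame 1
    masks:        (K, N) boolean, one mask per rigid object (incl. background)
    rotations:    (K, 3, 3) per-object rotation matrices
    translations: (K, 3)    per-object translations
    """
    flow = np.zeros_like(points)
    for m, R, t in zip(masks, rotations, translations):
        flow[m] = points[m] @ R.T + t - points[m]  # x' = R x + t, flow = x' - x
    return flow
```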
- FlowStep3D: Model Unrolling for Self-Supervised Scene Flow Estimation [87.74617110803189]
Estimating the 3D motion of points in a scene, known as scene flow, is a core problem in computer vision.
We present a recurrent architecture that learns a single step of an unrolled iterative alignment procedure for refining scene flow predictions.
arXiv Detail & Related papers (2020-11-19T23:23:48Z)
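Model unrolling as in the FlowStep3D entry above means training a single refinement step and applying it recurrently at inference. The sketch below mimics the control flow only; in the paper the step is a trained network, and the ICP-like update here is an illustrative placeholder:

```python
import numpy as np

def refine_step(pc1, pc2, flow):
    """Stand-in for the learned single alignment step: nudge the current flow
    toward the nearest neighbor of each warped point (a crude ICP-like update)."""
    warped = pc1 + flow
    d2 = ((warped[:, None, :] - pc2[None, :, :]) ** 2).sum(-1)
    target = pc2[d2.argmin(axis=1)]
    return flow + 0.5 * (target - warped)  # damped correction

def unrolled_flow(pc1, pc2, num_steps=4):
    """Apply the same step recurrently, so training one step yields an
    iterative refiner at inference time."""
    flow = np.zeros_like(pc1)
    for _ in range(num_steps):
        flow = refine_step(pc1, pc2, flow)
    return flow
```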
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.