SF3D-RGB: Scene Flow Estimation from Monocular Camera and Sparse LiDAR
- URL: http://arxiv.org/abs/2602.21699v1
- Date: Wed, 25 Feb 2026 09:03:42 GMT
- Title: SF3D-RGB: Scene Flow Estimation from Monocular Camera and Sparse LiDAR
- Authors: Rajai Alhimdiat, Ramy Battrawy, René Schuster, Didier Stricker, Wesam Ashour,
- Abstract summary: We present a deep learning architecture for sparse scene flow estimation using 2D monocular images and 3D point clouds.<n>Our architecture is an end-to-end model that first encodes information from each modality into features and fuses them together.<n>Experiments show that our proposed method outperforms single-modality methods and achieves better scene flow accuracy on real-world datasets.
- Score: 17.224692757126153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene flow estimation is an extremely important task in computer vision to support the perception of dynamic changes in the scene. For robust scene flow, learning-based approaches have recently achieved impressive results using either image-based or LiDAR-based modalities. However, these methods have tended to focus on the use of a single modality. To tackle these problems, we present a deep learning architecture, SF3D-RGB, that enables sparse scene flow estimation using 2D monocular images and 3D point clouds (e.g., acquired by LiDAR) as inputs. Our architecture is an end-to-end model that first encodes information from each modality into features and fuses them together. Then, the fused features enhance a graph matching module for better and more robust mapping matrix computation to generate an initial scene flow. Finally, a residual scene flow module further refines the initial scene flow. Our model is designed to strike a balance between accuracy and efficiency. Furthermore, experiments show that our proposed method outperforms single-modality methods and achieves better scene flow accuracy on real-world datasets while using fewer parameters compared to other state-of-the-art methods with fusion.
Related papers
- RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS [79.15416002879239]
3D Gaussian Splatting has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling.<n>Existing methods struggle with accurately modeling scenes affected by transient objects, leading to artifacts in the rendered images.<n>We propose RobustSplat, a robust solution based on two critical designs.
arXiv Detail & Related papers (2025-06-03T11:13:48Z) - SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs.<n>We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions.<n>With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z) - Dense Scene Reconstruction from Light-Field Images Affected by Rolling Shutter [1.856181262236876]
We present a two-stage method based on a 2D Gaussians Splatting that allows for a render and compare" strategy with a point cloud formulation.<n>In the first stage, a subset of sub-aperture images is used to estimate an RS 3D shape that is related to the scene target shape up to a motion"<n>In the second stage, the agnostic of the 3D shape is computed by estimating an admissible camera motion.
arXiv Detail & Related papers (2024-12-04T17:59:04Z) - MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.<n>By simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes.<n>We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z) - Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Temporal Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks $1st$ on the Semantic KITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z) - RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical
Flow and Scene Flow Estimation [43.358140897849616]
In this paper, we incorporate RGB images, Point clouds and Events for joint optical flow and scene flow estimation with our proposed multi-stage multimodal fusion model, RPEFlow.
Experiments on both synthetic and real datasets show that our model outperforms the existing state-of-the-art by a wide margin.
arXiv Detail & Related papers (2023-09-26T17:23:55Z) - SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow [25.577386156273256]
Scene flow estimation is a long-standing problem in computer vision, where the goal is to find the 3D motion of a scene from its consecutive observations.
We introduce SCOOP, a new method for scene flow estimation that can be learned on a small amount of data without employing ground-truth flow supervision.
arXiv Detail & Related papers (2022-11-25T10:52:02Z) - RAFT-MSF: Self-Supervised Monocular Scene Flow using Recurrent Optimizer [21.125470798719967]
We introduce a self-supervised monocular scene flow method that substantially improves the accuracy over the previous approaches.
Based on RAFT, a state-of-the-art optical flow model, we design a new decoder to iteratively update 3D motion fields and disparity maps simultaneously.
Our method achieves state-of-the-art accuracy among all self-supervised monocular scene flow methods, improving accuracy by 34.2%.
arXiv Detail & Related papers (2022-05-03T15:43:57Z) - Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z) - FlowStep3D: Model Unrolling for Self-Supervised Scene Flow Estimation [87.74617110803189]
Estimating the 3D motion of points in a scene, known as scene flow, is a core problem in computer vision.
We present a recurrent architecture that learns a single step of an unrolled iterative alignment procedure for refining scene flow predictions.
arXiv Detail & Related papers (2020-11-19T23:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.