DeltaFlow: An Efficient Multi-frame Scene Flow Estimation Method
- URL: http://arxiv.org/abs/2508.17054v2
- Date: Fri, 24 Oct 2025 09:47:47 GMT
- Title: DeltaFlow: An Efficient Multi-frame Scene Flow Estimation Method
- Authors: Qingwen Zhang, Xiaomeng Zhu, Yushan Zhang, Yixi Cai, Olov Andersson, Patric Jensfelt,
- Abstract summary: We propose DeltaFlow ($\Delta$Flow), a lightweight 3D framework that captures motion cues via a $\Delta$ scheme. $\Delta$Flow achieves state-of-the-art performance with up to 22% lower error and $2\times$ faster inference.
- Score: 10.777409790795351
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Previous dominant methods for scene flow estimation focus mainly on input from two consecutive frames, neglecting valuable information in the temporal domain. While recent trends shift towards multi-frame reasoning, they suffer from rapidly escalating computational costs as the number of frames grows. To leverage temporal information more efficiently, we propose DeltaFlow ($\Delta$Flow), a lightweight 3D framework that captures motion cues via a $\Delta$ scheme, extracting temporal features with minimal computational cost, regardless of the number of frames. Additionally, scene flow estimation faces challenges such as imbalanced object class distributions and motion inconsistency. To tackle these issues, we introduce a Category-Balanced Loss to enhance learning across underrepresented classes and an Instance Consistency Loss to enforce coherent object motion, improving flow accuracy. Extensive evaluations on the Argoverse 2, Waymo and nuScenes datasets show that $\Delta$Flow achieves state-of-the-art performance with up to 22% lower error and $2\times$ faster inference compared to the next-best multi-frame supervised method, while also demonstrating a strong cross-domain generalization ability. The code is open-sourced at https://github.com/Kin-Zhang/DeltaFlow along with trained model weights.
Related papers
- TeFlow: Enabling Multi-frame Supervision for Self-Supervised Feed-forward Scene Flow Estimation [14.239684633948746]
Multi-frame supervision has the potential to provide more stable guidance by incorporating motion cues from past frames. We present TeFlow, enabling multi-frame supervision for feed-forward models by mining temporally consistent supervision. Our method performs on par with leading optimization-based methods, yet runs 150 times faster.
arXiv Detail & Related papers (2026-02-22T05:50:16Z)
- AlphaFlow: Understanding and Improving MeanFlow Models [74.64465762009475]
We show that the MeanFlow objective naturally decomposes into two parts: trajectory flow matching and trajectory consistency. Motivated by these insights, we introduce $\alpha$-Flow, a broad family of objectives that unifies trajectory flow matching, Shortcut Model, and MeanFlow. When trained from scratch on class-conditional ImageNet-1K 256x256 with vanilla DiT backbones, $\alpha$-Flow consistently outperforms MeanFlow across scales and settings.
arXiv Detail & Related papers (2025-10-23T17:45:06Z)
- SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z)
- Floxels: Fast Unsupervised Voxel Based Scene Flow Estimation [1.429392440481971]
Two types of approaches to the problem have evolved: 1) supervised and 2) optimization-based methods. Floxels is surpassed only by EulerFlow among unsupervised methods while achieving comparable performance at a fraction of the computational cost. Floxels achieves a massive speedup of more than 60-140x over EulerFlow, reducing the runtime from a day to 10 minutes per sequence.
arXiv Detail & Related papers (2025-03-06T18:58:45Z)
- Neural Eulerian Scene Flow Fields [59.57980592109722]
EulerFlow works out-of-the-box without tuning across multiple domains.
It exhibits emergent 3D point tracking behavior by solving its estimated ODE over long-time horizons.
It outperforms all prior art on the Argoverse 2 2024 Scene Flow Challenge.
arXiv Detail & Related papers (2024-10-02T20:56:45Z)
- MemFlow: Optical Flow Estimation and Prediction with Memory [54.22820729477756]
We present MemFlow, a real-time method for optical flow estimation and prediction with memory.
Our method enables memory read-out and update modules for aggregating historical motion information in real-time.
Our approach seamlessly extends to the future prediction of optical flow based on past observations.
arXiv Detail & Related papers (2024-04-07T04:56:58Z)
- DeFlow: Decoder of Scene Flow Network in Autonomous Driving [19.486167661795797]
Scene flow estimation determines a scene's 3D motion field, by predicting the motion of points in the scene.
Many networks with large-scale point clouds as input use voxelization to create a pseudo-image for real-time running.
Our paper introduces DeFlow which enables a transition from voxel-based features to point features using Gated Recurrent Unit (GRU) refinement.
arXiv Detail & Related papers (2024-01-29T12:47:55Z)
- StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences [31.210626775505407]
Occlusions between consecutive frames have long posed a significant challenge in optical flow estimation.
We present a Streamlined In-batch Multi-frame (SIM) pipeline tailored to video input, attaining a similar level of time efficiency to two-frame networks.
StreamFlow excels in terms of performance on the challenging KITTI and Sintel datasets, with particular improvement in occluded areas.
arXiv Detail & Related papers (2023-11-28T07:53:51Z)
- ZeroFlow: Scalable Scene Flow via Distillation [66.70820145266029]
Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds.
State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds to process full-size point clouds.
We propose Scene Flow via Distillation, a simple, scalable distillation framework that uses a label-free optimization method to produce pseudo-labels to supervise a feedforward model.
arXiv Detail & Related papers (2023-05-17T17:56:59Z)
- PointFlowHop: Green and Interpretable Scene Flow Estimation from Consecutive Point Clouds [49.7285297470392]
An efficient 3D scene flow estimation method called PointFlowHop is proposed in this work.
PointFlowHop takes two consecutive point clouds and determines the 3D flow vectors for every point in the first point cloud.
It decomposes the scene flow estimation task into a set of subtasks, including ego-motion compensation, object association and object-wise motion estimation.
arXiv Detail & Related papers (2023-02-27T23:06:01Z)
- Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
arXiv Detail & Related papers (2022-04-18T17:53:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.