TeFlow: Enabling Multi-frame Supervision for Self-Supervised Feed-forward Scene Flow Estimation
- URL: http://arxiv.org/abs/2602.19053v1
- Date: Sun, 22 Feb 2026 05:50:16 GMT
- Title: TeFlow: Enabling Multi-frame Supervision for Self-Supervised Feed-forward Scene Flow Estimation
- Authors: Qingwen Zhang, Chenhan Jiang, Xiaomeng Zhu, Yunqi Miao, Yushan Zhang, Olov Andersson, Patric Jensfelt,
- Abstract summary: Multi-frame supervision has the potential to provide more stable guidance by incorporating motion cues from past frames.<n>We present TeFlow, enabling multi-frame supervision for feed-forward models by mining temporally consistent supervision.<n>Our method performs on par with leading optimization-based methods, yet speeds up 150 times.
- Score: 14.239684633948746
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised feed-forward methods for scene flow estimation offer real-time efficiency, but their supervision from two-frame point correspondences is unreliable and often breaks down under occlusions. Multi-frame supervision has the potential to provide more stable guidance by incorporating motion cues from past frames, yet naive extensions of two-frame objectives are ineffective because point correspondences vary abruptly across frames, producing inconsistent signals. In the paper, we present TeFlow, enabling multi-frame supervision for feed-forward models by mining temporally consistent supervision. TeFlow introduces a temporal ensembling strategy that forms reliable supervisory signals by aggregating the most temporally consistent motion cues from a candidate pool built across multiple frames. Extensive evaluations demonstrate that TeFlow establishes a new state-of-the-art for self-supervised feed-forward methods, achieving performance gains of up to 33\% on the challenging Argoverse 2 and nuScenes datasets. Our method performs on par with leading optimization-based methods, yet speeds up 150 times. The code is open-sourced at https://github.com/KTH-RPL/OpenSceneFlow along with trained model weights.
Related papers
- DeltaFlow: An Efficient Multi-frame Scene Flow Estimation Method [10.777409790795351]
We propose DeltaFlow ($Delta$Flow), a lightweight 3D framework that captures motion cues via a $Delta$ scheme.<n>$Delta$Flow achieves state-of-the-art performance with up to 22% lower error and $2times$ faster inference.
arXiv Detail & Related papers (2025-08-23T15:06:59Z) - Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach [32.91982063297922]
We propose a novel Slow-Fast Tracking paradigm that flexibly adapts to different operational requirements, termed SFTrack.<n>The proposed framework supports two complementary modes, i.e., a high-precision slow tracker for scenarios with sufficient computational resources, and an efficient fast tracker tailored for latency-aware, resource-constrained environments.<n>Our framework first performs graph-based representation learning from high-temporal-resolution event streams, and then integrates the learned graph-structured information into two FlashAttention-based vision backbones.
arXiv Detail & Related papers (2025-05-19T09:37:23Z) - SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs.<n>We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions.<n>With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z) - Online Dense Point Tracking with Streaming Memory [54.22820729477756]
Dense point tracking is a challenging task requiring the continuous tracking of every point in the initial frame throughout a substantial portion of a video.<n>Recent point tracking algorithms usually depend on sliding windows for indirect information propagation from the first frame to the current one.<n>We present a lightweight and fast model with textbfStreaming memory for dense textbfPOint textbfTracking and online video processing.
arXiv Detail & Related papers (2025-03-09T06:16:49Z) - Neural Eulerian Scene Flow Fields [59.57980592109722]
EulerFlow works out-of-the-box without tuning across multiple domains.
It exhibits emergent 3D point tracking behavior by solving its estimated ODE over long-time horizons.
It outperforms all prior art on the Argoverse 2 2024 Scene Flow Challenge.
arXiv Detail & Related papers (2024-10-02T20:56:45Z) - StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video
Sequences [31.210626775505407]
Occlusions between consecutive frames have long posed a significant challenge in optical flow estimation.
We present a Streamlined In-batch Multi-frame (SIM) pipeline tailored to video input, attaining a similar level of time efficiency to two-frame networks.
StreamFlow not only excels in terms of performance on challenging KITTI and Sintel datasets, with particular improvement in occluded areas.
arXiv Detail & Related papers (2023-11-28T07:53:51Z) - MFT: Long-Term Tracking of Every Pixel [0.36832029288386137]
Multi-Flow dense Tracker -- a novel method for dense, pixel-level, long-term tracking.
Method exploits optical flows estimated between consecutive frames.
Tracks densely orders of magnitude faster than state-of-the-art point-tracking methods.
arXiv Detail & Related papers (2023-05-22T13:02:46Z) - Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z) - Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z) - Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) has synchronous needs for both robustness and accuracy.
We propose a dual-modal framework for target localization, consisting of robust localization suppressingors via ONR and the accurate localization attending to the target center precisely via OFC.
arXiv Detail & Related papers (2021-03-18T08:47:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.