Related papers: Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction

URL: http://arxiv.org/abs/2407.07587v3
Date: Tue, 8 Oct 2024 11:07:08 GMT
Title: Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction
Authors: Yili Liu, Linzhan Mou, Xuan Yu, Chenrui Han, Sitong Mao, Rong Xiong, Yue Wang
Abstract summary: Let Occ Flow is the first self-supervised work for joint 3D occupancy and occupancy flow prediction using only camera inputs. Our approach incorporates a novel attention-based temporal fusion module to capture dynamic object dependencies. Our method extends differentiable rendering to 3D volumetric flow fields.
Score: 14.866463843514156
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate perception of the dynamic environment is a fundamental task for autonomous driving and robot systems. This paper introduces Let Occ Flow, the first self-supervised work for joint 3D occupancy and occupancy flow prediction using only camera inputs, eliminating the need for 3D annotations. Utilizing TPV for unified scene representation and deformable attention layers for feature aggregation, our approach incorporates a novel attention-based temporal fusion module to capture dynamic object dependencies, followed by a 3D refine module for fine-grained volumetric representation. In addition, our method extends differentiable rendering to 3D volumetric flow fields, leveraging zero-shot 2D segmentation and optical flow cues for dynamic decomposition and motion optimization. Extensive experiments on nuScenes and KITTI datasets demonstrate the competitive performance of our approach over prior state-of-the-art methods. Our project page is available at https://eliliu2233.github.io/letoccflow/

Related papers

SelfOccFlow: Towards end-to-end self-supervised 3D Occupancy Flow prediction [2.012425476229879]
Estimating 3D occupancy and motion in the vehicle's surroundings is essential for autonomous driving. Existing approaches jointly learn geometry and motion but rely on expensive 3D occupancy and flow annotations. We propose a self-supervised method for 3D occupancy flow estimation that eliminates the need for human-produced annotations or external flow supervision.
arXiv Detail & Related papers (2026-02-27T10:42:01Z)
SF3D-RGB: Scene Flow Estimation from Monocular Camera and Sparse LiDAR [17.224692757126153]
We present a deep learning architecture for sparse scene flow estimation using 2D monocular images and 3D point clouds. Our architecture is an end-to-end model that first encodes information from each modality into features and fuses them together. Experiments show that our proposed method outperforms single-modality methods and achieves better scene flow accuracy on real-world datasets.
arXiv Detail & Related papers (2026-02-25T09:03:42Z)
ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks. We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation. Our purely convolutional architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes. By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously only used for static scenes, to dynamic scenes. We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving [18.88208422580103]
Scene flow estimation predicts the 3D motion at each point in successive LiDAR scans. Current state-of-the-art methods require annotated data to train scene flow networks. We propose SeFlow, a self-supervised method that integrates efficient dynamic classification into a learning-based scene flow pipeline.
arXiv Detail & Related papers (2024-07-01T18:22:54Z)
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision [85.17951804790515]
EmerNeRF is a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes. It simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping. Our method achieves state-of-the-art performance in sensor simulation.
arXiv Detail & Related papers (2023-11-03T17:59:55Z)
Dyna-DepthFormer: Multi-frame Transformer for Self-Supervised Depth Estimation in Dynamic Scenes [19.810725397641406]
We propose a novel Dyna-Depthformer framework, which predicts scene depth and 3D motion field jointly. Our contributions are two-fold. First, we leverage multi-view correlation through a series of self- and cross-attention layers in order to obtain enhanced depth feature representation. Second, we propose a warping-based Motion Network to estimate the motion field of dynamic objects without using semantic prior.
arXiv Detail & Related papers (2023-01-14T09:43:23Z)
Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query. Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information. We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
Weakly Supervised Learning of Rigid 3D Scene Flow [81.37165332656612]
We propose a data-driven scene flow estimation algorithm exploiting the observation that many 3D scenes can be explained by a collection of agents moving as rigid bodies. We showcase the effectiveness and generalization capacity of our method on four different autonomous driving datasets.
arXiv Detail & Related papers (2021-02-17T18:58:02Z)
Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field. It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations. Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z)
Do not trust the neighbors! Adversarial Metric Learning for Self-Supervised Scene Flow Estimation [0.0]
Scene flow is the task of estimating 3D motion vectors to individual points of a dynamic 3D scene. We propose a 3D scene flow benchmark and a novel self-supervised setup for training flow models. We find that our setup is able to keep motion coherence and preserve local geometries, which many self-supervised baselines fail to grasp.
arXiv Detail & Related papers (2020-11-01T17:41:32Z)
Hierarchical Attention Learning of Scene Flow in 3D Point Clouds [28.59260783047209]
This paper studies the problem of scene flow estimation from two consecutive 3D point clouds. A novel hierarchical neural network with double attention is proposed for learning the correlation of point features in adjacent frames. Experiments show that the proposed network outperforms state-of-the-art methods for 3D scene flow estimation.
arXiv Detail & Related papers (2020-10-12T14:56:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.