SSRFlow: Semantic-aware Fusion with Spatial Temporal Re-embedding for Real-world Scene Flow
- URL: http://arxiv.org/abs/2408.07825v1
- Date: Wed, 31 Jul 2024 02:28:40 GMT
- Title: SSRFlow: Semantic-aware Fusion with Spatial Temporal Re-embedding for Real-world Scene Flow
- Authors: Zhiyang Lu, Qinghan Chen, Zhimin Yuan, Ming Cheng,
- Abstract summary: Scene flow provides the 3D motion field of the first frame from two consecutive point clouds.
We propose a novel approach called Dual Cross Attentive (DCA) for the latent fusion and alignment between two frames based on semantic contexts.
We leverage novel domain adaptive losses to effectively bridge the gap of motion inference from synthetic to real-world.
- Score: 6.995663556921384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene flow, which provides the 3D motion field of the first frame from two consecutive point clouds, is vital for dynamic scene perception. However, contemporary scene flow methods face three major challenges. Firstly, they lack global flow embedding or only consider the context of individual point clouds before embedding, leading to embedded points struggling to perceive the consistent semantic relationship of another frame. To address this issue, we propose a novel approach called Dual Cross Attentive (DCA) for the latent fusion and alignment between two frames based on semantic contexts. This is then integrated into Global Fusion Flow Embedding (GF) to initialize flow embedding based on global correlations in both contextual and Euclidean spaces. Secondly, deformations exist in non-rigid objects after the warping layer, which distorts the spatiotemporal relation between the consecutive frames. For a more precise estimation of residual flow at next-level, the Spatial Temporal Re-embedding (STR) module is devised to update the point sequence features at current-level. Lastly, poor generalization is often observed due to the significant domain gap between synthetic and LiDAR-scanned datasets. We leverage novel domain adaptive losses to effectively bridge the gap of motion inference from synthetic to real-world. Experiments demonstrate that our approach achieves state-of-the-art (SOTA) performance across various datasets, with particularly outstanding results in real-world LiDAR-scanned situations. Our code will be released upon publication.
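The abstract describes scene flow as a per-point 3D motion field on the first frame, a warping layer, and a dual cross-attentive fusion between the two frames. The NumPy sketch below illustrates those ideas only in the loosest sense and is not the authors' implementation: the function names (`cross_attend`, `dual_cross_attention`, `warp`), the single-head attention without learned projections, and the residual fusion are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(query_feats, context_feats):
    """Single-head scaled dot-product cross-attention (hypothetical helper).

    Each query point gathers features from every point of the other frame.
    No learned projections are used; a real model would add them.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (Nq, Nc)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return query_feats + weights @ context_feats         # residual fusion


def dual_cross_attention(feats1, feats2):
    """Apply cross-attention in both directions (assumed reading of 'dual')."""
    fused1 = cross_attend(feats1, feats2)  # frame 1 attends to frame 2
    fused2 = cross_attend(feats2, feats1)  # frame 2 attends to frame 1
    return fused1, fused2


def warp(points1, flow):
    """Move first-frame points by their estimated flow vectors."""
    return points1 + flow


# Toy data: 128 points in frame 1, 96 in frame 2, 16-dim features.
rng = np.random.default_rng(0)
pc1 = rng.normal(size=(128, 3))
flow = np.tile(np.array([0.5, 0.0, 0.0]), (128, 1))  # rigid shift along x
pc1_warped = warp(pc1, flow)

f1 = rng.normal(size=(128, 16))
f2 = rng.normal(size=(96, 16))
fused1, fused2 = dual_cross_attention(f1, f2)
```

The two frames may contain different numbers of points, which is why the attention weights form an `(Nq, Nc)` matrix rather than a square one; each fused feature set keeps the point count of its own frame.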
Related papers
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z)
- STARFlow: Spatial Temporal Feature Re-embedding with Attentive Learning for Real-world Scene Flow [5.476991379461233]
We propose global attentive flow embedding to match all-to-all point pairs in both feature space and Euclidean space.
We leverage novel domain adaptive losses to bridge the gap of motion inference from synthetic to real-world.
Our approach achieves state-of-the-art performance across various datasets, with particularly outstanding results on real-world LiDAR-scanned datasets.
arXiv Detail & Related papers (2024-03-11T04:56:10Z)
- Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z)
- Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency [3.124750429062221]
We introduce two new consistency losses that enlarge clusters while preventing them from spreading over distinct objects.
The proposed losses are model-independent and can thus be used in a plug-and-play fashion to significantly improve the performance of existing models.
We also showcase the effectiveness and generalization capability of our framework on four standard sensor-unique driving datasets.
arXiv Detail & Related papers (2023-12-12T11:00:39Z)
- Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refinement, is devoted to learning the transformation and refinement of the phase spectrum.
arXiv Detail & Related papers (2023-08-14T08:23:58Z)
- IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment [58.8330387551499]
We formulate the problem as the estimation of point-wise trajectories (i.e., smooth curves).
We propose IDEA-Net, an end-to-end deep learning framework, which disentangles the problem under the assistance of the explicitly learned temporal consistency.
We demonstrate the effectiveness of our method on various point cloud sequences and observe large improvement over state-of-the-art methods both quantitatively and visually.
arXiv Detail & Related papers (2022-03-22T10:14:08Z)
- Residual 3D Scene Flow Learning with Context-Aware Feature Extraction [11.394559627312743]
We propose a novel context-aware set conv layer to exploit contextual structure information of Euclidean space.
We also propose an explicit residual flow learning structure in the residual flow refinement layer to cope with long-distance movement.
Our method achieves state-of-the-art performance, surpassing all other previous works to the best of our knowledge by at least 25%.
arXiv Detail & Related papers (2021-09-10T06:15:18Z)
- TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)
- Learning to Estimate Hidden Motions with Global Motion Aggregation [71.12650817490318]
Occlusions pose a significant challenge to optical flow algorithms that rely on local evidence.
We introduce a global motion aggregation module to find long-range dependencies between pixels in the first image.
We demonstrate that the optical flow estimates in the occluded regions can be significantly improved without damaging the performance in non-occluded regions.
arXiv Detail & Related papers (2021-04-06T10:32:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.