Unifying Flow, Stereo and Depth Estimation
- URL: http://arxiv.org/abs/2211.05783v3
- Date: Wed, 26 Jul 2023 15:42:58 GMT
- Title: Unifying Flow, Stereo and Depth Estimation
- Authors: Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu,
Dacheng Tao, Andreas Geiger
- Abstract summary: We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
- Score: 121.54066319299261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a unified formulation and model for three motion and 3D perception
tasks: optical flow, rectified stereo matching and unrectified stereo depth
estimation from posed images. Unlike previous specialized architectures for
each specific task, we formulate all three tasks as a unified dense
correspondence matching problem, which can be solved with a single model by
directly comparing feature similarities. Such a formulation calls for
discriminative feature representations, which we achieve using a Transformer,
in particular the cross-attention mechanism. We demonstrate that
cross-attention enables integration of knowledge from another image via
cross-view interactions, which greatly improves the quality of the extracted
features. Our unified model naturally enables cross-task transfer since the
model architecture and parameters are shared across tasks. We outperform RAFT
with our unified model on the challenging Sintel dataset, and our final model
that uses a few additional task-specific refinement steps outperforms or
compares favorably to recent state-of-the-art methods on 10 popular flow,
stereo and depth datasets, while being simpler and more efficient in terms of
model design and inference speed.
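The abstract's core idea, solving dense correspondence by directly comparing feature similarities, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `global_match_flow` is hypothetical, the features are assumed to be given (the Transformer and cross-attention feature extractor is omitted), and a softmax over all candidate positions is one plausible way to turn similarities into a flow field.

```python
import numpy as np


def global_match_flow(feat0, feat1):
    """Estimate flow from image 0 to image 1 by comparing feature similarities.

    feat0, feat1: feature maps of shape (H, W, C), assumed to be precomputed.
    Returns a flow field of shape (H, W, 2), ordered as (dx, dy).
    """
    h, w, c = feat0.shape
    f0 = feat0.reshape(h * w, c)
    f1 = feat1.reshape(h * w, c)

    # Similarity of every pixel in image 0 to every pixel in image 1.
    corr = f0 @ f1.T / np.sqrt(c)

    # Numerically stable softmax over all candidate positions in image 1.
    corr -= corr.max(axis=1, keepdims=True)
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)

    # Pixel coordinates (x, y), flattened in the same order as the features.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(np.float64)

    # Expected matched position minus source position gives the flow.
    matched = prob @ grid
    return (matched - grid).reshape(h, w, 2)


# Toy usage: every pixel has a unique feature, and image 1 is image 0
# shifted one pixel to the right (with wrap-around at the border).
feat0 = 20.0 * np.eye(12).reshape(3, 4, 12)
feat1 = np.roll(feat0, 1, axis=1)
flow = global_match_flow(feat0, feat1)
```

Under the same assumptions, rectified stereo could be handled by restricting the candidates to positions on the same image row, so that the expected position reduces to a one-dimensional disparity.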
Related papers
- Unifying Event-based Flow, Stereo and Depth Estimation via Feature Similarity Matching [21.71115793248267]
Event cameras have gained popularity in various vision tasks such as optical flow estimation, stereo matching, and depth estimation.
We propose a unified framework, EventMatch, that reformulates these tasks as an event-based dense correspondence matching problem.
Our model exhibits superior performance in both optical flow and disparity estimation tasks, outperforming existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-31T16:43:20Z)
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z)
- RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
arXiv Detail & Related papers (2024-07-15T16:25:07Z)
- SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths.
arXiv Detail & Related papers (2024-01-12T05:25:57Z)
- Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs [57.492124844326206]
This work delves into the task of pose-free novel view synthesis from stereo pairs, a challenging and pioneering task in 3D vision.
Our innovative framework, unlike any before, seamlessly integrates 2D correspondence matching, camera pose estimation, and NeRF rendering, fostering a synergistic enhancement of these tasks.
arXiv Detail & Related papers (2023-12-12T13:22:44Z)
- Multi-task Learning with 3D-Aware Regularization [55.97507478913053]
We propose a structured 3D-aware regularizer which interfaces multiple tasks through the projection of features extracted from an image encoder to a shared 3D feature space.
We show that the proposed method is architecture agnostic and can be plugged into various prior multi-task backbones to improve their performance.
arXiv Detail & Related papers (2023-10-02T08:49:56Z)
- CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow [22.161967080759993]
Self-supervised pre-training methods have not yet delivered on dense geometric vision tasks such as stereo matching or optical flow.
We build on the recent cross-view completion framework, a variation of masked image modeling that leverages a second view from the same scene.
We show for the first time that state-of-the-art results on stereo matching and optical flow can be reached without using any classical task-specific techniques.
arXiv Detail & Related papers (2022-11-18T18:18:53Z)
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- PaMIR: Parametric Model-Conditioned Implicit Representation for Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
arXiv Detail & Related papers (2020-07-08T02:26:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.