DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
- URL: http://arxiv.org/abs/2412.11578v2
- Date: Sun, 16 Mar 2025 16:07:48 GMT
- Title: DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
- Authors: Zhenlong Yuan, Jinguo Luo, Fei Shen, Zhaoxin Li, Cong Liu, Tianlu Mao, Zhaoqi Wang
- Abstract summary: We propose DVP-MVS, which synergizes depth-edge aligned and cross-view priors for robust and visibility-aware patch deformation. Our method can achieve state-of-the-art performance with excellent robustness and generalization.
- Score: 8.303396507129266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Patch deformation-based methods have recently exhibited substantial effectiveness in multi-view stereo, due to the incorporation of deformable and expandable perception to reconstruct textureless areas. However, such approaches typically focus on exploring correlative reliable pixels to alleviate match ambiguity during patch deformation, but ignore the deformation instability caused by mistaken edge-skipping and visibility occlusion, leading to potential estimation deviation. To remedy the above issues, we propose DVP-MVS, which innovatively synergizes depth-edge aligned and cross-view priors for robust and visibility-aware patch deformation. Specifically, to avoid unexpected edge-skipping, we first utilize Depth Anything V2 followed by the Roberts operator to initialize coarse depth and edge maps respectively, both of which are further aligned through an erosion-dilation strategy to generate fine-grained homogeneous boundaries for guiding patch deformation. In addition, we reform view selection weights as visibility maps and restore visible areas by cross-view depth reprojection, then regard them as a cross-view prior to facilitate visibility-aware patch deformation. Finally, we improve propagation and refinement with multi-view geometry consistency by introducing aggregated visible hemispherical normals based on view selection and local projection depth differences based on epipolar lines, respectively. Extensive evaluations on the ETH3D and Tanks & Temples benchmarks demonstrate that our method achieves state-of-the-art performance with excellent robustness and generalization.
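The two priors described in the abstract are concrete enough to sketch. Below is a minimal, illustrative Python sketch, not the authors' implementation: `depth_edge_prior` runs the Roberts operator on a monocular depth map and cleans the edges with an erosion-dilation pass, and `visibility_map` restores visible areas by cross-view depth reprojection. All function names, thresholds, and kernel sizes are assumptions; the depth map is assumed to come from a monocular model such as Depth Anything V2.

```python
# Illustrative sketch of the paper's two priors; names and thresholds are assumptions.
import cv2
import numpy as np

def depth_edge_prior(depth, edge_thresh=0.05, kernel_size=3):
    """Roberts edges on a monocular depth map, cleaned by erosion-dilation.

    `depth` is assumed to be an HxW float32 map (e.g. from Depth Anything V2),
    normalized to [0, 1].
    """
    # Roberts cross kernels respond to diagonal depth discontinuities.
    rx = np.array([[1, 0], [0, -1]], dtype=np.float32)
    ry = np.array([[0, 1], [-1, 0]], dtype=np.float32)
    gx = cv2.filter2D(depth, -1, rx)
    gy = cv2.filter2D(depth, -1, ry)
    edges = (np.abs(gx) + np.abs(gy) > edge_thresh).astype(np.uint8)
    # Erosion then dilation (morphological opening) removes speckle while
    # keeping boundaries roughly aligned with the depth discontinuities.
    k = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.dilate(cv2.erode(edges, k), k)

def visibility_map(depth_ref, depth_src, K, R, t, rel_thresh=0.01):
    """Mark reference pixels whose reprojected depth agrees with the source view.

    (R, t) maps reference-camera coordinates to source-camera coordinates;
    both views are assumed to share the intrinsic matrix K.
    """
    h, w = depth_ref.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N
    # Back-project reference pixels to 3D, then move them into the source frame.
    pts_ref = np.linalg.inv(K) @ pix * depth_ref.reshape(1, -1)
    pts_src = R @ pts_ref + t.reshape(3, 1)
    proj = K @ pts_src
    z = proj[2]
    z_safe = np.where(z > 1e-6, z, 1.0)  # avoid dividing by zero behind the camera
    x = np.round(proj[0] / z_safe).astype(int)
    y = np.round(proj[1] / z_safe).astype(int)
    visible = np.zeros(h * w, dtype=bool)
    inside = (z > 0) & (x >= 0) & (x < w) & (y >= 0) & (y < h)
    idx = np.flatnonzero(inside)
    sampled = depth_src[y[idx], x[idx]]
    # A pixel is visible if the source view sees roughly the same depth there.
    visible[idx] = np.abs(sampled - z[idx]) < rel_thresh * sampled
    return visible.reshape(h, w)
```

In a full pipeline, the cleaned edge map would bound patch deformation at homogeneous-region boundaries, while the visibility map would gate which source views contribute to matching costs.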
Related papers
- Decompositional Neural Scene Reconstruction with Generative Diffusion Prior [64.71091831762214]
Decompositional reconstruction of 3D scenes, with complete shapes and detailed texture, is intriguing for downstream applications.
Recent approaches incorporate semantic or geometric regularization to address this issue, but they suffer significant degradation in underconstrained areas.
We propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
arXiv Detail & Related papers (2025-03-19T02:11:31Z)
- SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint [11.165686149180054]
We propose SED-MVS, which adopts panoptic segmentation and multi-trajectory diffusion strategy for segmentation-driven and edge-aligned patch deformation.
Specifically, to prevent unanticipated edge-skipping, we first employ SAM2 for panoptic segmentation as depth-edge guidance to guide patch deformation, followed by multi-trajectory diffusion strategy to ensure patches are comprehensively aligned with depth edges.
arXiv Detail & Related papers (2025-03-17T21:07:44Z)
- MSP-MVS: Multi-granularity Segmentation Prior Guided Multi-View Stereo [8.71977630878479]
Reconstructing textureless areas in MVS poses challenges due to the absence of reliable pixel correspondences within fixed patch deformation.
We introduce Multi-granularity equidistribution segmentation prior by integrating multi-granularity depth edges to restrict patch deformation within homogeneous areas.
We also present an anchor equidistribution strategy that equips deformed patches with more uniformly distributed anchors, ensuring adequate coverage of their own homogeneous areas (an illustrative sketch appears after this list).
arXiv Detail & Related papers (2024-07-27T19:00:44Z)
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset while requiring the lowest image resolution and the most lightweight image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper.
We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation.
Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
- PatchMVSNet: Patch-wise Unsupervised Multi-View Stereo for Weakly-Textured Surface Reconstruction [2.9896482273918434]
This paper proposes robust loss functions that leverage constraints underlying multi-view images to alleviate matching ambiguity (a toy photometric-consistency sketch appears after this list).
Our strategy can be implemented with arbitrary depth estimation frameworks and can be trained with arbitrary large-scale MVS datasets.
Our method matches the performance of state-of-the-art methods on popular benchmarks such as DTU, Tanks and Temples, and ETH3D.
arXiv Detail & Related papers (2022-03-04T07:05:23Z)
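The anchor-equidistribution idea from the MSP-MVS entry above lends itself to a small illustration. The following Python sketch is a guess at the spirit of the technique, not that paper's implementation: it greedily farthest-point-samples anchors inside a homogeneous-region mask so a deformed patch covers the region evenly; `mask`, `center`, and the greedy strategy are all illustrative assumptions.

```python
# Hypothetical anchor equidistribution via greedy farthest-point sampling.
import numpy as np

def equidistribute_anchors(mask, center, n_anchors=8):
    """Pick `n_anchors` pixels in `mask` that stay mutually far apart.

    `mask` is an HxW boolean map of the homogeneous area around `center`
    (a (row, col) tuple); returns an (n_anchors, 2) integer array.
    """
    coords = np.argwhere(mask).astype(float)  # candidate (row, col) positions
    assert len(coords) >= n_anchors, "mask too small for requested anchors"
    anchors = [np.asarray(center, dtype=float)]  # seed from the patch center
    dist = np.linalg.norm(coords - anchors[0], axis=1)
    for _ in range(n_anchors):
        nxt = coords[np.argmax(dist)]  # farthest candidate from the chosen set
        anchors.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(coords - nxt, axis=1))
    return np.stack(anchors[1:]).astype(int)  # drop the seed center
```

Uniformly spread anchors keep the deformed patch's matching window representative of the whole homogeneous area rather than clustered near its center.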
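Likewise, the "robust loss functions" in the PatchMVSNet entry can be illustrated with the simplest multi-view constraint, patch-wise photometric consistency. This is a toy sketch under assumed inputs (a reference image and a source image already warped into the reference view via the predicted depth), not that paper's actual losses.

```python
# Toy patch-wise photometric consistency loss; all names are illustrative.
import numpy as np

def patch_photometric_loss(ref, src_warped, valid, patch=5):
    """Mean absolute patch difference over pixels with a valid warp.

    `ref` and `src_warped` are HxW grayscale float images; `valid` is an HxW
    boolean mask of pixels whose warped location fell inside the source view.
    """
    h, w = ref.shape
    r = patch // 2
    diffs = []
    for y in range(r, h - r, patch):
        for x in range(r, w - r, patch):
            if not valid[y, x]:
                continue  # skip occluded or out-of-view patches
            a = ref[y - r:y + r + 1, x - r:x + r + 1]
            b = src_warped[y - r:y + r + 1, x - r:x + r + 1]
            diffs.append(np.abs(a - b).mean())
    # No valid patches means no photometric signal for this view pair.
    return float(np.mean(diffs)) if diffs else 0.0
```

Comparing whole patches rather than single pixels is what makes the constraint usable on weakly textured surfaces, where individual pixel intensities are ambiguous.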