MSP-MVS: Multi-granularity Segmentation Prior Guided Multi-View Stereo
- URL: http://arxiv.org/abs/2407.19323v3
- Date: Sat, 14 Sep 2024 07:15:34 GMT
- Title: MSP-MVS: Multi-granularity Segmentation Prior Guided Multi-View Stereo
- Authors: Zhenlong Yuan, Cong Liu, Fei Shen, Zhaoxin Li, Tianlu Mao, Zhaoqi Wang,
- Abstract summary: Reconstructing textureless areas in MVS poses challenges due to the absence of reliable pixel correspondences within a fixed patch.
We introduce a multi-granularity segmentation prior that integrates multi-granularity depth edges to restrict patch deformation within homogeneous areas.
We also present anchor equidistribution, which gives deformed patches more uniformly distributed anchors to ensure adequate coverage of their own homogeneous areas.
- Score: 8.71977630878479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing textureless areas in MVS poses challenges due to the absence of reliable pixel correspondences within a fixed patch. Although certain methods employ patch deformation to expand the receptive field, their patches mistakenly skip depth edges to calculate areas with depth discontinuity, thereby causing ambiguity. Consequently, we introduce Multi-granularity Segmentation Prior Multi-View Stereo (MSP-MVS). Specifically, we first propose a multi-granularity segmentation prior by integrating multi-granularity depth edges to restrict patch deformation within homogeneous areas. Moreover, we present anchor equidistribution, which brings deformed patches more uniformly distributed anchors to ensure adequate coverage of their own homogeneous areas. Furthermore, we introduce iterative local search optimization to represent a larger patch with sparse representative candidates, significantly boosting the expressive capacity of each patch. The state-of-the-art results on the ETH3D and Tanks & Temples benchmarks demonstrate the effectiveness and robust generalization ability of our proposed method.
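The idea of keeping a deformed patch inside one homogeneous area, with anchors spread evenly around the center pixel, can be illustrated with a small sketch. This is not the authors' algorithm: the function name `deformed_patch_anchors`, the boolean `edge_mask` standing in for the segmentation prior, and the ray-casting rule are all assumptions chosen for illustration. Each ray starts at the center, advances in one of several evenly spaced directions, and stops just before it would cross an edge pixel or leave the image.

```python
import numpy as np


def deformed_patch_anchors(edge_mask, center, num_rays=8, max_radius=20):
    """Place one anchor per ray without crossing an edge pixel.

    edge_mask  : 2D bool array, True at (hypothetical) depth-edge pixels.
    center     : (row, col) of the pixel whose patch is being deformed.
    num_rays   : number of evenly spaced ray directions (assumed parameter).
    max_radius : maximum number of steps along each ray (assumed parameter).
    Returns a list of (row, col) anchors, one per ray.
    """
    h, w = edge_mask.shape
    r0, c0 = center
    anchors = []
    for k in range(num_rays):
        # Evenly spaced directions give an angularly uniform anchor layout.
        theta = 2.0 * np.pi * k / num_rays
        dr, dc = np.sin(theta), np.cos(theta)
        last = (r0, c0)  # fall back to the center if the first step is blocked
        for step in range(1, max_radius + 1):
            r = int(round(r0 + step * dr))
            c = int(round(c0 + step * dc))
            # Stop before leaving the image or stepping onto an edge pixel.
            if not (0 <= r < h and 0 <= c < w) or edge_mask[r, c]:
                break
            last = (r, c)
        anchors.append(last)
    return anchors
```

With a vertical edge placed to the right of the center, the rightward ray stops one pixel short of the edge while the other rays run to the image border, so every anchor stays on the center's side of the edge.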
Related papers
- SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint [11.165686149180054]
We propose SED-MVS, which adopts panoptic segmentation and a multi-trajectory diffusion strategy for segmentation-driven and edge-aligned patch deformation.
Specifically, to prevent unanticipated edge-skipping, we first employ SAM2 for panoptic segmentation as depth-edge guidance for patch deformation, followed by a multi-trajectory diffusion strategy to ensure patches are comprehensively aligned with depth edges.
arXiv Detail & Related papers (2025-03-17T21:07:44Z)
- Semi-Supervised 360 Layout Estimation with Panoramic Collaborative Perturbations [56.84921040837699]
We propose a novel semi-supervised method named Semi360, which incorporates the priors of the panoramic layout and distortion through collaborative perturbations.
Our experimental results on three mainstream benchmarks demonstrate that the proposed method offers significant advantages over existing state-of-the-art (SoTA) solutions.
arXiv Detail & Related papers (2025-03-03T02:49:20Z)
- Dual-Level Precision Edges Guided Multi-View Stereo with Accurate Planarization [3.597821311597427]
Multi-view stereo (MVS) reconstruction of low-textured areas is a prominent research focus.
Traditional MVS methods often encounter issues such as crossing object boundaries and limited perception ranges.
We introduce dual-level precision edge information, including fine and coarse edges, to enhance the robustness of plane model construction.
Our method achieves state-of-the-art performance on the ETH3D and Tanks & Temples benchmarks.
arXiv Detail & Related papers (2024-12-29T02:54:01Z)
- DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo [8.303396507129266]
We propose DVP-MVS, which synergizes depth-edge aligned and cross-view prior for robust and visibility-aware patch deformation.
Our method can achieve state-of-the-art performance with excellent robustness and generalization.
arXiv Detail & Related papers (2024-12-16T09:09:10Z)
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Semantic Segmentation [33.67313662538398]
We propose a multi-resolution training framework for open-vocabulary semantic segmentation with a single pretrained CLIP backbone.
MROVSeg uses sliding windows to slice the high-resolution input into uniform patches, each matching the input size of the well-trained image encoder.
We demonstrate the superiority of MROVSeg on well-established open-vocabulary semantic segmentation benchmarks.
arXiv Detail & Related papers (2024-08-27T04:45:53Z)
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths.
arXiv Detail & Related papers (2024-01-12T05:25:57Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View Stereo [7.130834755320434]
We propose a resilient and effective multi-view stereo approach (MP-MVS).
We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth of untextured areas.
In contrast with other multi-scale approaches, it is faster and can be easily extended to PatchMatch-based MVS approaches.
arXiv Detail & Related papers (2023-09-23T07:30:42Z)
- Improving Misaligned Multi-modality Image Fusion with One-stage Progressive Dense Registration [67.23451452670282]
Misalignments between multi-modality images pose challenges in image fusion.
We propose a Cross-modality Multi-scale Progressive Dense Registration scheme.
This scheme accomplishes the coarse-to-fine registration exclusively using a one-stage optimization.
arXiv Detail & Related papers (2023-08-22T03:46:24Z)
- TSAR-MVS: Textureless-aware Segmentation and Correlative Refinement Guided Multi-View Stereo [3.6728185343140685]
We propose a Textureless-aware Segmentation And Correlative Refinement guided Multi-View Stereo (TSAR-MVS) method.
It effectively tackles challenges posed by textureless areas in 3D reconstruction through filtering, refinement and segmentation.
Experiments on ETH3D, Tanks & Temples and Strecha datasets demonstrate the superior performance and strong capability of our proposed method.
arXiv Detail & Related papers (2023-08-19T11:40:57Z)
- Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes [51.20150148066458]
We propose a novel method to learn to fuse the multi-view and monocular cues encoded as volumes without needing heuristically crafted masks.
Experiments on real-world datasets prove the significant effectiveness and ability of the proposed method.
arXiv Detail & Related papers (2023-04-18T13:55:24Z)
- Deep Diversity-Enhanced Feature Representation of Hyperspectral Images [87.47202258194719]
We rectify 3D convolution by modifying its topology to enhance the rank upper-bound.
We also propose a novel diversity-aware regularization (DA-Reg) term that acts on the feature maps to maximize independence among elements.
To demonstrate the superiority of the proposed Re$^3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks.
arXiv Detail & Related papers (2023-01-15T16:19:18Z)
- DeViT: Deformed Vision Transformers in Video Inpainting [59.73019717323264]
We extend previous Transformers with patch alignment by introducing Deformed Patch-based Homography (DePtH).
Second, we introduce Mask Pruning-based Patch Attention (MPPA) to improve patch-wise feature matching.
Third, we introduce a Spatial-Temporal weighting Adaptor (STA) module to obtain accurate attention to spatial-temporal tokens.
arXiv Detail & Related papers (2022-09-28T08:57:14Z)
- A Model for Multi-View Residual Covariances based on Perspective Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups.
We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z)
- Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation [87.85851771425325]
We consider a new problem of adapting a human mesh reconstruction model to out-of-domain streaming videos.
We tackle this problem through online adaptation, gradually correcting the model bias during testing.
We propose the Dynamic Bilevel Online Adaptation algorithm (DynaBOA).
arXiv Detail & Related papers (2021-11-07T07:23:24Z)
- Attention Toward Neighbors: A Context Aware Framework for High Resolution Image Segmentation [2.9210447295585724]
We propose a novel framework to segment a particular patch by incorporating contextual information from its neighboring patches.
This allows the segmentation network to see the target patch with a wider field of view without the need of larger feature maps.
arXiv Detail & Related papers (2021-06-24T10:58:09Z)
- LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale giga photography.
Existing deep homography methods neglect the explicit formulation of correspondences between the images, which leads to degraded accuracy in cross-resolution challenges.
We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z)
- Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation [51.59290734837372]
We propose a conceptually simple yet effective post-processing refinement framework to improve the boundary quality.
The proposed BPR framework yields significant improvements over the Mask R-CNN baseline on the Cityscapes benchmark.
By applying the BPR framework to the PolyTransform + SegFix baseline, we reached 1st place on the Cityscapes leaderboard.
arXiv Detail & Related papers (2021-04-12T07:10:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.