MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
- URL: http://arxiv.org/abs/2407.19323v5
- Date: Sun, 16 Mar 2025 16:02:02 GMT
- Title: MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
- Authors: Zhenlong Yuan, Cong Liu, Fei Shen, Zhaoxin Li, Jinguo Luo, Tianlu Mao, Zhaoqi Wang,
- Abstract summary: MSP-MVS is a method introducing multi-granularity segmentation prior to edge-confined patch deformation.<n>We implement equidistribution and disassemble-clustering of correlative reliable pixels.<n>We also introduce disparity-sampling synergistic 3D optimization to help identify global-minimum matching costs.
- Score: 8.303396507129266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, patch deformation-based methods have demonstrated significant strength in multi-view stereo by adaptively expanding the reception field of patches to help reconstruct textureless areas. However, such methods mainly concentrate on searching for pixels without matching ambiguity (i.e., reliable pixels) when constructing deformed patches, while neglecting the deformation instability caused by unexpected edge-skipping, resulting in potential matching distortions. Addressing this, we propose MSP-MVS, a method introducing multi-granularity segmentation prior for edge-confined patch deformation. Specifically, to avoid unexpected edge-skipping, we first aggregate and further refine multi-granularity depth edges gained from Semantic-SAM as prior to guide patch deformation within depth-continuous (i.e., homogeneous) areas. Moreover, to address attention imbalance caused by edge-confined patch deformation, we implement adaptive equidistribution and disassemble-clustering of correlative reliable pixels (i.e., anchors), thereby promoting attention-consistent patch deformation. Finally, to prevent deformed patches from falling into local-minimum matching costs caused by the fixed sampling pattern, we introduce disparity-sampling synergistic 3D optimization to help identify global-minimum matching costs. Evaluations on ETH3D and Tanks & Temples benchmarks prove our method obtains state-of-the-art performance with remarkable generalization.
Related papers
- DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo [7.544716770845737]
We propose DVP-MVS++, an innovative approach that synergizes both depth-normal-edge aligned and harmonized cross-view priors for robust and visibility-aware patch deformation.<n> Evaluation results on ETH3D, Tanks & Temples and Strecha datasets exhibit the state-of-the-art performance and robust generalization capability of our proposed method.
arXiv Detail & Related papers (2025-06-16T08:15:22Z) - PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation [70.98107766265636]
This paper takes the geometric attributes of pre-trained weights as a starting point, systematically analyzing three key components: magnitude, absolute angle, and pairwise angular structure.<n>We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting it should be preserved during adaptation.
arXiv Detail & Related papers (2025-06-03T05:18:15Z) - PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening [20.43260906326048]
We propose PAN-Crafter, a modality-consistent alignment framework.<n>At its core, Modality-Adaptive Reconstruction (MARs) enables a single network to jointly reconstruct HRMS and PAN images.<n> experiments on multiple benchmark datasets demonstrate that our PAN-Crafter outperforms the most recent state-of-the-art method in all metrics.
arXiv Detail & Related papers (2025-05-29T11:46:21Z) - Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference.<n>It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps.<n>Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z) - SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint [11.165686149180054]
We propose SED-MVS, which adopts panoptic segmentation and multi-trajectory diffusion strategy for segmentation-driven and edge-aligned patch deformation.
Specifically, to prevent unanticipated edge-skipping, we first employ SAM2 for panoptic segmentation as depth-edge guidance to guide patch deformation, followed by multi-trajectory diffusion strategy to ensure patches are comprehensively aligned with depth edges.
arXiv Detail & Related papers (2025-03-17T21:07:44Z) - Semi-Supervised 360 Layout Estimation with Panoramic Collaborative Perturbations [56.84921040837699]
We propose a novel semi-supervised method named Semi360, which incorporates the priors of the panoramic layout and distortion through collaborative perturbations.
Our experimental results on three mainstream benchmarks demonstrate that the proposed method offers significant advantages over existing state-of-the-art (SoTA) solutions.
arXiv Detail & Related papers (2025-03-03T02:49:20Z) - Dual-Level Precision Edges Guided Multi-View Stereo with Accurate Planarization [3.597821311597427]
Multi-view stereo (MVS) reconstruction of low-textured areas is a prominent research focus.
Traditional MVS methods often encounter issues such as crossing object boundaries and limited perception ranges.
We introduce dual-level precision edge information, including fine and coarse edges, to enhance the robustness of plane model construction.
Our method achieves state-of-the-art performance on the ETH3D and Tanks & Temples benchmarks.
arXiv Detail & Related papers (2024-12-29T02:54:01Z) - DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo [8.303396507129266]
We propose DVP-MVS, which synergizes depth-edge aligned and cross-view prior for robust and visibility-aware patch deformation.
Our method can achieve state-of-the-art performance with excellent robustness and generalization.
arXiv Detail & Related papers (2024-12-16T09:09:10Z) - A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z) - MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Semantic Segmentation [33.67313662538398]
We propose a multi-resolution training framework for open-vocabulary semantic segmentation with a single pretrained CLIP backbone.
MROVSeg uses sliding windows to slice the high-resolution input into uniform patches, each matching the input size of the well-trained image encoder.
We demonstrate the superiority of MROVSeg on well-established open-vocabulary semantic segmentation benchmarks.
arXiv Detail & Related papers (2024-08-27T04:45:53Z) - Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z) - SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical
Refinement and EM optimization [6.886220026399106]
We introduce Multi-View Stereo (SD-MVS) to tackle challenges in 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals and pixelwise search interval on depths.
arXiv Detail & Related papers (2024-01-12T05:25:57Z) - 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View
Stereo [7.130834755320434]
We propose a resilient and effective multi-view stereo approach (MP-MVS)
We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth of untextured areas.
In contrast with other multi-scale approaches, which is faster and can be easily extended to PatchMatch-based MVS approaches.
arXiv Detail & Related papers (2023-09-23T07:30:42Z) - Improving Misaligned Multi-modality Image Fusion with One-stage
Progressive Dense Registration [67.23451452670282]
Misalignments between multi-modality images pose challenges in image fusion.
We propose a Cross-modality Multi-scale Progressive Dense Registration scheme.
This scheme accomplishes the coarse-to-fine registration exclusively using a one-stage optimization.
arXiv Detail & Related papers (2023-08-22T03:46:24Z) - TSAR-MVS: Textureless-aware Segmentation and Correlative Refinement Guided Multi-View Stereo [3.6728185343140685]
We propose a Textureless-aware And Correlative Refinement guided Multi-View Stereo (TSAR-MVS) method.
It effectively tackles challenges posed by textureless areas in 3D reconstruction through filtering, refinement and segmentation.
Experiments on ETH3D, Tanks & Temples and Strecha datasets demonstrate the superior performance and strong capability of our proposed method.
arXiv Detail & Related papers (2023-08-19T11:40:57Z) - Calibrating Undisciplined Over-Smoothing in Transformer for Weakly Supervised Semantic Segmentation [51.14107156747967]
Weakly supervised semantic segmentation (WSSS) has attracted considerable attention because it requires fewer annotations than fully supervised approaches.<n>We propose an Adaptive Re-Activation Mechanism (AReAM) to control deep-level attention to undisciplined over-smoothing.<n>AReAM substantially improves segmentation performance compared with existing WSSS methods, reducing noise while sharpening focus on relevant semantic regions.
arXiv Detail & Related papers (2023-05-04T19:11:33Z) - Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth
Estimation in Dynamic Scenes [51.20150148066458]
We propose a novel method to learn to fuse the multi-view and monocular cues encoded as volumes without needing the generalizationally crafted masks.
Experiments on real-world datasets prove the significant effectiveness and ability of the proposed method.
arXiv Detail & Related papers (2023-04-18T13:55:24Z) - Deep Diversity-Enhanced Feature Representation of Hyperspectral Images [87.47202258194719]
We rectify 3D convolution by modifying its topology to enhance the rank upper-bound.
We also propose a novel diversity-aware regularization (DA-Reg) term that acts on the feature maps to maximize independence among elements.
To demonstrate the superiority of the proposed Re$3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks.
arXiv Detail & Related papers (2023-01-15T16:19:18Z) - DeViT: Deformed Vision Transformers in Video Inpainting [59.73019717323264]
We extend previous Transformers with patch alignment by introducing Deformed Patch-based Homography (DePtH)
Second, we introduce Mask Pruning-based Patch Attention (MPPA) to improve patch-wised feature matching.
Third, we introduce a Spatial-Temporal weighting Adaptor (STA) module to obtain accurate attention to spatial-temporal tokens.
arXiv Detail & Related papers (2022-09-28T08:57:14Z) - A Model for Multi-View Residual Covariances based on Perspective
Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups.
We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z) - Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online
Adaptation [87.85851771425325]
We consider a new problem of adapting a human mesh reconstruction model to out-of-domain streaming videos.
We tackle this problem through online adaptation, gradually correcting the model bias during testing.
We propose the Dynamic Bilevel Online Adaptation algorithm (DynaBOA)
arXiv Detail & Related papers (2021-11-07T07:23:24Z) - Attention Toward Neighbors: A Context Aware Framework for High
Resolution Image Segmentation [2.9210447295585724]
We propose a novel framework to segment a particular patch by incorporating contextual information from its neighboring patches.
This allows the segmentation network to see the target patch with a wider field of view without the need of larger feature maps.
arXiv Detail & Related papers (2021-06-24T10:58:09Z) - LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution
Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale giga photography.
Existing deep homography methods neglecting the explicit formulation of correspondences between them, which leads to degraded accuracy in cross-resolution challenges.
We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z) - Look Closer to Segment Better: Boundary Patch Refinement for Instance
Segmentation [51.59290734837372]
We propose a conceptually simple yet effective post-processing refinement framework to improve the boundary quality.
The proposed BPR framework yields significant improvements over the Mask R-CNN baseline on Cityscapes benchmark.
By applying the BPR framework to the PolyTransform + SegFix baseline, we reached 1st place on the Cityscapes leaderboard.
arXiv Detail & Related papers (2021-04-12T07:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.