High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy
- URL: http://arxiv.org/abs/2503.06100v4
- Date: Sun, 28 Sep 2025 08:14:20 GMT
- Title: High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy
- Authors: Xianjie Liu, Keren Fu, Qijun Zhao
- Abstract summary: High-precision dichotomous image segmentation (DIS) is the task of extracting fine-grained objects from high-resolution images. Existing methods face a dilemma: non-diffusion methods work efficiently but suffer from false or missed detections due to weak semantics. We find that pseudo depth information from monocular depth estimation models can provide essential semantic understanding.
- Score: 23.431898388115044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-precision dichotomous image segmentation (DIS) is the task of extracting fine-grained objects from high-resolution images. Existing methods face a dilemma: non-diffusion methods work efficiently but suffer from false or missed detections due to weak semantics and less robust spatial priors; diffusion methods, using strong generative priors, have high accuracy but incur high computational burdens. As a solution, we find that pseudo depth information from monocular depth estimation models can provide essential semantic understanding that quickly reveals spatial differences between target objects and backgrounds. Inspired by this phenomenon, we discover a novel insight we term the depth integrity-prior: in pseudo depth maps, foreground objects consistently convey stable depth values with much lower variances than chaotic background patterns. To exploit this prior, we propose a Prior of Depth Fusion Network (PDFNet). Specifically, our network establishes multimodal interactive modeling to achieve depth-guided structural perception by deeply fusing RGB and pseudo depth features. We further introduce a novel depth integrity-prior loss to explicitly enforce depth consistency in segmentation results. Additionally, we design a fine-grained perception enhancement module with adaptive patch selection to perform boundary-sensitive detail refinement. Notably, PDFNet achieves state-of-the-art performance with only 94M parameters (<11% of those of diffusion-based models), outperforming all non-diffusion methods and surpassing some diffusion methods. Code is provided in the supplementary materials.
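The depth integrity-prior described above can be sketched as a simple masked-variance penalty: if foreground pixels in the pseudo depth map should share stable depth values, the mask-weighted variance of pseudo depth inside the predicted foreground should be small. The function and variable names below are illustrative assumptions, not taken from the paper's released code; this is a minimal pure-Python sketch of the idea, not the authors' actual loss.

```python
def depth_integrity_loss(depth, mask, eps=1e-6):
    """Mask-weighted variance of pseudo depth inside the predicted foreground.

    depth: flat list of pseudo depth values, one per pixel
    mask:  flat list of soft foreground probabilities in [0, 1]
    A coherent foreground (uniform depth) yields a near-zero loss; a chaotic
    region (high depth variance) is penalized.
    """
    total = sum(mask) + eps  # soft foreground area
    mean = sum(d * m for d, m in zip(depth, mask)) / total
    return sum(m * (d - mean) ** 2 for d, m in zip(depth, mask)) / total

# A flat-depth foreground gives ~0 loss; a noisy one gives a large loss.
flat = depth_integrity_loss([0.5, 0.5, 0.5, 0.5], [1, 1, 1, 0])
noisy = depth_integrity_loss([0.1, 0.9, 0.2, 0.8], [1, 1, 1, 1])
```

In a real training setup this penalty would be computed on tensors and added to the segmentation loss with a weighting coefficient; the sketch only conveys the statistic being minimized.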
Related papers
- UDPNet: Unleashing Depth-based Priors for Robust Image Dehazing [77.10640210751981]
UDPNet is a general framework that leverages depth-based priors from a large-scale pretrained depth estimation model, DepthAnything V2. Our proposed solution establishes a new benchmark for depth-aware dehazing across various scenarios.
arXiv Detail & Related papers (2026-01-11T13:29:02Z) - StarryGazer: Leveraging Monocular Depth Estimation Models for Domain-Agnostic Single Depth Image Completion [56.28564075246147]
StarryGazer is a framework that predicts dense depth images from a single sparse depth image and an RGB image. We employ a pre-trained MDE model to produce relative depth images. A refinement network is trained with the synthetic pairs, incorporating the relative depth maps and RGB images to improve the model's accuracy and robustness.
arXiv Detail & Related papers (2025-12-15T09:56:09Z) - DidSee: Diffusion-Based Depth Completion for Material-Agnostic Robotic Perception and Manipulation [33.87636820220007]
Commercial RGB-D cameras often produce noisy, incomplete depth maps for non-Lambertian objects. We propose DidSee, a diffusion-based framework for depth completion on non-Lambertian objects. DidSee achieves state-of-the-art performance on multiple benchmarks, demonstrates robust real-world generalization, and effectively improves downstream tasks.
arXiv Detail & Related papers (2025-06-26T06:18:42Z) - DenseFormer: Learning Dense Depth Map from Sparse Depth and Image via Conditional Diffusion Model [18.694510415777632]
We propose DenseFormer, a novel method that integrates the diffusion model into the depth completion task. DenseFormer generates the dense depth map by progressively refining an initial random depth distribution through multiple iterations. This paper presents a depth refinement module that applies multi-step iterative refinement across various ranges to the dense depth results generated by the diffusion process.
arXiv Detail & Related papers (2025-03-31T12:11:01Z) - Decompositional Neural Scene Reconstruction with Generative Diffusion Prior [64.71091831762214]
Decompositional reconstruction of 3D scenes, with complete shapes and detailed texture, is intriguing for downstream applications.
Recent approaches incorporate semantic or geometric regularization to address this issue, but they suffer significant degradation in underconstrained areas.
We propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
arXiv Detail & Related papers (2025-03-19T02:11:31Z) - DepthLab: From Partial to Complete [80.58276388743306]
Missing values remain a common challenge for depth data across its wide range of applications. This work bridges this gap with DepthLab, a foundation depth inpainting model powered by image diffusion priors. Our approach proves its worth in various downstream tasks, including 3D scene inpainting, text-to-3D scene generation, sparse-view reconstruction with DUST3R, and LiDAR depth completion.
arXiv Detail & Related papers (2024-12-24T04:16:38Z) - Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion [57.08169927189237]
Existing methods for depth completion operate in tightly constrained settings. Inspired by advances in monocular depth estimation, we reframe depth completion as image-conditional depth map generation. Marigold-DC builds on a pretrained latent diffusion model for monocular depth estimation and injects the depth observations as test-time guidance.
arXiv Detail & Related papers (2024-12-18T00:06:41Z) - Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution [55.9977636042469]
Bit-depth compression produces a uniform depth representation in regions with subtle variations, hindering the recovery of detailed information.
Meanwhile, densely distributed random noise reduces the accuracy of estimating the global geometric structure of the scene.
We propose a novel framework, termed geometry-decoupled network (GDNet), for compressed depth map super-resolution.
arXiv Detail & Related papers (2024-11-05T16:37:30Z) - High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
Diffusion models have revolutionized text-to-image synthesis by delivering exceptional quality, fine detail resolution, and strong contextual awareness. We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models. Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z) - Depth-guided Texture Diffusion for Image Semantic Segmentation [47.46257473475867]
We introduce a Depth-guided Texture Diffusion approach that effectively tackles the outlined challenge.
Our method extracts low-level features from edges and textures to create a texture image.
By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image.
arXiv Detail & Related papers (2024-08-17T04:55:03Z) - The Devil is in the Edges: Monocular Depth Estimation with Edge-aware Consistency Fusion [30.03608191629917]
This paper presents a novel monocular depth estimation method, named ECFNet, for estimating high-quality monocular depth with clear edges and valid overall structure from a single RGB image.
We make a thorough inquiry about the key factor that affects the edge depth estimation of the MDE networks, and come to a ratiocination that the edge information itself plays a critical role in predicting depth details.
arXiv Detail & Related papers (2024-03-30T13:58:19Z) - Scene Prior Filtering for Depth Super-Resolution [97.30137398361823]
We introduce a Scene Prior Filtering network, SPFNet, to mitigate texture interference and edge inaccuracy.
Our SPFNet has been extensively evaluated on both real and synthetic datasets, achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-02-21T15:35:59Z) - Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z) - Pyramid Deep Fusion Network for Two-Hand Reconstruction from RGB-D Images [11.100398985633754]
We propose an end-to-end framework for recovering dense meshes for both hands.
Our framework employs ResNet50 and PointNet++ to derive features from the RGB image and point cloud, respectively.
We also introduce a novel pyramid deep fusion network (PDFNet) to aggregate features at different scales.
arXiv Detail & Related papers (2023-07-12T09:33:21Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the most accurate on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - Self-Guided Instance-Aware Network for Depth Completion and Enhancement [6.319531161477912]
Existing methods directly interpolate the missing depth measurements based on pixel-wise image content and the corresponding neighboring depth values.
We propose a novel self-guided instance-aware network (SG-IANet) that utilizes a self-guided mechanism to extract the instance-level features needed for depth restoration.
arXiv Detail & Related papers (2021-05-25T19:41:38Z) - HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation [14.81943833870932]
We present an improved DepthNet, HR-Depth, with two effective strategies.
Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the fewest parameters at both high and low resolution.
arXiv Detail & Related papers (2020-12-14T09:15:15Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - Defocus Blur Detection via Depth Distillation [64.78779830554731]
We introduce depth information into defocus blur detection (DBD) for the first time.
In detail, we learn the defocus blur from ground truth and the depth distilled from a well-trained depth estimation network.
Our approach outperforms 11 other state-of-the-art methods on two popular datasets.
arXiv Detail & Related papers (2020-07-16T04:58:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.