Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior
- URL: http://arxiv.org/abs/2503.06100v3
- Date: Fri, 28 Mar 2025 14:47:24 GMT
- Title: Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior
- Authors: Xianjie Liu, Keren Fu, Qijun Zhao,
- Abstract summary: Dichotomous Image (DIS) is a high-precision object segmentation task for high-resolution natural images.<n>We have designed a novel Patch-Depth Fusion Network (PDFNet) for high-precision dichotomous image segmentation.<n>PDFNet significantly outperforms state-of-the-art non-diffusion methods.
- Score: 12.03947802006261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dichotomous Image Segmentation (DIS) is a high-precision object segmentation task for high-resolution natural images. The current mainstream methods focus on the optimization of local details but overlook the fundamental challenge of modeling the integrity of objects. We have found that the depth integrity-prior implicit in the the pseudo-depth maps generated by Depth Anything Model v2 and the local detail features of image patches can jointly address the above dilemmas. Based on the above findings, we have designed a novel Patch-Depth Fusion Network (PDFNet) for high-precision dichotomous image segmentation. The core of PDFNet consists of three aspects. Firstly, the object perception is enhanced through multi-modal input fusion. By utilizing the patch fine-grained strategy, coupled with patch selection and enhancement, the sensitivity to details is improved. Secondly, by leveraging the depth integrity-prior distributed in the depth maps, we propose an integrity-prior loss to enhance the uniformity of the segmentation results in the depth maps. Finally, we utilize the features of the shared encoder and, through a simple depth refinement decoder, improve the ability of the shared encoder to capture subtle depth-related information in the images. Experiments on the DIS-5K dataset show that PDFNet significantly outperforms state-of-the-art non-diffusion methods. Due to the incorporation of the depth integrity-prior, PDFNet achieves or even surpassing the performance of the latest diffusion-based methods while using less than 11% of the parameters of diffusion-based methods. The source code at https://github.com/Tennine2077/PDFNet
Related papers
- Decompositional Neural Scene Reconstruction with Generative Diffusion Prior [64.71091831762214]
Decompositional reconstruction of 3D scenes, with complete shapes and detailed texture, is intriguing for downstream applications.
Recent approaches incorporate semantic or geometric regularization to address this issue, but they suffer significant degradation in underconstrained areas.
We propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
arXiv Detail & Related papers (2025-03-19T02:11:31Z) - DepthLab: From Partial to Complete [80.58276388743306]
Missing values remain a common challenge for depth data across its wide range of applications.<n>This work bridges this gap with DepthLab, a foundation depth inpainting model powered by image diffusion priors.<n>Our approach proves its worth in various downstream tasks, including 3D scene inpainting, text-to-3D scene generation, sparse-view reconstruction with DUST3R, and LiDAR depth completion.
arXiv Detail & Related papers (2024-12-24T04:16:38Z) - Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution [55.9977636042469]
Bit-depth compression produces a uniform depth representation in regions with subtle variations, hindering the recovery of detailed information.
densely distributed random noise reduces the accuracy of estimating the global geometric structure of the scene.
We propose a novel framework, termed geometry-decoupled network (GDNet), for compressed depth map super-resolution.
arXiv Detail & Related papers (2024-11-05T16:37:30Z) - High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
Diffusion models have revolutionized text-to-image synthesis by delivering exceptional quality, fine detail resolution, and strong contextual awareness.<n>We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.<n>Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z) - Depth-guided Texture Diffusion for Image Semantic Segmentation [47.46257473475867]
We introduce a Depth-guided Texture Diffusion approach that effectively tackles the outlined challenge.
Our method extracts low-level features from edges and textures to create a texture image.
By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image.
arXiv Detail & Related papers (2024-08-17T04:55:03Z) - The Devil is in the Edges: Monocular Depth Estimation with Edge-aware Consistency Fusion [30.03608191629917]
This paper presents a novel monocular depth estimation method, named ECFNet, for estimating high-quality monocular depth with clear edges and valid overall structure from a single RGB image.
We make a thorough inquiry about the key factor that affects the edge depth estimation of the MDE networks, and come to a ratiocination that the edge information itself plays a critical role in predicting depth details.
arXiv Detail & Related papers (2024-03-30T13:58:19Z) - Scene Prior Filtering for Depth Super-Resolution [97.30137398361823]
We introduce a Scene Prior Filtering network, SPFNet, to mitigate texture interference and edge inaccuracy.
Our SPFNet has been extensively evaluated on both real and synthetic datasets, achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-02-21T15:35:59Z) - Pyramid Deep Fusion Network for Two-Hand Reconstruction from RGB-D Images [11.100398985633754]
We propose an end-to-end framework for recovering dense meshes for both hands.
Our framework employs ResNet50 and PointNet++ to derive features from RGB and point cloud.
We also introduce a novel pyramid deep fusion network (PDFNet) to aggregate features at different scales.
arXiv Detail & Related papers (2023-07-12T09:33:21Z) - HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation [14.81943833870932]
We present an improvedDepthNet, HR-Depth, with two effective strategies.
Using Resnet-18 as the encoder, HR-Depth surpasses all pre-vious state-of-the-art(SoTA) methods with the least param-eters at both high and low resolution.
arXiv Detail & Related papers (2020-12-14T09:15:15Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - Defocus Blur Detection via Depth Distillation [64.78779830554731]
We introduce depth information into DBD for the first time.
In detail, we learn the defocus blur from ground truth and the depth distilled from a well-trained depth estimation network.
Our approach outperforms 11 other state-of-the-art methods on two popular datasets.
arXiv Detail & Related papers (2020-07-16T04:58:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.