Amodal Instance Segmentation with Diffusion Shape Prior Estimation
- URL: http://arxiv.org/abs/2409.18256v2
- Date: Fri, 4 Oct 2024 22:00:32 GMT
- Title: Amodal Instance Segmentation with Diffusion Shape Prior Estimation
- Authors: Minh Tran, Khoa Vo, Tri Nguyen, Ngan Le
- Abstract summary: Amodal Instance Segmentation (AIS) presents an intriguing challenge: predicting segmentation masks for both the visible and occluded parts of objects within images.
Previous methods have often relied on shape prior information gleaned from training data to enhance amodal segmentation.
Recent advancements highlight the potential of conditioned diffusion models, pretrained on extensive datasets, to generate images from latent space.
- Score: 10.064183379778388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Amodal Instance Segmentation (AIS) presents an intriguing challenge: predicting segmentation masks for both the visible and occluded parts of objects within images. Previous methods have often relied on shape prior information gleaned from training data to enhance amodal segmentation. However, these approaches are susceptible to overfitting and disregard object category details. Recent advancements highlight the potential of conditioned diffusion models, pretrained on extensive datasets, to generate images from latent space. Drawing inspiration from this, we propose AISDiff with a Diffusion Shape Prior Estimation (DiffSP) module. AISDiff begins by predicting the visible segmentation mask and object category, alongside occlusion-aware processing through the prediction of occluding masks. These elements are then fed into our DiffSP module to infer the shape prior of the object. DiffSP utilizes conditioned diffusion models pretrained on extensive datasets to extract rich visual features for shape prior estimation. Additionally, we introduce the Shape Prior Amodal Predictor, which uses attention-based feature maps from the shape prior to refine amodal segmentation. Experiments across various AIS benchmarks demonstrate the effectiveness of AISDiff.
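To make the data flow concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes. The module interfaces, the tiny convolutional stand-in for the frozen pretrained conditioned diffusion backbone, and all tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the AISDiff data flow, under assumed interfaces.
import torch
import torch.nn as nn

class DiffSP(nn.Module):
    """Stand-in for Diffusion Shape Prior Estimation: consumes the visible
    mask, the occluding mask, and the object category, and emits a shape
    prior. The real module extracts rich features from a pretrained
    conditioned diffusion model; a small conv stack is substituted here."""
    def __init__(self, num_classes, dim=32):
        super().__init__()
        self.backbone = nn.Sequential(  # placeholder for the diffusion backbone
            nn.Conv2d(2, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())
        self.cls_embed = nn.Embedding(num_classes, dim)
        self.to_prior = nn.Conv2d(dim, 1, 1)

    def forward(self, visible, occluder, category):
        feats = self.backbone(torch.cat([visible, occluder], dim=1))
        feats = feats + self.cls_embed(category)[:, :, None, None]  # category conditioning
        return torch.sigmoid(self.to_prior(feats)), feats

class ShapePriorAmodalPredictor(nn.Module):
    """Refines the amodal mask, using the shape prior as spatial attention."""
    def __init__(self, dim=32):
        super().__init__()
        self.head = nn.Conv2d(dim, 1, 3, padding=1)

    def forward(self, feats, shape_prior):
        return torch.sigmoid(self.head(feats * shape_prior))

# Hypothetical end-to-end call on one 64x64 ROI:
diffsp, predictor = DiffSP(num_classes=80), ShapePriorAmodalPredictor()
visible = torch.rand(1, 1, 64, 64)     # predicted visible mask
occluder = torch.rand(1, 1, 64, 64)    # predicted occluding mask
category = torch.tensor([3])           # predicted object category
prior, feats = diffsp(visible, occluder, category)
amodal_mask = predictor(feats, prior)  # (1, 1, 64, 64)
```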
Related papers
- High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.
By leveraging the robust generalization capabilities and the rich, versatile image representation prior of Stable Diffusion (SD) models, we significantly reduce inference time while preserving high-fidelity, detailed generation.
Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z)
- Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot Semantic Segmentation has evolved into in-context segmentation tasks and has become a crucial element in assessing generalist segmentation models.
Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, leading to the proposal of a KV fusion method within the self-attention framework (sketched after this entry).
Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
arXiv Detail & Related papers (2024-10-03T10:33:49Z)
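One plausible reading of the KV fusion idea above, as a self-contained PyTorch sketch: the query image's self-attention is extended with keys and values taken from the support image. The shapes and the omission of projection weights are simplifying assumptions.

```python
import torch

def kv_fused_attention(q_tokens, s_tokens):
    """KV fusion inside self-attention (illustrative): queries come from the
    query image only, while keys/values concatenate query- and support-image
    tokens, so support information flows in without extra parameters."""
    # q_tokens: (B, Nq, D) query-image tokens; s_tokens: (B, Ns, D) support tokens
    q = q_tokens
    k = torch.cat([q_tokens, s_tokens], dim=1)  # fused keys
    v = torch.cat([q_tokens, s_tokens], dim=1)  # fused values
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v                             # (B, Nq, D)

out = kv_fused_attention(torch.randn(1, 256, 64), torch.randn(1, 256, 64))
```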
- Sequential Amodal Segmentation via Cumulative Occlusion Learning [15.729212571002906]
A visual system must be able to segment both the visible and occluded regions of objects while discerning their occlusion order.
We introduce a diffusion model with cumulative occlusion learning designed for sequential amodal segmentation of objects with uncertain categories.
This model iteratively refines the prediction using the cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions.
It is akin to the human capability for amodal perception, i.e., to decipher the spatial ordering among objects and accurately predict complete contours for occluded objects in densely layered visual scenes.
arXiv Detail & Related papers (2024-05-09T14:17:26Z)
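A schematic of the cumulative mask strategy described above, with the diffusion sampler abstracted behind a callable; this interface is an assumption, not the paper's code.

```python
import torch

def sequential_amodal_segmentation(image, predict_mask, num_layers):
    """Cumulative occlusion learning, schematically: each occlusion layer is
    predicted conditioned on the image and on the union of all previously
    predicted masks, so evidence about invisible regions accumulates."""
    b, _, h, w = image.shape
    cumulative = torch.zeros(b, 1, h, w)
    layers = []
    for _ in range(num_layers):
        mask = predict_mask(image, cumulative)                # (b,1,h,w) in [0,1]
        layers.append(mask)
        cumulative = torch.clamp(cumulative + mask, max=1.0)  # accumulate occlusion
    return layers

# Dummy predictor standing in for the diffusion model:
masks = sequential_amodal_segmentation(
    torch.rand(1, 3, 64, 64), lambda img, cum: torch.rand(1, 1, 64, 64), 3)
```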
- pix2gestalt: Amodal Segmentation by Synthesizing Wholes [34.45464291259217]
pix2gestalt is a framework for zero-shot amodal segmentation.
We learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases.
arXiv Detail & Related papers (2024-01-25T18:57:36Z)
- Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter [47.29967666846132]
We show that generative text-to-image diffusion models are highly efficient open-vocabulary semantic segmenters. Building on the insight that generating realistic objects that are semantically faithful to the input text requires the model to implicitly learn object shapes and semantics, we introduce a novel training-free approach named DiffSegmenter (sketched after this entry).
Extensive experiments on three benchmark datasets show that the proposed DiffSegmenter achieves impressive results for open-vocabulary semantic segmentation.
arXiv Detail & Related papers (2023-09-06T06:31:08Z)
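The training-free mechanism can be pictured as reading off cross-attention between image tokens and the text token naming the object. The sketch below is schematic: real implementations hook these maps out of the denoising U-Net, and the tensor layout here is assumed.

```python
import torch

def cross_attention_mask(img_tokens, txt_token, temperature=1.0):
    """Renormalize cross-attention scores between flattened image tokens and
    one text token into a coarse open-vocabulary segmentation mask."""
    # img_tokens: (N, D) flattened spatial features; txt_token: (D,)
    scores = img_tokens @ txt_token / img_tokens.shape[-1] ** 0.5  # (N,)
    mask = torch.softmax(scores / temperature, dim=0)              # attention over space
    return mask / mask.max()                                       # rescaled to [0, 1]

mask = cross_attention_mask(torch.randn(32 * 32, 64), torch.randn(64)).reshape(32, 32)
```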
- T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising a view-wise sampling algorithm that focuses on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z)
- Learning with Explicit Shape Priors for Medical Image Segmentation [17.110893665132423]
We propose a novel shape prior module (SPM) to improve the segmentation performance of UNet-based models.
Explicit shape priors consist of global and local shape priors.
Our proposed model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-31T11:12:35Z)
- Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models.
Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence).
We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- Learning Vector Quantized Shape Code for Amodal Blastomere Instance Segmentation [33.558545104711186]
Amodal instance segmentation aims to recover the complete silhouette of an object even when the object is not fully visible.
We propose to classify input features into intermediate shape codes and recover complete object shapes from them (see the vector-quantization sketch after this entry).
Our method would enable accurate measurement of blastomeres in in vitro fertilization (IVF) clinics.
arXiv Detail & Related papers (2020-12-02T06:17:28Z)
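The shape-code idea above is standard vector quantization: each feature is classified to its nearest codebook entry, and a decoder (not shown) recovers the complete silhouette from those codes. A minimal sketch, with shapes assumed:

```python
import torch

def quantize_to_shape_codes(features, codebook):
    """Nearest-neighbor lookup of intermediate shape codes (illustrative)."""
    # features: (N, D) per-instance features; codebook: (K, D) learned shape codes
    dists = torch.cdist(features, codebook)  # (N, K) pairwise distances
    idx = dists.argmin(dim=1)                # nearest shape code per feature
    return codebook[idx], idx                # quantized features, code indices

quantized, codes = quantize_to_shape_codes(torch.randn(8, 16), torch.randn(512, 16))
```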
- Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained to synthesize images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
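The closed-form factorization above admits a few-line sketch: the editing directions are the top right singular vectors of the weight matrix A of the first transformation applied to the latent code (equivalently, the top eigenvectors of A^T A). Which layer to factorize is the user's choice, and the usage line is hypothetical.

```python
import numpy as np

def closed_form_directions(weight, k=5):
    """Semantic directions from a pre-trained GAN weight matrix of shape
    (out_dim, latent_dim): the top-k right singular vectors."""
    _, _, vt = np.linalg.svd(weight, full_matrices=False)
    return vt[:k]  # (k, latent_dim) unit-norm editing directions

# Hypothetical usage: move a latent code z along the first direction.
# z_edit = z + 3.0 * closed_form_directions(G_first_layer_weight)[0]
```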