Sequential Amodal Segmentation via Cumulative Occlusion Learning
- URL: http://arxiv.org/abs/2405.05791v1
- Date: Thu, 9 May 2024 14:17:26 GMT
- Title: Sequential Amodal Segmentation via Cumulative Occlusion Learning
- Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger
- Abstract summary: A visual system must be able to segment both the visible and occluded regions of objects, while discerning their occlusion order.
We introduce a diffusion model with cumulative occlusion learning designed for sequential amodal segmentation of objects with uncertain categories.
This model iteratively refines the prediction using the cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions.
It is akin to the human capability for amodal perception, i.e., to decipher the spatial ordering among objects and accurately predict complete contours for occluded objects in densely layered visual scenes.
- Score: 15.729212571002906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To fully understand the 3D context of a single image, a visual system must be able to segment both the visible and occluded regions of objects, while discerning their occlusion order. Ideally, the system should be able to handle any object and not be restricted to segmenting a limited set of object classes, especially in robotic applications. Addressing this need, we introduce a diffusion model with cumulative occlusion learning designed for sequential amodal segmentation of objects with uncertain categories. This model iteratively refines the prediction using the cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions and adeptly reproducing the complex distribution of shapes and occlusion orders of occluded objects. It is akin to the human capability for amodal perception, i.e., to decipher the spatial ordering among objects and accurately predict complete contours for occluded objects in densely layered visual scenes. Experimental results across three amodal datasets show that our method outperforms established baselines.
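The cumulative mask idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the diffusion model is replaced by a hypothetical `predict_layer` callable, and the sketch assumes that objects are predicted layer by layer from front to back and that the cumulative mask is the union of all amodal masks predicted so far. All names below are illustrative.

```python
from typing import Callable, FrozenSet, List, Tuple

Pixel = Tuple[int, int]   # (row, col)
Mask = FrozenSet[Pixel]   # a mask is the set of pixels it covers


def sequential_amodal_masks(
    predict_layer: Callable[[int, Mask], Mask],
    max_layers: int,
) -> Tuple[List[Mask], Mask]:
    """Predict amodal masks one layer at a time, feeding back the running union.

    `predict_layer(depth, cumulative)` is a hypothetical stand-in for the
    learned model: given the layer index and the cumulative mask of all
    earlier layers, it returns the amodal mask for this layer, or an
    empty mask when no objects remain.
    """
    cumulative: Mask = frozenset()
    layers: List[Mask] = []
    for depth in range(max_layers):
        mask = predict_layer(depth, cumulative)
        if not mask:
            break  # nothing left to segment
        layers.append(mask)
        # Cumulative occlusion mask: union of every amodal mask so far.
        cumulative = cumulative | mask
    return layers, cumulative


def rect(top: int, left: int, bottom: int, right: int) -> Mask:
    """Axis-aligned rectangle mask; bottom and right are exclusive."""
    return frozenset((r, c) for r in range(top, bottom) for c in range(left, right))


# Toy scene: a 2x2 square in front of another 2x2 square, overlapping at (1, 1).
TOY_SCENE = [rect(0, 0, 2, 2), rect(1, 1, 3, 3)]


def toy_predictor(depth: int, cumulative: Mask) -> Mask:
    # A real model would condition on `cumulative`; this toy one only looks
    # up ground-truth masks by layer index.
    return TOY_SCENE[depth] if depth < len(TOY_SCENE) else frozenset()


layers, cumulative = sequential_amodal_masks(toy_predictor, max_layers=5)
# The part of the second object hidden behind the first one.
occluded_part = layers[1] & layers[0]
```

In this sketch the occlusion order falls out of the layer index, and the hidden part of a deeper object is its overlap with the masks of the layers in front of it.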
Related papers
- Object-level Scene Deocclusion [92.39886029550286]
We present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, for object-level scene deocclusion.
To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning.
Experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the art by a large margin.
arXiv Detail & Related papers (2024-06-11T20:34:10Z)
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
- pix2gestalt: Amodal Segmentation by Synthesizing Wholes [34.45464291259217]
pix2gestalt is a framework for zero-shot amodal segmentation.
We learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases.
arXiv Detail & Related papers (2024-01-25T18:57:36Z)
- Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization [98.46318529630109]
We take inspiration from traditional spectral segmentation methods by reframing image decomposition as a graph partitioning problem.
We find that the eigenvectors of the Laplacian of a feature affinity matrix already decompose an image into meaningful segments, and can be readily used to localize objects in a scene.
By clustering the features associated with these segments across a dataset, we can obtain well-delineated, nameable regions.
arXiv Detail & Related papers (2022-05-16T17:47:44Z)
- Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling [0.0]
Instance-aware segmentation of unseen objects is essential for a robotic system in an unstructured environment.
This paper addresses Unseen Object Amodal Instance Segmentation (UOAIS) to detect 1) visible masks, 2) amodal masks, and 3) occlusions on unseen object instances.
We evaluate our method on three benchmarks (tabletop, indoors, and bin environments) and achieve state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2021-09-23T01:55:42Z)
- Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z)
- Shelf-Supervised Mesh Prediction in the Wild [54.01373263260449]
We propose a learning-based approach to infer the 3D shape and pose of an object from a single image.
We first infer a volumetric representation in a canonical frame, along with the camera pose.
The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame.
arXiv Detail & Related papers (2021-02-11T18:57:10Z)
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion.
Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders.
In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv Detail & Related papers (2020-12-03T17:41:55Z)
- CellSegmenter: unsupervised representation learning and instance segmentation of modular images [0.0]
We introduce a structured deep generative model and an amortized inference framework for unsupervised representation learning and instance segmentation tasks.
The proposed inference algorithm is convolutional and parallelized, without any recurrent mechanisms.
We show segmentation results obtained for a cell nuclei imaging dataset, demonstrating the ability of our method to provide high-quality segmentations.
arXiv Detail & Related papers (2020-11-25T02:10:58Z)
- Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We use a self-supervised loss function to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.