Related papers: Sequential Amodal Segmentation via Cumulative Occlusion Learning

Sequential Amodal Segmentation via Cumulative Occlusion Learning

URL: http://arxiv.org/abs/2405.05791v1
Date: Thu, 9 May 2024 14:17:26 GMT
Title: Sequential Amodal Segmentation via Cumulative Occlusion Learning
Authors: Jiayang Ao, Qiuhong Ke, Krista A. Ehinger,
Abstract summary: A visual system must be able to segment both the visible and occluded regions of objects, while discerning their occlusion order. We introduce a diffusion model with cumulative occlusion learning designed for sequential amodal segmentation of objects with uncertain categories. This model iteratively refines the prediction using the cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions. It is akin to the human capability for amodal perception, i.e., to decipher the spatial ordering among objects and accurately predict complete contours for occluded objects in densely layered visual scenes.
Score: 15.729212571002906
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To fully understand the 3D context of a single image, a visual system must be able to segment both the visible and occluded regions of objects, while discerning their occlusion order. Ideally, the system should be able to handle any object and not be restricted to segmenting a limited set of object classes, especially in robotic applications. Addressing this need, we introduce a diffusion model with cumulative occlusion learning designed for sequential amodal segmentation of objects with uncertain categories. This model iteratively refines the prediction using the cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions and adeptly reproducing the complex distribution of shapes and occlusion orders of occluded objects. It is akin to the human capability for amodal perception, i.e., to decipher the spatial ordering among objects and accurately predict complete contours for occluded objects in densely layered visual scenes. Experimental results across three amodal datasets show that our method outperforms established baselines.

Related papers

Object Affordance Recognition and Grounding via Multi-scale Cross-modal Representation Learning [64.32618490065117]
A core problem of Embodied AI is to learn object manipulation from observation, as humans do.<n>We propose a novel approach that learns an affordance-aware 3D representation and employs a stage-wise inference strategy.<n> Experiments demonstrate the effectiveness of our method, showing improved performance in both affordance grounding and classification.
arXiv Detail & Related papers (2025-08-02T04:14:18Z)
QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction [49.75084732129701]
3D occupancy prediction is crucial for robust autonomous driving systems.<n>Most existing methods employ dense voxel-based scene representations.<n>We present QuadricFormer, a superquadric-based model for efficient 3D occupancy prediction.
arXiv Detail & Related papers (2025-06-12T17:59:45Z)
Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA [49.10341970643037]
Amodal segmentation aims to infer the complete shape of occluded objects, even when the occluded region's appearance is unavailable. Current amodal segmentation methods lack the capability to interact with users through text input. We propose a novel task named amodal reasoning segmentation, aiming to predict the complete amodal shape of occluded objects.
arXiv Detail & Related papers (2025-03-13T10:08:18Z)
S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving [12.406655155106424]
We propose S3PT a novel scene semantics and structure guided clustering to provide more scene-consistent objectives for self-supervised training. Our contributions are threefold: First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals. Second, we introduce object diversity consistent spatial clustering, to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs. Third, we propose a depth-guided spatial clustering to regularize learning based on geometric information of the scene, thus further refining region separation on the feature level.
arXiv Detail & Related papers (2024-10-30T15:00:06Z)
LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes. We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net) The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
Object-level Scene Deocclusion [92.39886029550286]
We present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, for object-level scene deocclusion. To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning. Experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the arts by a large margin.
arXiv Detail & Related papers (2024-06-11T20:34:10Z)
Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture. We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation. Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
pix2gestalt: Amodal Segmentation by Synthesizing Wholes [34.45464291259217]
pix2gestalt is a framework for zero-shot amodal segmentation. We learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases.
arXiv Detail & Related papers (2024-01-25T18:57:36Z)
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization [98.46318529630109]
We take inspiration from traditional spectral segmentation methods by reframing image decomposition as a graph partitioning problem. We find that these eigenvectors already decompose an image into meaningful segments, and can be readily used to localize objects in a scene. By clustering the features associated with these segments across a dataset, we can obtain well-delineated, nameable regions.
arXiv Detail & Related papers (2022-05-16T17:47:44Z)
Shelf-Supervised Mesh Prediction in the Wild [54.01373263260449]
We propose a learning-based approach to infer 3D shape and pose of object from a single image. We first infer a volumetric representation in a canonical frame, along with the camera pose. The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame.
arXiv Detail & Related papers (2021-02-11T18:57:10Z)
Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion. Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders. In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv Detail & Related papers (2020-12-03T17:41:55Z)
CellSegmenter: unsupervised representation learning and instance segmentation of modular images [0.0]
We introduce a structured deep generative model and an amortized inference framework for unsupervised representation learning and instance segmentation tasks. The proposed inference algorithm is convolutional and parallelized, without any recurrent mechanisms. We show segmentation results obtained for a cell nuclei imaging dataset, demonstrating the ability of our method to provide high-quality segmentations.
arXiv Detail & Related papers (2020-11-25T02:10:58Z)
Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera. We exploit a self-supervised loss function that we exploit to train a proposal-based segmentation network. We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.