Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model
- URL: http://arxiv.org/abs/2010.13175v4
- Date: Sat, 9 Jul 2022 04:42:39 GMT
- Title: Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model
- Authors: Yihong Sun, Adam Kortylewski, Alan Yuille
- Abstract summary: Amodal completion is a visual task that humans perform easily but which is difficult for computer vision algorithms.
We formulate amodal segmentation as an out-of-task and out-of-distribution generalization problem.
Our algorithm outperforms alternative methods that use the same supervision by a large margin.
- Score: 19.235173141731885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Amodal completion is a visual task that humans perform easily but which is
difficult for computer vision algorithms. The aim is to segment those object
boundaries which are occluded and hence invisible. This task is particularly
challenging for deep neural networks because data is difficult to obtain and
annotate. Therefore, we formulate amodal segmentation as an out-of-task and
out-of-distribution generalization problem. Specifically, we replace the fully
connected classifier in neural networks with a Bayesian generative model of the
neural network features. The model is trained from non-occluded images using
bounding box annotations and class labels only, but is applied to generalize
out-of-task to object segmentation and to generalize out-of-distribution to
segment occluded objects. We demonstrate how such Bayesian models can naturally
generalize beyond the training task labels when they learn a prior that models
the object's background context and shape. Moreover, by leveraging an outlier
process, Bayesian models can further generalize out-of-distribution to segment
partially occluded objects and to predict their amodal object boundaries. Our
algorithm outperforms alternative methods that use the same supervision by a
large margin, and even outperforms methods where annotated amodal segmentations
are used during training, when the amount of occlusion is large. Code is
publicly available at https://github.com/YihongSun/Bayesian-Amodal.
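To make the mechanism concrete, here is a minimal NumPy sketch of the kind of per-pixel generative classification the abstract describes: each backbone feature vector is scored under an object model, a background-context model, and a constant-density outlier process that absorbs occluders, and the pixel takes the label of the winning model. The isotropic Gaussians and the names obj_mean/bg_mean are illustrative stand-ins, not the authors' implementation, which uses richer feature likelihoods (see the linked repository).
```python
import numpy as np

def amodal_pixel_labels(feats, obj_mean, bg_mean, sigma=1.0, outlier_logp=-50.0):
    """Label each pixel by the generative model that best explains its feature.

    feats:              (C, H, W) feature map from a frozen backbone
    obj_mean, bg_mean:  (C,) class-conditional feature means (stand-ins for
                        a richer mixture model over features)
    outlier_logp:       constant log-density of the outlier (occluder) process
    Returns an (H, W) map with 0 = background, 1 = object, 2 = occluder.
    """
    C, H, W = feats.shape
    x = feats.reshape(C, -1)                              # (C, H*W)

    def gauss_logp(mean):
        # isotropic Gaussian log-density, dropping constants shared by all models
        return -0.5 * ((x - mean[:, None]) ** 2).sum(axis=0) / sigma**2

    logps = np.stack([gauss_logp(bg_mean),                # background context
                      gauss_logp(obj_mean),               # object (foreground)
                      np.full(H * W, outlier_logp)])      # occluder as outlier
    return logps.argmax(axis=0).reshape(H, W)

# toy usage: background-like features, an object patch, and an "occluder"
# patch whose statistics match neither model and so fall to the outlier
rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
obj_mean, bg_mean = rng.normal(size=C), rng.normal(size=C)
feats = bg_mean[:, None, None] + 0.1 * rng.normal(size=(C, H, W))
feats[:, 5:11, 5:11] = obj_mean[:, None, None]            # visible object
feats[:, 0:4, 0:4] = 12.0                                 # occluding clutter
labels = amodal_pixel_labels(feats, obj_mean, bg_mean)
print((labels == 1).sum(), "object pixels;", (labels == 2).sum(), "occluder pixels")
```
The actual system additionally imposes a learned shape prior, so pixels explained away by the outlier process can still be assigned to the object's amodal mask; this sketch stops at the per-pixel model competition.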
Related papers
- Using Diffusion Priors for Video Amodal Segmentation [44.36499624938911]
We propose to tackle video amodal segmentation by formulating it as a conditional generation task, capitalizing on the foundational knowledge in video generative models.
Our method is simple; we repurpose these models to condition on a sequence of modal mask frames of an object along with contextual pseudo-depth maps.
This is followed by a content completion stage which is able to inpaint the occluded regions of an object.
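As a rough sketch of that conditioning step, assuming the modal masks and pseudo-depth maps are simply stacked along the channel axis (one plausible interface; the paper's exact design may differ):
```python
import torch

def build_condition(modal_masks, pseudo_depth):
    """Stack per-frame visible (modal) masks and pseudo-depth into one
    conditioning tensor for a video diffusion denoiser.

    modal_masks:  (T, 1, H, W) binary visible-object masks
    pseudo_depth: (T, 1, H, W) monocular depth estimates
    Returns (T, 2, H, W), to be channel-concatenated with the noisy
    amodal-mask latents at every denoising step.
    """
    return torch.cat([modal_masks.float(), pseudo_depth.float()], dim=1)

T, H, W = 8, 64, 64
cond = build_condition(torch.randint(0, 2, (T, 1, H, W)),
                       torch.rand(T, 1, H, W))
print(cond.shape)  # torch.Size([8, 2, 64, 64])
```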
arXiv Detail & Related papers (2024-12-05T21:30:40Z)
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Sequential Amodal Segmentation via Cumulative Occlusion Learning [15.729212571002906]
A visual system must be able to segment both the visible and occluded regions of objects, while discerning their occlusion order.
We introduce a diffusion model with cumulative occlusion learning designed for sequential amodal segmentation of objects with uncertain categories.
This model iteratively refines the prediction using the cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions.
It is akin to the human capability for amodal perception, i.e., to decipher the spatial ordering among objects and accurately predict complete contours for occluded objects in densely layered visual scenes.
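A minimal sketch of the cumulative-mask idea; the predict_layer stub stands in for one reverse-diffusion sampling pass conditioned on the image and the accumulated mask:
```python
import torch

def sequential_amodal(image, predict_layer, num_layers):
    """Predict amodal masks layer by layer, accumulating occlusion evidence.

    image:         (3, H, W) input frame
    predict_layer: callable (image, cum_mask) -> (H, W) mask in [0, 1] for the
                   front-most object not yet explained; in the paper this is
                   one reverse-diffusion sampling pass.
    """
    _, H, W = image.shape
    cum_mask = torch.zeros(H, W)          # union of all masks predicted so far
    masks = []
    for _ in range(num_layers):
        m = predict_layer(image, cum_mask)
        masks.append(m)
        cum_mask = torch.clamp(cum_mask + m, max=1.0)   # cumulative mask update
    return masks

# stub predictor: pretends each layer covers the next horizontal band
bands = iter([(0, 8), (8, 16), (16, 24)])
def fake_layer(image, cum_mask):
    m = torch.zeros_like(cum_mask)
    a, b = next(bands)
    m[a:b] = 1.0
    return m

masks = sequential_amodal(torch.rand(3, 32, 32), fake_layer, num_layers=3)
print(len(masks), masks[0].shape)         # 3 torch.Size([32, 32])
```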
arXiv Detail & Related papers (2024-05-09T14:17:26Z)
- BLADE: Box-Level Supervised Amodal Segmentation through Directed Expansion [10.57956193654977]
Box-level supervised amodal segmentation addresses the high cost of amodal mask annotation by relying solely on ground truth bounding boxes and instance classes as supervision.
We present a novel solution by introducing a directed expansion approach from visible masks to corresponding amodal masks.
Our approach involves a hybrid end-to-end network based on the overlapping region, i.e., the area where different instances intersect.
arXiv Detail & Related papers (2024-01-03T09:37:03Z)
- Amodal Ground Truth and Completion in the Wild [84.54972153436466]
We use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images.
This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels.
arXiv Detail & Related papers (2023-12-28T18:59:41Z)
- Coarse-to-Fine Amodal Segmentation with Shape Prior [52.38348188589834]
Amodal object segmentation is a challenging task that involves segmenting both visible and occluded parts of an object.
We propose a novel coarse-to-fine approach, C2F-Seg, which addresses this problem by progressively modeling the amodal segmentation.
arXiv Detail & Related papers (2023-08-31T15:56:29Z)
- Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models [6.408114351192012]
We present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions.
We show results on the task of segmenting four different objects (humans, dogs, cars, birds) and a use case scenario in medical image analysis.
arXiv Detail & Related papers (2022-12-29T13:51:54Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
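A toy version of such an occupancy pretext task, under the simplifying assumption that points just in front of a lidar return (toward the sensor) are free space and points just behind it are occupied; a raw-coordinate MLP stands in for the point-cloud backbone and occupancy head:
```python
import torch
import torch.nn as nn

def make_queries(points, sensor=None, eps=0.1):
    """Generate occupancy queries from lidar returns: a point just in front of
    a return (toward the sensor) is free space, just behind it is occupied."""
    sensor = torch.zeros(3) if sensor is None else sensor
    dirs = nn.functional.normalize(points - sensor, dim=1)   # ray directions
    free, occ = points - eps * dirs, points + eps * dirs
    x = torch.cat([free, occ])
    y = torch.cat([torch.zeros(len(free)), torch.ones(len(occ))])
    return x, y

# a raw-coordinate MLP stands in for (point-cloud backbone + occupancy head)
mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
points = torch.rand(256, 3) * 10.0                           # fake lidar scan
for step in range(200):
    x, y = make_queries(points)
    loss = nn.functional.binary_cross_entropy_with_logits(mlp(x).squeeze(1), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"pretext loss after training: {loss.item():.3f}")
```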
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Self-supervised Amodal Video Object Segmentation [57.929357732733926]
Amodal perception requires inferring the full shape of an object that is partially occluded.
This paper develops a new framework for self-supervised amodal video object segmentation (SaVos).
arXiv Detail & Related papers (2022-10-23T14:09:35Z)
- A Weakly Supervised Amodal Segmenter with Boundary Uncertainty Estimation [35.103437828235826]
This paper addresses weakly supervised amodal instance segmentation.
The goal is to segment both visible and occluded (amodal) object parts, while training provides only ground-truth visible (modal) segmentations.
arXiv Detail & Related papers (2021-08-23T02:27:29Z)
- Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
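A minimal stand-in for this squeeze-then-reason pattern (the published block reasons over the squeezed vector with a small graph module; a bottleneck MLP is used here purely for illustration):
```python
import torch
import torch.nn as nn

class SqueezeReasonBlock(nn.Module):
    """Squeeze the spatial feature map into one channel-wise vector, reason in
    that compact space, then broadcast the result back as a channel-wise gate.
    """

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.reason = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, H, W)
        v = x.mean(dim=(2, 3))                  # squeeze: (B, C) global vector
        w = self.reason(v)                      # reasoning in channel space
        return x * w[:, :, None, None]          # broadcast back over H x W

block = SqueezeReasonBlock(64)                  # drop-in block for a backbone
print(block(torch.rand(2, 64, 32, 32)).shape)   # torch.Size([2, 64, 32, 32])
```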
arXiv Detail & Related papers (2020-11-06T12:17:01Z)