Amodal Ground Truth and Completion in the Wild
- URL: http://arxiv.org/abs/2312.17247v2
- Date: Mon, 29 Apr 2024 17:35:27 GMT
- Title: Amodal Ground Truth and Completion in the Wild
- Authors: Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman
- Abstract summary: We use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images.
This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels.
- Score: 84.54972153436466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies amodal image segmentation: predicting entire object segmentation masks including both visible and invisible (occluded) parts. In previous work, the amodal segmentation ground truth on real images is usually predicted by manual annotation and thus is subjective. In contrast, we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images. This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels. To better handle the amodal completion task in the wild, we explore two architecture variants: a two-stage model that first infers the occluder, followed by amodal mask completion; and a one-stage model that exploits the representation power of Stable Diffusion for amodal segmentation across many categories. Without bells and whistles, our method achieves a new state-of-the-art performance on amodal segmentation datasets that cover a large variety of objects, including COCOA and our new MP3D-Amodal dataset. The dataset, model, and code are available at https://www.robots.ox.ac.uk/~vgg/research/amodal/.
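To make the two-stage variant above concrete, the snippet below is a minimal PyTorch sketch of the data flow only, not the authors' released model: a hypothetical OccluderHead predicts the occluder from the image and the visible (modal) mask, and a hypothetical AmodalCompletionHead then completes the full mask conditioned on the image, the modal mask, and the predicted occluder. The module names, channel counts, and tiny convolutional heads are illustrative assumptions; the one-stage Stable Diffusion variant is not sketched.

```python
# Minimal, illustrative sketch (assumed, not the released code) of a two-stage
# amodal completion pipeline: stage 1 infers the occluder mask, stage 2 completes
# the full (amodal) mask of the target object.
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions with ReLU, preserving spatial resolution."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class OccluderHead(nn.Module):
    """Stage 1 (hypothetical): predict which pixels occlude the target object."""

    def __init__(self) -> None:
        super().__init__()
        # Input channels: RGB image (3) + modal mask of the target (1).
        self.net = nn.Sequential(conv_block(4, 32), nn.Conv2d(32, 1, kernel_size=1))

    def forward(self, image: torch.Tensor, modal_mask: torch.Tensor) -> torch.Tensor:
        x = torch.cat([image, modal_mask], dim=1)
        return torch.sigmoid(self.net(x))  # occluder probability map


class AmodalCompletionHead(nn.Module):
    """Stage 2 (hypothetical): complete the full amodal mask of the target."""

    def __init__(self) -> None:
        super().__init__()
        # Input channels: RGB (3) + modal mask (1) + predicted occluder mask (1).
        self.net = nn.Sequential(conv_block(5, 32), nn.Conv2d(32, 1, kernel_size=1))

    def forward(self, image, modal_mask, occluder_mask) -> torch.Tensor:
        x = torch.cat([image, modal_mask, occluder_mask], dim=1)
        return torch.sigmoid(self.net(x))  # amodal probability map


if __name__ == "__main__":
    image = torch.rand(1, 3, 256, 256)       # dummy RGB image
    modal_mask = torch.rand(1, 1, 256, 256)   # dummy visible-region mask
    stage1, stage2 = OccluderHead(), AmodalCompletionHead()
    occluder = stage1(image, modal_mask)
    amodal = stage2(image, modal_mask, occluder)
    print(amodal.shape)  # torch.Size([1, 1, 256, 256])
```

A real implementation would swap the toy heads for the paper's actual backbones and train with amodal mask supervision (e.g. on COCOA or MP3D-Amodal); the sketch only fixes the interfaces between the two stages.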
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z) - Hyper-Transformer for Amodal Completion [82.4118011026855]
Amodal object completion is a complex task that involves predicting the invisible parts of an object based on visible segments and background information.
We introduce a novel framework named the Hyper-Transformer Amodal Network (H-TAN).
This framework utilizes a hyper transformer equipped with a dynamic convolution head to directly learn shape priors and accurately predict amodal masks.
arXiv Detail & Related papers (2024-05-30T11:11:54Z) - TAO-Amodal: A Benchmark for Tracking Any Object Amodally [41.5396827282691]
We introduce TAO-Amodal, featuring 833 diverse categories in thousands of video sequences.
Our dataset includes amodal and modal bounding boxes for visible and partially or fully occluded objects, including those that are partially out of the camera frame.
arXiv Detail & Related papers (2023-12-19T18:58:40Z) - Coarse-to-Fine Amodal Segmentation with Shape Prior [52.38348188589834]
Amodal object segmentation is a challenging task that involves segmenting both visible and occluded parts of an object.
We propose a novel coarse-to-fine approach, C2F-Seg, that addresses this problem by progressively modeling the amodal segmentation.
arXiv Detail & Related papers (2023-08-31T15:56:29Z) - Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions [0.0]
We develop a model that comprehends a natural language instruction and generates a segmentation mask for the target everyday object.
We build a new dataset based on the well-known Matterport3D and REVERIE datasets.
The performance of MDSM surpassed that of the baseline method by a large margin of +10.13 mean IoU.
arXiv Detail & Related papers (2023-07-17T16:07:07Z) - Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement [49.888011242939385]
We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship.
The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects.
arXiv Detail & Related papers (2023-07-10T17:56:06Z) - Amodal Intra-class Instance Segmentation: Synthetic Datasets and Benchmark [17.6780586288079]
This paper introduces two new amodal datasets for image amodal completion tasks.
We also present a point-supervised scheme with layer priors for amodal instance segmentation.
Experiments show that our weakly supervised approach outperforms the SOTA fully supervised methods.
arXiv Detail & Related papers (2023-03-12T07:28:36Z) - Self-supervised Amodal Video Object Segmentation [57.929357732733926]
Amodal perception requires inferring the full shape of an object that is partially occluded.
This paper develops a new framework for self-supervised amodal video object segmentation (SaVos).
arXiv Detail & Related papers (2022-10-23T14:09:35Z) - AISFormer: Amodal Instance Segmentation with Transformer [9.042737643989561]
Amodal Instance Segmentation (AIS) aims to segment the region of both visible and possibly occluded parts of an object instance.
We present AISFormer, an AIS framework with a Transformer-based mask head.
arXiv Detail & Related papers (2022-10-12T15:42:40Z) - Amodal Cityscapes: A New Dataset, its Generation, and an Amodal Semantic Segmentation Challenge Baseline [38.8592627329447]
We consider the task of amodal semantic segmentation and propose a generic way to generate datasets to train amodal semantic segmentation methods.
We use this approach to generate an amodal Cityscapes dataset, showing its applicability for amodal semantic segmentation in automotive environment perception.
arXiv Detail & Related papers (2022-06-01T14:38:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.