Coarse-to-Fine Amodal Segmentation with Shape Prior
- URL: http://arxiv.org/abs/2308.16825v1
- Date: Thu, 31 Aug 2023 15:56:29 GMT
- Title: Coarse-to-Fine Amodal Segmentation with Shape Prior
- Authors: Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang and Yanwei Fu
- Abstract summary: Amodal object segmentation is a challenging task that involves segmenting both visible and occluded parts of an object.
We propose a novel approach, called Coarse-to-Fine Segmentation (C2F-Seg), that addresses this problem by progressively modeling the amodal segmentation.
- Score: 52.38348188589834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Amodal object segmentation is a challenging task that involves segmenting
both visible and occluded parts of an object. In this paper, we propose a novel
approach, called Coarse-to-Fine Segmentation (C2F-Seg), that addresses this
problem by progressively modeling the amodal segmentation. C2F-Seg initially
reduces the learning space from the pixel-level image space to the
vector-quantized latent space. This enables us to better handle long-range
dependencies and learn a coarse-grained amodal segment from visual features and
visible segments. However, this latent space lacks detailed information about
the object, which makes it difficult to provide a precise segmentation
directly. To address this issue, we propose a convolution refine module to
inject fine-grained information and provide a more precise amodal object
segmentation based on visual features and coarse-predicted segmentation. To
support research on amodal object segmentation, we create a synthetic amodal
dataset, named MOViD-Amodal (MOViD-A), which can be used for both image and
video amodal object segmentation. We extensively evaluate our model on two
benchmark datasets: KINS and COCO-A. Our empirical results demonstrate the
superiority of C2F-Seg. Moreover, we exhibit the potential of our approach for
video amodal object segmentation tasks on FISHBOWL and our proposed MOViD-A.
Project page at: http://jianxgao.github.io/C2F-Seg.
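The two-stage design described in the abstract (a coarse amodal mask predicted in a compact latent space, then a convolutional refinement that injects fine-grained detail) can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' released C2F-Seg code: the module names, the plain transformer encoder standing in for the vector-quantized latent stage, and all channel sizes are assumptions.

```python
# Minimal coarse-to-fine amodal segmentation sketch, loosely following the
# two-stage idea in the abstract. Module names, channel sizes, and the plain
# transformer coarse stage are illustrative assumptions, not the authors'
# C2F-Seg implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoarseMaskPredictor(nn.Module):
    """Predicts a coarse amodal mask from visual features and the visible
    mask; stands in for the vector-quantized latent stage in the paper."""

    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        self.proj = nn.Conv2d(feat_dim + 1, hidden_dim, kernel_size=1)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Conv2d(hidden_dim, 1, kernel_size=1)

    def forward(self, feats, visible_mask):
        # feats: (B, C, H, W) backbone features; visible_mask: (B, 1, H, W)
        x = self.proj(torch.cat([feats, visible_mask], dim=1))
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        tokens = self.encoder(tokens)                   # long-range context
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return torch.sigmoid(self.head(x))              # coarse amodal mask


class ConvRefineModule(nn.Module):
    """Small convolutional head that injects fine-grained detail into the
    coarse prediction, analogous in spirit to the paper's refine module."""

    def __init__(self, feat_dim=256, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim + 1, hidden_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden_dim, 1, 1),
        )

    def forward(self, feats, coarse_mask):
        coarse = F.interpolate(coarse_mask, size=feats.shape[-2:],
                               mode="bilinear", align_corners=False)
        return torch.sigmoid(self.net(torch.cat([feats, coarse], dim=1)))


if __name__ == "__main__":
    feats = torch.randn(2, 256, 32, 32)    # dummy backbone features
    visible = torch.rand(2, 1, 32, 32)     # dummy visible-part mask
    coarse = CoarseMaskPredictor()(feats, visible)
    amodal = ConvRefineModule()(feats, coarse)
    print(coarse.shape, amodal.shape)      # both (2, 1, 32, 32)
```

In this sketch the coarse stage reasons over flattened feature tokens to capture long-range dependencies between visible and occluded regions, while the refine module operates at full feature resolution; the actual method additionally quantizes the latent space, which is omitted here for brevity.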
Related papers
- CoReS: Orchestrating the Dance of Reasoning and Segmentation [17.767049542947497]
We believe that the act of reasoning segmentation should mirror the cognitive stages of human visual search.
We introduce the Chains of Reasoning and Segmenting (CoReS) and find this top-down visual hierarchy indeed enhances the visual search process.
Experiments demonstrate the superior performance of our CoReS, which surpasses the state-of-the-art method by 6.5% on the ReasonSeg dataset.
arXiv Detail & Related papers (2024-04-08T16:55:39Z)
- Amodal Ground Truth and Completion in the Wild [84.54972153436466]
We use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images.
This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels.
arXiv Detail & Related papers (2023-12-28T18:59:41Z)
- Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions [0.0]
We develop a model that comprehends a natural language instruction and generates a segmentation mask for the target everyday object.
We build a new dataset based on the well-known Matterport3D and REVERIE datasets.
The performance of MDSM surpassed that of the baseline method by a large margin of +10.13 points in mean IoU.
arXiv Detail & Related papers (2023-07-17T16:07:07Z)
- Self-supervised Amodal Video Object Segmentation [57.929357732733926]
Amodal perception requires inferring the full shape of an object that is partially occluded.
This paper develops a new framework for amodal video object segmentation (SaVos).
arXiv Detail & Related papers (2022-10-23T14:09:35Z)
- Multi-Attention Network for Compressed Video Referring Object Segmentation [103.18477550023513]
Referring video object segmentation aims to segment the object referred by a given language expression.
Existing works typically require the compressed video bitstream to be decoded into RGB frames before segmentation.
This may hamper their application in real-world scenarios with limited computing resources, such as autonomous cars and drones.
arXiv Detail & Related papers (2022-07-26T03:00:52Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Enhanced Boundary Learning for Glass-like Object Segmentation [55.45473926510806]
This paper aims to solve the glass-like object segmentation problem via enhanced boundary learning.
In particular, we first propose a novel refined differential module for generating finer boundary cues.
An edge-aware point-based graph convolution network module is proposed to model the global shape representation along the boundary.
arXiv Detail & Related papers (2021-03-29T16:18:57Z)
- Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey [0.0]
We take a glance at the evolution of both semantic and instance segmentation work based on CNNs.
We also give a glimpse of some state-of-the-art panoptic segmentation models.
arXiv Detail & Related papers (2020-01-13T06:07:27Z)