Completing Visual Objects via Bridging Generation and Segmentation
- URL: http://arxiv.org/abs/2310.00808v2
- Date: Fri, 2 Feb 2024 07:14:19 GMT
- Title: Completing Visual Objects via Bridging Generation and Segmentation
- Authors: Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh,
Bhiksha Raj, Lijuan Wang, Zicheng Liu
- Abstract summary: MaskComp delineates the completion process through iterative stages of generation and segmentation.
In each iteration, the object mask is provided as an additional condition to boost image generation.
We demonstrate that the combination of one generation and one segmentation stage effectively functions as a mask denoiser.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel approach to object completion, with the primary
goal of reconstructing a complete object from its partially visible components.
Our method, named MaskComp, delineates the completion process through iterative
stages of generation and segmentation. In each iteration, the object mask is
provided as an additional condition to boost image generation, and, in return,
the generated images can lead to a more accurate mask by fusing the
segmentation of images. We demonstrate that the combination of one generation
and one segmentation stage effectively functions as a mask denoiser. Through
alternation between the generation and segmentation stages, the partial object
mask is progressively refined, providing precise shape guidance and yielding
superior object completion results. Our experiments demonstrate the superiority
of MaskComp over existing approaches, e.g., ControlNet and Stable Diffusion,
establishing it as an effective solution for object completion.
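The alternation the abstract describes can be sketched in a few lines. Here `generate` and `segment` are hypothetical stand-ins for a mask-conditioned image generator and an object segmenter, masks are flat 0/1 lists, and majority-vote fusion is one plausible fusion choice; this is a sketch of the idea, not the authors' implementation.

```python
# Sketch of MaskComp's alternating generation-segmentation loop.
# `generate` and `segment` are hypothetical stand-ins; masks are
# flat 0/1 lists for simplicity. Not the paper's implementation.

def fuse_masks(masks):
    """Fuse per-sample segmentation masks by pixelwise majority vote."""
    return [int(2 * sum(col) >= len(masks)) for col in zip(*masks)]

def mask_comp(partial_image, partial_mask, generate, segment,
              iterations=3, samples=4):
    """Refine the partial mask, then return the final completion and mask."""
    mask = list(partial_mask)
    for _ in range(iterations):
        # Generation stage: sample completions conditioned on the current mask.
        images = [generate(partial_image, mask) for _ in range(samples)]
        # Segmentation stage: segment each sample and fuse into a cleaner mask
        # (one generation + one segmentation acts as a mask denoiser).
        mask = fuse_masks([segment(img) for img in images])
    return generate(partial_image, mask), mask
```

With real models, each pass trades a noisy partial mask for a fused, cleaner one, which in turn conditions the next round of generation.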
Related papers
- Variance-insensitive and Target-preserving Mask Refinement for
Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
arXiv Detail & Related papers (2023-12-22T02:31:31Z)
- PaintSeg: Training-free Segmentation via Painting [50.17936803209125]
PaintSeg is a new unsupervised method for segmenting objects without any training.
Inpainting and outpainting are alternated, with the former masking the foreground and filling in the background, and the latter masking the background while recovering the missing part of the foreground object.
Our experimental results demonstrate that PaintSeg outperforms existing approaches in coarse mask-prompt, box-prompt, and point-prompt segmentation tasks.
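The alternation can be sketched as follows; `inpaint` and `to_mask` are hypothetical stand-ins for a generative inpainter and a mask-update rule (e.g. thresholding the difference between the original and the repaint), and masks are flat 0/1 lists — a sketch of the idea, not the paper's implementation.

```python
# Sketch of PaintSeg-style alternating inpainting/outpainting.
# `inpaint` and `to_mask` are hypothetical stand-ins; masks are
# flat 0/1 lists. Not the paper's implementation.

def invert(mask):
    """Swap foreground and background in a binary mask."""
    return [1 - bit for bit in mask]

def paint_seg(image, init_mask, inpaint, to_mask, steps=4):
    """Refine a coarse mask prompt without any training."""
    mask = list(init_mask)
    for step in range(steps):
        if step % 2 == 0:
            # Inpainting: mask the foreground, fill in the background.
            repainted = inpaint(image, mask)
        else:
            # Outpainting: mask the background, recover missing foreground.
            repainted = inpaint(image, invert(mask))
        # Update the mask from how the repaint differs from the original.
        mask = to_mask(image, repainted)
    return mask
```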
arXiv Detail & Related papers (2023-05-30T20:43:42Z)
- MMNet: Multi-Mask Network for Referring Image Segmentation [6.462622145673872]
We propose an end-to-end Multi-Mask Network for referring image segmentation (MMNet).
We first combine the image and the language expression, then employ an attention mechanism to generate multiple queries that represent different aspects of the language expression.
The final result is obtained through the weighted sum of all masks, which greatly reduces the randomness of the language expression.
arXiv Detail & Related papers (2023-05-24T10:02:27Z)
- MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
arXiv Detail & Related papers (2023-02-10T20:27:02Z)
- Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation [75.00151934315967]
MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
arXiv Detail & Related papers (2022-06-13T17:59:43Z)
- GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation [16.900404701997502]
We propose a GAN-based approach that generates images conditioned on latent masks.
We show that such mask-conditioned image generation can be learned faithfully when conditioning the masks in a hierarchical manner.
It also lets us generate image-mask pairs for training a segmentation network, which outperforms the state-of-the-art unsupervised segmentation methods on established benchmarks.
arXiv Detail & Related papers (2021-12-02T07:57:56Z)
- BoundarySqueeze: Image Segmentation as Boundary Squeezing [104.43159799559464]
We propose a novel method for fine-grained high-quality image segmentation of both objects and scenes.
Inspired by dilation and erosion from morphological image processing, we treat pixel-level segmentation as squeezing the object boundary.
Our method yields large gains on COCO and Cityscapes for both instance and semantic segmentation, and it outperforms the previous state-of-the-art PointRend in both accuracy and speed under the same setting.
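The dilation and erosion the summary refers to are standard morphological operations; the sketch below shows them on a tiny binary grid, with the dilation-minus-erosion band being the kind of boundary region such a method squeezes. This illustrates the inspiration only, not BoundarySqueeze itself.

```python
# Standard binary dilation/erosion with a 4-connected structuring element,
# on small 0/1 grids (zero padding outside the image). Illustrates the
# morphology the paper draws on, not the method itself.

def _neighbourhood(y, x):
    """A pixel and its 4-connected neighbours."""
    return [(y, x), (y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]

def dilate(grid):
    """A pixel becomes foreground if it or any 4-neighbour is foreground."""
    h, w = len(grid), len(grid[0])
    return [[int(any(0 <= i < h and 0 <= j < w and grid[i][j]
                     for i, j in _neighbourhood(y, x)))
             for x in range(w)] for y in range(h)]

def erode(grid):
    """A pixel stays foreground only if its whole 4-neighbourhood is."""
    h, w = len(grid), len(grid[0])
    return [[int(all(0 <= i < h and 0 <= j < w and grid[i][j]
                     for i, j in _neighbourhood(y, x)))
             for x in range(w)] for y in range(h)]

def boundary_band(grid):
    """The uncertain band around the boundary: dilation minus erosion."""
    d, e = dilate(grid), erode(grid)
    return [[a - b for a, b in zip(dr, er)] for dr, er in zip(d, e)]
```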
arXiv Detail & Related papers (2021-05-25T04:58:51Z)
- Proposal-Free Volumetric Instance Segmentation from Latent Single-Instance Masks [16.217524435617744]
This work introduces a new proposal-free instance segmentation method.
It builds on single-instance segmentation masks predicted across the entire image in a sliding window style.
In contrast to related approaches, our method concurrently predicts all masks, one for each pixel, and thus resolves any conflict jointly across the entire image.
arXiv Detail & Related papers (2020-09-10T17:09:23Z)
- Revisiting Sequence-to-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory [4.343892430915579]
Video Object Segmentation (VOS) is an active research area in computer vision.
Current approaches lose objects in longer sequences, especially when the object is small or briefly occluded.
We build upon a sequence-to-sequence approach that employs an encoder-decoder architecture together with a memory module for exploiting the sequential data.
arXiv Detail & Related papers (2020-04-25T15:38:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.