Outline-Guided Object Inpainting with Diffusion Models
- URL: http://arxiv.org/abs/2402.16421v1
- Date: Mon, 26 Feb 2024 09:21:17 GMT
- Title: Outline-Guided Object Inpainting with Diffusion Models
- Authors: Markus Pobitzer, Filip Janicki, Mattia Rigotti, Cristiano Malossi
- Abstract summary: Instance segmentation datasets play a crucial role in training accurate and robust computer vision models.
We show how this issue can be mitigated by starting with small annotated instance segmentation datasets and augmenting them to obtain a sizeable annotated dataset.
We generate new images using a diffusion-based inpainting model to fill out the masked area with a desired object class by guiding the diffusion through the object outline.
- Score: 11.391452115311798
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instance segmentation datasets play a crucial role in training accurate and
robust computer vision models. However, obtaining accurate mask annotations to
produce high-quality segmentation datasets is a costly and labor-intensive
process. In this work, we show how this issue can be mitigated by starting with
small annotated instance segmentation datasets and augmenting them to
effectively obtain a sizeable annotated dataset. We achieve that by creating
variations of the available annotated object instances in a way that preserves
the provided mask annotations, thereby resulting in new image-mask pairs to be
added to the set of annotated images. Specifically, we generate new images
using a diffusion-based inpainting model to fill out the masked area with a
desired object class by guiding the diffusion through the object outline. We
show that the object outline provides a simple but reliable and convenient
training-free guidance signal for the underlying inpainting model that is
often sufficient to fill out the mask with an object of the correct class
without further text guidance, while preserving the correspondence between
the generated images and the mask annotations with high precision. Our experimental
results reveal that our method successfully generates realistic variations of
object instances, preserving their shape characteristics while introducing
diversity within the augmented area. We also show that the proposed method can
naturally be combined with text guidance and other image augmentation
techniques.
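To make the augmentation loop above concrete, here is a minimal sketch built on a generic off-the-shelf inpainting pipeline. The checkpoint, file names, and prompt are illustrative assumptions, and the sketch approximates the paper's outline guidance only insofar as it restricts generation to the annotated mask region; it does not reproduce the authors' exact guidance mechanism.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Off-the-shelf inpainting model (illustrative checkpoint choice).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("annotated_image.jpg").convert("RGB").resize((512, 512))
mask = Image.open("instance_mask.png").convert("L").resize((512, 512))

# The annotated instance mask doubles as the inpainting region: only pixels
# inside the mask are regenerated, so the original mask annotation remains
# valid for the new image and the pair can be added back to the dataset.
augmented = pipe(
    prompt="a photo of a dog",  # optional text guidance; per the paper, the
                                # object outline alone is often sufficient
    image=image,
    mask_image=mask,
).images[0]
augmented.save("augmented_image.jpg")
```

Because only the masked area changes, each generated variation keeps the original image-mask correspondence by construction.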
Related papers
- DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability [5.767984430681467]
This paper introduces DiffuMask-Editor, which combines a segmentation diffusion model for generating annotated datasets with image editing.
By integrating multiple objects into images using Text2Image models, our method facilitates the creation of more realistic datasets.
Results demonstrate that synthetic data generated by DiffuMask-Editor enable segmentation methods to achieve superior performance compared to real data.
arXiv Detail & Related papers (2024-11-04T05:39:01Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
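A minimal sketch of that early-step feature interpolation, under assumptions (the function name, blending schedule, and early-step fraction are illustrative, not DiffUHaul's actual code):

```python
import torch

def blend_attention_features(attn_src: torch.Tensor,
                             attn_tgt: torch.Tensor,
                             step: int,
                             num_steps: int,
                             early_fraction: float = 0.3) -> torch.Tensor:
    """Interpolate attention features during the early denoising steps,
    then fall back to the target features for the remaining steps."""
    cutoff = early_fraction * num_steps
    if step < cutoff:
        # Weight shifts from source-dominated to target-dominated as
        # denoising progresses, smoothly fusing layout and appearance.
        alpha = step / cutoff
        return (1.0 - alpha) * attn_src + alpha * attn_tgt
    return attn_tgt
```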
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Paint by Inpaint: Learning to Add Image Objects by Removing Them First [8.399234415641319]
We train a diffusion model to invert the inpainting process, effectively adding objects to images.
We provide detailed descriptions of the removed objects and use a large language model to convert these descriptions into diverse, natural-language instructions.
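A small sketch of the description-to-instruction step, assuming a generic instruction-tuned model from the transformers library (the model choice and prompt wording are placeholders, not the paper's setup):

```python
from transformers import pipeline

# Small instruction-tuned LLM as a stand-in for the paper's model choice.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

def description_to_instruction(object_description: str) -> str:
    """Convert a detailed description of a removed object into a short
    natural-language instruction for adding that object to an image."""
    prompt = (
        "Rewrite the following object description as a short instruction "
        f"for adding the object to an image:\n{object_description}\nInstruction:"
    )
    out = generator(prompt, max_new_tokens=40, do_sample=True)
    return out[0]["generated_text"][len(prompt):].strip()

print(description_to_instruction("a small brown dog lying on the grass"))
```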
arXiv Detail & Related papers (2024-04-28T15:07:53Z)
- Microscopy Image Segmentation via Point and Shape Regularized Data Synthesis [9.47802391546853]
We develop a unified pipeline for microscopy image segmentation using synthetically generated training data.
Our framework achieves comparable results to models trained on authentic microscopy images with dense labels.
arXiv Detail & Related papers (2023-08-18T22:00:53Z)
- Zero-shot spatial layout conditioning for text-to-image diffusion models [52.24744018240424]
Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modelling.
We consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content.
We propose ZestGuide, a zero-shot segmentation guidance approach that can be plugged into pre-trained text-to-image diffusion models.
arXiv Detail & Related papers (2023-06-23T19:24:48Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
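For reference, a minimal sketch of the Gumbel-Softmax trick mentioned above, which lets a mask generator be trained end to end through discrete masking decisions (shapes and the two-way keep/mask parameterization are assumptions):

```python
import torch
import torch.nn.functional as F

def sample_patch_mask(mask_logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Differentiably sample a binary patch mask from per-patch
    (keep, mask) logits. hard=True yields discrete 0/1 values in the
    forward pass with straight-through gradients in the backward pass."""
    one_hot = F.gumbel_softmax(mask_logits, tau=tau, hard=True)
    return one_hot[..., 1]  # 1.0 where the patch is masked

# Example: logits for a 14x14 ViT patch grid.
logits = torch.randn(4, 196, 2, requires_grad=True)
mask = sample_patch_mask(logits)
print(mask.shape, mask.sum(dim=1))  # how many patches each sample masks
```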
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models [6.408114351192012]
We present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions.
We show results on the task of segmenting four different objects (humans, dogs, cars, birds) and a use case scenario in medical image analysis.
arXiv Detail & Related papers (2022-12-29T13:51:54Z)
- High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images.
It improves mask prediction by fusing high-resolution image crops, which provide finer image detail, with the full image.
With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging entity segmentation task.
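CropFormer learns this fusion end to end; purely to make the geometry concrete, here is a naive stand-in that averages a high-resolution crop's mask logits back into the full-image prediction (the function, shapes, and averaging rule are assumptions, not the paper's learned fusion):

```python
import torch
import torch.nn.functional as F

def fuse_crop_and_full(full_logits: torch.Tensor,
                       crop_logits: torch.Tensor,
                       crop_box: tuple) -> torch.Tensor:
    """Average a crop's mask logits into the full-image logits.

    full_logits: (C, H, W) mask logits for the whole image.
    crop_logits: (C, h', w') logits predicted on a high-res crop.
    crop_box: (top, left, height, width) of the crop in full-image coords.
    """
    top, left, h, w = crop_box
    # Resize the crop's logits to the region they cover in the full image.
    resized = F.interpolate(crop_logits[None], size=(h, w),
                            mode="bilinear", align_corners=False)[0]
    fused = full_logits.clone()
    region = fused[:, top:top + h, left:left + w]
    fused[:, top:top + h, left:left + w] = 0.5 * (region + resized)
    return fused
```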
arXiv Detail & Related papers (2022-11-10T18:58:22Z)
- LayoutBERT: Masked Language Layout Model for Object Insertion [3.4806267677524896]
We propose LayoutBERT for the object insertion task.
It uses a novel self-supervised masked language model objective and bidirectional multi-head self-attention.
We provide both qualitative and quantitative evaluations on datasets from diverse domains.
arXiv Detail & Related papers (2022-04-30T21:35:38Z)
- BoundarySqueeze: Image Segmentation as Boundary Squeezing [104.43159799559464]
We propose a novel method for fine-grained high-quality image segmentation of both objects and scenes.
Inspired by dilation and erosion from morphological image processing, we treat pixel-level segmentation as squeezing the object boundary.
Our method yields large gains on COCO and Cityscapes for both instance and semantic segmentation, and outperforms the previous state-of-the-art PointRend in both accuracy and speed under the same setting.
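The dilation/erosion intuition is easy to picture with plain morphology: the band between a dilated and an eroded mask is exactly the strip that gets squeezed from both sides. A small sketch (an illustration of the intuition, not the paper's network):

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def boundary_band(mask: np.ndarray, width: int = 3) -> np.ndarray:
    """Band around an object boundary: dilated mask minus eroded mask.
    These are the pixels whose labels boundary refinement re-decides."""
    dilated = binary_dilation(mask, iterations=width)
    eroded = binary_erosion(mask, iterations=width)
    return dilated & ~eroded

mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
band = boundary_band(mask)
print(band.sum(), "boundary pixels to refine")
```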
arXiv Detail & Related papers (2021-05-25T04:58:51Z)
- Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce.
Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data.
We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.