X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using
CLIP and StableDiffusion
- URL: http://arxiv.org/abs/2212.03863v2
- Date: Wed, 31 May 2023 14:57:48 GMT
- Title: X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using
CLIP and StableDiffusion
- Authors: Hanqing Zhao and Dianmo Sheng and Jianmin Bao and Dongdong Chen and
Dong Chen and Fang Wen and Lu Yuan and Ce Liu and Wenbo Zhou and Qi Chu and
Weiming Zhang and Nenghai Yu
- Abstract summary: Copy-Paste is a simple and effective data augmentation strategy for instance segmentation.
We revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models.
X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone.
- Score: 137.84635386962395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Copy-Paste is a simple and effective data augmentation strategy for instance
segmentation. By randomly pasting object instances onto new background images,
it creates new training data for free and significantly boosts the segmentation
performance, especially for rare object categories. Although diverse,
high-quality object instances yield larger performance gains in Copy-Paste,
previous works obtain object instances either from human-annotated instance
segmentation datasets or from rendered 3D object models, both of which are
too expensive to scale up to good diversity. In this
paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot
recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion).
We demonstrate for the first time that using a text2image model to generate
images or a zero-shot recognition model to filter noisily crawled images for
different object categories is a feasible way to make Copy-Paste truly
scalable. To make this possible, we design a data acquisition and
processing framework, dubbed "X-Paste", upon which a systematic study is
conducted. On the LVIS dataset, X-Paste provides impressive improvements over
the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it
achieves +2.6 box AP and +2.1 mask AP gains on all classes and even more
significant gains with +6.8 box AP, +6.5 mask AP on long-tail classes. Our code
and models are available at https://github.com/yoctta/XPaste.
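The pipeline the abstract describes, filtering candidate instances with a zero-shot model and then pasting them onto background images, can be sketched minimally as follows. Everything here is an illustrative assumption rather than the paper's implementation: the function names, the similarity threshold, and the binary-alpha cutoff are made up, and precomputed embeddings stand in for a real CLIP encoder.

```python
import numpy as np

def filter_by_similarity(image_embs, text_emb, threshold=0.25):
    """Keep generated/crawled images whose embedding is close to the
    category's text embedding (cosine similarity). Embeddings are
    assumed to come from a CLIP-style encoder; 0.25 is illustrative."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb)
    return np.nonzero(img @ txt >= threshold)[0]

def paste_instance(background, obj_rgba, top, left):
    """Composite an RGBA object cutout onto a background image and
    return the new image plus the pasted instance's binary mask."""
    img = background.copy()
    H, W = img.shape[:2]
    h, w = obj_rgba.shape[:2]
    t, l = max(top, 0), max(left, 0)
    b, r = min(top + h, H), min(left + w, W)
    mask = np.zeros((H, W), dtype=bool)
    if t >= b or l >= r:            # paste region entirely off-image
        return img, mask
    obj = obj_rgba[t - top:b - top, l - left:r - left]
    alpha = obj[..., 3] > 127       # binarize the alpha channel
    img[t:b, l:r][alpha] = obj[..., :3][alpha]
    mask[t:b, l:r] = alpha
    return img, mask
```

The returned mask is what makes the pasted instance usable as free segmentation supervision: it can be appended to the image's annotations, with pre-existing masks clipped wherever the new object occludes them.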
Related papers
- IFSENet : Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence [2.822194296769473]
Few-shot segmentation techniques reduce the required number of images to learn to segment a new class.
Interactive segmentation techniques, by contrast, focus only on incrementally improving the segmentation of one object at a time.
We combine the two concepts to drastically reduce the effort required to train segmentation models for novel classes.
arXiv Detail & Related papers (2024-03-22T10:15:53Z) - A High-Resolution Dataset for Instance Detection with Multi-View
Instance Capture [15.298790238028356]
Instance detection (InsDet) is a long-lasting problem in robotics and computer vision.
Current InsDet datasets are too small in scale by today's standards.
We introduce a new InsDet dataset and protocol.
arXiv Detail & Related papers (2023-10-30T03:58:41Z) - Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
We introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
arXiv Detail & Related papers (2023-07-10T17:59:40Z) - Self-Supervised Instance Segmentation by Grasping [84.2469669256257]
We learn a grasp segmentation model to segment the grasped object from images taken before and after the grasp.
Using the segmented objects, we can "cut" objects from their original scenes and "paste" them into new scenes to generate instance supervision.
We show that our grasp segmentation model provides a 5x error reduction when segmenting grasped objects compared with traditional image subtraction approaches.
arXiv Detail & Related papers (2023-05-10T16:51:36Z) - Humans need not label more humans: Occlusion Copy & Paste for Occluded
Human Instance Segmentation [0.3867363075280543]
We propose Occlusion Copy & Paste to introduce occluded examples to models during training.
It improves instance segmentation performance on occluded scenarios for "free", simply by leveraging existing large-scale datasets.
In a principled study, we evaluate whether various proposed add-ons to the copy & paste augmentation indeed contribute to better performance.
arXiv Detail & Related papers (2022-10-07T16:44:05Z) - SOLO: A Simple Framework for Instance Segmentation [84.00519148562606]
"instance categories" assigns categories to each pixel within an instance according to the instance's location.
"SOLO" is a simple, direct, and fast framework for instance segmentation with strong performance.
Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy.
arXiv Detail & Related papers (2021-06-30T09:56:54Z) - Reviving Iterative Training with Mask Guidance for Interactive
Segmentation [8.271859911016719]
Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes.
We propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps.
We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations show performance superior to all existing models.
arXiv Detail & Related papers (2021-02-12T15:44:31Z) - Simple Copy-Paste is a Strong Data Augmentation Method for Instance
Segmentation [94.4931516162023]
We study the Copy-Paste augmentation ([13, 12]) for instance segmentation where we randomly paste objects onto an image.
We find that the simple mechanism of pasting objects randomly is good enough and can provide solid gains on top of strong baselines.
Our baseline model outperforms the LVIS 2020 Challenge winning entry by +3.6 mask AP on rare categories.
arXiv Detail & Related papers (2020-12-13T22:59:45Z) - The Devil is in Classification: A Simple Framework for Long-tail Object
Detection and Instance Segmentation [93.17367076148348]
We investigate the performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset.
We unveil that a major cause is the inaccurate classification of object proposals.
We propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
arXiv Detail & Related papers (2020-07-23T12:49:07Z)
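The bi-level class-balanced sampling mentioned in the last entry can be sketched in a few lines. This is an illustrative reading of the idea (draw a class uniformly, then draw an image containing that class uniformly), not the authors' implementation; the function name and data layout are assumptions.

```python
import random

def bilevel_balanced_sample(class_to_images, rng=random):
    """Bi-level class-balanced sampling (illustrative sketch):
    first draw a class uniformly at random, then draw an image
    containing that class uniformly. Rare classes are thus sampled
    as often as frequent ones, countering long-tail imbalance."""
    cls = rng.choice(sorted(class_to_images))
    img = rng.choice(class_to_images[cls])
    return cls, img
```

Under this scheme a class with one image is drawn as often as a class with thousands, which is precisely how it counteracts the classification-head bias on long-tail datasets such as LVIS.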
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.