X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using
CLIP and StableDiffusion
- URL: http://arxiv.org/abs/2212.03863v2
- Date: Wed, 31 May 2023 14:57:48 GMT
- Title: X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using
CLIP and StableDiffusion
- Authors: Hanqing Zhao and Dianmo Sheng and Jianmin Bao and Dongdong Chen and
Dong Chen and Fang Wen and Lu Yuan and Ce Liu and Wenbo Zhou and Qi Chu and
Weiming Zhang and Nenghai Yu
- Abstract summary: Copy-Paste is a simple and effective data augmentation strategy for instance segmentation.
We revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models.
X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone.
- Score: 137.84635386962395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Copy-Paste is a simple and effective data augmentation strategy for instance
segmentation. By randomly pasting object instances onto new background images,
it creates new training data for free and significantly boosts the segmentation
performance, especially for rare object categories. Although more diverse,
higher-quality object instances yield larger Copy-Paste gains, previous works
obtain instances either from human-annotated instance segmentation datasets or
by rendering 3D object models, and both approaches are too expensive to scale
up to good diversity. In this
paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot
recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion).
We demonstrate for the first time that using a text2image model to generate
images or zero-shot recognition model to filter noisily crawled images for
different object categories is a feasible way to make Copy-Paste truly
scalable. To make such success happen, we design a data acquisition and
processing framework, dubbed ``X-Paste", upon which a systematic study is
conducted. On the LVIS dataset, X-Paste provides impressive improvements over
the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it
achieves +2.6 box AP and +2.1 mask AP gains on all classes and even more
significant gains with +6.8 box AP, +6.5 mask AP on long-tail classes. Our code
and models are available at https://github.com/yoctta/XPaste.
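The core operation that X-Paste scales up, compositing a masked object instance onto a new background, can be sketched as follows. This is a minimal NumPy illustration under assumed array shapes; the hard binary mask and the function signature are my assumptions, not the paper's exact implementation, which additionally generates, filters, and blends instances:

```python
import numpy as np

def copy_paste(background, instance, mask, top, left):
    """Paste a masked object crop onto a background image.

    background: (H, W, 3) uint8 image
    instance:   (h, w, 3) uint8 crop of the object
    mask:       (h, w) bool array, True where the object is
    top, left:  paste position in background coordinates
    """
    out = background.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]  # view into the copy
    region[mask] = instance[mask]             # hard (binary) compositing
    return out
```

Repeating this with many generated or crawled instances per category is what makes the augmentation scalable.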
Related papers
- SDI-Paste: Synthetic Dynamic Instance Copy-Paste for Video Instance Segmentation [26.258313321256097]
We leverage the recent growth in video fidelity of generative models to explore effective ways of incorporating synthetically generated objects into existing video datasets to artificially expand object instance pools.
We name our video data augmentation pipeline Synthetic Dynamic Instance Copy-Paste and test it on the complex task of video instance detection, segmentation and tracking of object instances across a video sequence.
arXiv Detail & Related papers (2024-10-16T12:11:34Z)
- A High-Resolution Dataset for Instance Detection with Multi-View Instance Capture [15.298790238028356]
Instance detection (InsDet) is a long-lasting problem in robotics and computer vision.
Current InsDet datasets are too small in scale by today's standards.
We introduce a new InsDet dataset and protocol.
arXiv Detail & Related papers (2023-10-30T03:58:41Z)
- Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
We introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
arXiv Detail & Related papers (2023-07-10T17:59:40Z)
- Self-Supervised Instance Segmentation by Grasping [84.2469669256257]
We learn a grasp segmentation model to segment the grasped object from before and after grasp images.
Using the segmented objects, we can "cut" objects from their original scenes and "paste" them into new scenes to generate instance supervision.
We show that our grasp segmentation model provides a 5x error reduction when segmenting grasped objects compared with traditional image subtraction approaches.
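The image-subtraction baseline that the grasp segmentation model is compared against can be sketched as a per-pixel difference threshold. This is a hedged illustration: the channel-max difference and the threshold value are my assumptions, not the exact baseline used in the paper:

```python
import numpy as np

def subtraction_segment(before, after, thresh=30):
    """Naive change-detection mask: pixels that differ between the
    before-grasp and after-grasp images are labeled as the object.
    before, after: (H, W, 3) uint8 images -> (H, W) bool mask."""
    diff = np.abs(before.astype(np.int16) - after.astype(np.int16))
    return diff.max(axis=-1) > thresh  # True where the scene changed
```

Such differencing is brittle under lighting changes and occlusion, which is where a learned grasp segmentation model can reduce error.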
arXiv Detail & Related papers (2023-05-10T16:51:36Z)
- Humans need not label more humans: Occlusion Copy & Paste for Occluded Human Instance Segmentation [0.3867363075280543]
We propose Occlusion Copy & Paste to introduce occluded examples to models during training.
It improves instance segmentation performance on occluded scenarios for "free" just by leveraging on existing large-scale datasets.
In a principled study, we evaluate whether various proposed add-ons to the copy & paste augmentation indeed contribute to better performance.
arXiv Detail & Related papers (2022-10-07T16:44:05Z)
- SOLO: A Simple Framework for Instance Segmentation [84.00519148562606]
The notion of "instance categories" assigns a category to each pixel within an instance according to the instance's location.
"SOLO" is a simple, direct, and fast framework for instance segmentation with strong performance.
Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy.
arXiv Detail & Related papers (2021-06-30T09:56:54Z)
- INSTA-YOLO: Real-Time Instance Segmentation [2.726684740197893]
We propose Insta-YOLO, a novel one-stage end-to-end deep learning model for real-time instance segmentation.
The proposed model is inspired by the YOLO one-shot object detector, with the box regression loss replaced by regression in the localization head.
We evaluate our model on three datasets, namely, Carvana, Cityscapes and Airbus.
arXiv Detail & Related papers (2021-02-12T21:17:29Z)
- Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation [94.4931516162023]
We study the Copy-Paste augmentation ([13, 12]) for instance segmentation where we randomly paste objects onto an image.
We find that the simple mechanism of pasting objects randomly is good enough and can provide solid gains on top of strong baselines.
Our baseline model outperforms the LVIS 2020 Challenge winning entry by +3.6 mask AP on rare categories.
arXiv Detail & Related papers (2020-12-13T22:59:45Z)
- The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation [93.17367076148348]
We investigate performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset.
We unveil that a major cause is the inaccurate classification of object proposals.
We propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
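The class-balanced sampling idea, drawing a class uniformly first so that rare classes are seen as often as frequent ones, and only then drawing an instance within that class, can be sketched as follows. This is a minimal sketch of the general recipe; the paper's exact bi-level procedure differs and is not reproduced here:

```python
import random

def bilevel_sample(instances_by_class, rng):
    """Class-balanced draw: uniform over classes, then uniform over
    that class's instances, so rare classes are not swamped."""
    cls = rng.choice(sorted(instances_by_class))     # level 1: class
    return cls, rng.choice(instances_by_class[cls])  # level 2: instance
```

Under instance-level sampling a class with 100x more examples would dominate; class-first sampling equalizes the expected draw rate across classes.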
arXiv Detail & Related papers (2020-07-23T12:49:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.