GEM: Boost Simple Network for Glass Surface Segmentation via Segment
Anything Model and Data Synthesis
- URL: http://arxiv.org/abs/2401.15282v1
- Date: Sat, 27 Jan 2024 03:36:47 GMT
- Title: GEM: Boost Simple Network for Glass Surface Segmentation via Segment
Anything Model and Data Synthesis
- Authors: Jing Hao, Moyun Liu, Kuo Feng Hung
- Abstract summary: We show how to segment glass surfaces with higher accuracy using two visual foundation models: Segment Anything (SAM) and Stable Diffusion.
We also propose a Synthetic but large-scale Glass Surface Detection dataset dubbed S-GSD via diffusion model with four different scales.
This dataset is a feasible source for transfer learning. The scale of synthetic data has positive impacts on transfer learning, while the improvement will gradually saturate as the amount of data increases.
- Score: 3.97478982737167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting glass regions is a challenging task due to the ambiguity of their
transparency and reflection properties. These transparent glasses share the
visual appearance of both transmitted arbitrary background scenes and reflected
objects, thus having no fixed patterns. Recent visual foundation models, which
are trained on vast amounts of data, have manifested stunning performance in
terms of image perception and image generation. To segment glass surfaces with
higher accuracy, we make full use of two visual foundation models: Segment
Anything (SAM) and Stable Diffusion. Specifically, we devise a simple glass
surface segmentor named GEM, which only consists of a SAM backbone, a simple
feature pyramid, a discerning query selection module, and a mask decoder. The
discerning query selection can adaptively identify glass surface features,
assigning them as initialized queries in the mask decoder. We also propose a
Synthetic but photorealistic large-scale Glass Surface Detection dataset dubbed
S-GSD via diffusion model with four different scales, which contain 1x, 5x,
10x, and 20x of the original real data size. This dataset is a feasible source
for transfer learning. The scale of synthetic data has positive impacts on
transfer learning, while the improvement will gradually saturate as the amount
of data increases. Extensive experiments demonstrate that GEM achieves a new
state-of-the-art on the GSD-S validation set (IoU +2.1%). Codes and datasets
are available at: https://github.com/isbrycee/GEM-Glass-Segmentor.
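The abstract says the discerning query selection picks out glass surface features and uses them as the initial queries of the mask decoder, but it does not say how. The sketch below shows one common way such a step can work, and is not the authors' implementation: each feature vector gets a score from an assumed linear scoring head, and the top-k vectors are kept as queries. The function name `select_queries`, the `score_weights` parameter, and the dot-product scoring are all hypothetical.

```python
from typing import List, Sequence, Tuple


def select_queries(
    features: Sequence[Sequence[float]],
    score_weights: Sequence[float],
    k: int,
) -> Tuple[List[int], List[List[float]]]:
    """Return the indices and values of the k highest-scoring feature vectors.

    Hypothetical stand-in for a query selection step: the score is a dot
    product with `score_weights` (an assumed linear scoring head), and the
    selected vectors would serve as the decoder's initial queries.
    """
    if k <= 0 or k > len(features):
        raise ValueError("k must be between 1 and the number of features")
    # One scalar "glass-likeness" score per feature vector.
    scores = [sum(f * w for f, w in zip(feat, score_weights)) for feat in features]
    # Sort indices by descending score; ties keep the lower index first.
    order = sorted(range(len(features)), key=lambda i: (-scores[i], i))
    top = order[:k]
    return top, [list(features[i]) for i in top]


# Toy example: four 2-dimensional features, scored by their first channel only.
toy_features = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.9]]
toy_indices, toy_queries = select_queries(toy_features, [1.0, 0.0], k=2)
print(toy_indices)  # indices of the two selected features
```

In a real model the scores would come from a learned head over the feature pyramid outputs, and the selection would run on tensors rather than Python lists.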
Related papers
- Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models [32.57246173437492]
This study introduces a novel dataset named Img-Diff, designed to enhance fine-grained image recognition in MLLMs.
By analyzing object differences between similar images, we challenge models to identify both matching and distinct components.
We utilize the Stable-Diffusion-XL model and advanced image editing techniques to create pairs of similar images that highlight object replacements.
arXiv Detail & Related papers (2024-08-08T17:10:16Z)
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation [69.42764583465508]
We explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks.
To the best of our knowledge, we are the first to generate both images and corresponding masks for satellite segmentation.
arXiv Detail & Related papers (2024-03-25T10:30:22Z)
- Glass Segmentation with Multi Scales and Primary Prediction Guiding [2.66512000865131]
Glass-like objects are everywhere in daily life, yet existing methods struggle to segment them.
We propose MGNet, which consists of a FineRescaling and Merging module (FRM) to improve the ability to extract semantics.
We supervise the model with a novel loss function with the uncertainty-aware loss to produce high-confidence segmentation maps.
arXiv Detail & Related papers (2024-02-13T16:14:32Z)
- DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z)
- GEM: Boost Simple Network for Glass Surface Segmentation via Vision Foundation Models [7.423981028880871]
Glass surface detection is a challenging task due to the inherent ambiguity in their transparency and reflective characteristics.
We propose to address these issues by fully harnessing the capabilities of two existing vision foundation models (VFMs): Stable Diffusion and Segment Anything Model (SAM).
Our GEM establishes a new state-of-the-art performance with the help of these two VFMs, surpassing the best-reported method GlassSemNet with an IoU improvement of 2.1%.
arXiv Detail & Related papers (2023-07-22T08:37:23Z)
- Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
arXiv Detail & Related papers (2023-07-10T17:59:40Z)
- Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models [7.452422412106768]
We propose a novel method named Text2Seg for remote sensing semantic segmentation.
It overcomes the dependency on extensive annotations by employing an automatic prompt generation process.
We show that Text2Seg significantly improves zero-shot prediction performance compared to the vanilla SAM model.
arXiv Detail & Related papers (2023-04-20T18:39:41Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Enhanced Boundary Learning for Glass-like Object Segmentation [55.45473926510806]
This paper aims to solve the glass-like object segmentation problem via enhanced boundary learning.
In particular, we first propose a novel refined differential module for generating finer boundary cues.
An edge-aware point-based graph convolution network module is proposed to model the global shape representation along the boundary.
arXiv Detail & Related papers (2021-03-29T16:18:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.