Zero-Shot Refinement of Buildings' Segmentation Models using SAM
- URL: http://arxiv.org/abs/2310.01845v2
- Date: Sun, 11 Feb 2024 14:28:09 GMT
- Title: Zero-Shot Refinement of Buildings' Segmentation Models using SAM
- Authors: Ali Mayladan, Hasan Nasrallah, Hasan Moughnieh, Mustafa Shukor and Ali
J. Ghandour
- Abstract summary: We present a novel approach to adapt foundation models to address existing models' generalization dropback.
Among several models, our focus centers on the Segment Anything Model (SAM)
SAM does not offer recognition abilities and thus fails to classify and tag localized objects.
This novel approach augments SAM with recognition abilities, a first of its kind.
- Score: 6.110856077714895
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Foundation models have excelled in various tasks but are often evaluated on
general benchmarks. The adaptation of these models for specific domains, such
as remote sensing imagery, remains an underexplored area. In remote sensing,
precise building instance segmentation is vital for applications like urban
planning. While Convolutional Neural Networks (CNNs) perform well, their
generalization can be limited. For this aim, we present a novel approach to
adapt foundation models to address existing models' generalization dropback.
Among several models, our focus centers on the Segment Anything Model (SAM), a
potent foundation model renowned for its prowess in class-agnostic image
segmentation capabilities. We start by identifying the limitations of SAM,
revealing its suboptimal performance when applied to remote sensing imagery.
Moreover, SAM does not offer recognition abilities and thus fails to classify
and tag localized objects. To address these limitations, we introduce different
prompting strategies, including integrating a pre-trained CNN as a prompt
generator. This novel approach augments SAM with recognition abilities, a first
of its kind. We evaluated our method on three remote sensing datasets,
including the WHU Buildings dataset, the Massachusetts Buildings dataset, and
the AICrowd Mapping Challenge. For out-of-distribution performance on the WHU
dataset, we achieve a 5.47\% increase in IoU and a 4.81\% improvement in
F1-score. For in-distribution performance on the WHU dataset, we observe a
2.72\% and 1.58\% increase in True-Positive-IoU and True-Positive-F1 score,
respectively. Our code is publicly available at this Repo
(https://github.com/geoaigroup/GEOAI-ECRS2023), hoping to inspire further
exploration of foundation models for domain-specific tasks within the remote
sensing community.
Related papers
- Joint-Optimized Unsupervised Adversarial Domain Adaptation in Remote Sensing Segmentation with Prompted Foundation Model [32.03242732902217]
This paper addresses the challenge of adapting a model trained on source domain data to target domain samples.
We propose a joint-optimized adversarial network incorporating the "Segment Anything Model (SAM) (SAM-JOANet)"
arXiv Detail & Related papers (2024-11-08T02:15:20Z) - Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z) - Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z) - Offshore Wind Plant Instance Segmentation Using Sentinel-1 Time Series,
GIS, and Semantic Segmentation Models [0.3413711585591077]
This study aims to detect offshore wind plants at an instance level using semantic segmentation models and Sentinel-1 time series.
LinkNet was the top-performing model, followed by U-Net++ and U-Net, while FPN and DeepLabv3+ presented the worst results.
arXiv Detail & Related papers (2023-12-14T09:49:15Z) - Optimization Efficient Open-World Visual Region Recognition [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.
Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z) - GEO-Bench: Toward Foundation Models for Earth Monitoring [139.77907168809085]
We propose a benchmark comprised of six classification and six segmentation tasks.
This benchmark will be a driver of progress across a variety of Earth monitoring tasks.
arXiv Detail & Related papers (2023-06-06T16:16:05Z) - Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models [7.452422412106768]
We propose a novel method named Text2Seg for remote sensing semantic segmentation.
It overcomes the dependency on extensive annotations by employing an automatic prompt generation process.
We show that Text2Seg significantly improves zero-shot prediction performance compared to the vanilla SAM model.
arXiv Detail & Related papers (2023-04-20T18:39:41Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the generalization and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z) - Dynamic Refinement Network for Oriented and Densely Packed Object
Detection [75.29088991850958]
We present a dynamic refinement network that consists of two novel components, i.e., a feature selection module (FSM) and a dynamic refinement head (DRH)
Our FSM enables neurons to adjust receptive fields in accordance with the shapes and orientations of target objects, whereas the DRH empowers our model to refine the prediction dynamically in an object-aware manner.
We perform quantitative evaluations on several publicly available benchmarks including DOTA, HRSC2016, SKU110K, and our own SKU110K-R dataset.
arXiv Detail & Related papers (2020-05-20T11:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.