Zero-Shot Refinement of Buildings' Segmentation Models using SAM
- URL: http://arxiv.org/abs/2310.01845v2
- Date: Sun, 11 Feb 2024 14:28:09 GMT
- Title: Zero-Shot Refinement of Buildings' Segmentation Models using SAM
- Authors: Ali Mayladan, Hasan Nasrallah, Hasan Moughnieh, Mustafa Shukor and Ali J. Ghandour
- Abstract summary: We present a novel approach that adapts foundation models to address the limited generalization of existing models.
Among several models, our focus centers on the Segment Anything Model (SAM).
SAM does not offer recognition abilities and thus fails to classify and tag localized objects.
This novel approach augments SAM with recognition abilities, a first of its kind.
- Score: 6.110856077714895
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Foundation models have excelled in various tasks but are often evaluated on
general benchmarks. The adaptation of these models for specific domains, such
as remote sensing imagery, remains an underexplored area. In remote sensing,
precise building instance segmentation is vital for applications like urban
planning. While Convolutional Neural Networks (CNNs) perform well, their
generalization can be limited. To this end, we present a novel approach that
adapts foundation models to address the limited generalization of existing
models. Among several models, our focus centers on the Segment Anything Model
(SAM), a potent foundation model renowned for its class-agnostic image
segmentation capabilities. We start by identifying the limitations of SAM,
revealing its suboptimal performance when applied to remote sensing imagery.
Moreover, SAM does not offer recognition abilities and thus fails to classify
and tag localized objects. To address these limitations, we introduce different
prompting strategies, including integrating a pre-trained CNN as a prompt
generator. This novel approach augments SAM with recognition abilities, a first
of its kind. We evaluated our method on three remote sensing datasets,
including the WHU Buildings dataset, the Massachusetts Buildings dataset, and
the AICrowd Mapping Challenge. For out-of-distribution performance on the WHU
dataset, we achieve a 5.47% increase in IoU and a 4.81% improvement in
F1-score. For in-distribution performance on the WHU dataset, we observe a
2.72% and 1.58% increase in True-Positive-IoU and True-Positive-F1 score,
respectively. Our code is publicly available at
https://github.com/geoaigroup/GEOAI-ECRS2023; we hope it inspires further
exploration of foundation models for domain-specific tasks within the remote
sensing community.
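As a rough illustration of the prompting strategy described above (a pre-trained CNN acting as a prompt generator for SAM), the sketch below converts per-building masks predicted by a CNN into box prompts. The function name, the toy data, and the assumption that the CNN yields one mask per building instance are all illustrative; this is not the authors' actual implementation.

```python
# Illustrative sketch (not the paper's code): turning a CNN's per-building
# masks into box prompts that a promptable segmenter such as SAM can consume.
import numpy as np

def mask_to_box_prompt(mask: np.ndarray) -> list:
    """Bounding box (x_min, y_min, x_max, y_max) of a binary instance mask."""
    ys, xs = np.nonzero(mask)
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]

# Toy example: two "buildings" predicted by the CNN as separate instance masks.
h, w = 8, 8
building_a = np.zeros((h, w), dtype=np.uint8)
building_a[1:3, 1:4] = 1
building_b = np.zeros((h, w), dtype=np.uint8)
building_b[5:7, 5:8] = 1

box_prompts = [mask_to_box_prompt(m) for m in (building_a, building_b)]
print(box_prompts)  # [[1, 1, 3, 2], [5, 5, 7, 6]]

# With the real `segment_anything` package, each box prompt would then be
# passed to SamPredictor.predict(box=..., multimask_output=False) so SAM can
# refine the coarse CNN mask into a sharper instance mask.
```

Box prompts are only one of the strategies the abstract mentions; point prompts sampled from the CNN mask interior would work analogously.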
Related papers
- iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning [22.14627083675405]
We propose incremental neural mesh models that can be extended with new meshes over time.
We demonstrate the effectiveness of our method through extensive experiments on the Pascal3D and ObjectNet3D datasets.
Our work also presents the first incremental learning approach for pose estimation.
arXiv Detail & Related papers (2024-07-12T13:57:49Z)
- Offshore Wind Plant Instance Segmentation Using Sentinel-1 Time Series, GIS, and Semantic Segmentation Models [0.3413711585591077]
This study aims to detect offshore wind plants at an instance level using semantic segmentation models and Sentinel-1 time series.
LinkNet was the top-performing model, followed by U-Net++ and U-Net, while FPN and DeepLabv3+ presented the worst results.
arXiv Detail & Related papers (2023-12-14T09:49:15Z)
- Open World Object Detection in the Era of Foundation Models [53.683963161370585]
We introduce a new benchmark that includes five real-world application-driven datasets.
We introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects.
arXiv Detail & Related papers (2023-12-10T03:56:06Z)
- Optimization Efficient Open-World Visual Region Recognition [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.
Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
- GEO-Bench: Toward Foundation Models for Earth Monitoring [139.77907168809085]
We propose a benchmark comprised of six classification and six segmentation tasks.
This benchmark will be a driver of progress across a variety of Earth monitoring tasks.
arXiv Detail & Related papers (2023-06-06T16:16:05Z)
- A Billion-scale Foundation Model for Remote Sensing Images [5.065947993017157]
Three key factors in pretraining foundation models are the pretraining method, the size of the pretraining dataset, and the number of model parameters.
This paper examines the effect of increasing the number of model parameters on the performance of foundation models in downstream tasks.
To the best of our knowledge, this is the first billion-scale foundation model in the remote sensing field.
arXiv Detail & Related papers (2023-04-11T13:33:45Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on the source dataset and unavailable on the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Decoupled Self Attention for Accurate One Stage Object Detection [4.791635488070342]
A decoupled self attention (DSA) module is proposed for one-stage object detection models in this paper.
Although the DSA module's network is simple, it effectively improves object detection performance and can be easily embedded in many detection models.
arXiv Detail & Related papers (2020-12-14T15:19:30Z)
- Dynamic Refinement Network for Oriented and Densely Packed Object Detection [75.29088991850958]
We present a dynamic refinement network that consists of two novel components, i.e., a feature selection module (FSM) and a dynamic refinement head (DRH).
Our FSM enables neurons to adjust receptive fields in accordance with the shapes and orientations of target objects, whereas the DRH empowers our model to refine the prediction dynamically in an object-aware manner.
We perform quantitative evaluations on several publicly available benchmarks including DOTA, HRSC2016, SKU110K, and our own SKU110K-R dataset.
arXiv Detail & Related papers (2020-05-20T11:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.