Zero-Shot Refinement of Buildings' Segmentation Models using SAM
- URL: http://arxiv.org/abs/2310.01845v2
- Date: Sun, 11 Feb 2024 14:28:09 GMT
- Title: Zero-Shot Refinement of Buildings' Segmentation Models using SAM
- Authors: Ali Mayladan, Hasan Nasrallah, Hasan Moughnieh, Mustafa Shukor and Ali J. Ghandour
- Abstract summary: We present a novel approach that adapts foundation models to address the limited generalization of existing models.
Among several models, our focus centers on the Segment Anything Model (SAM).
SAM does not offer recognition abilities and thus fails to classify and tag localized objects.
This novel approach augments SAM with recognition abilities, a first of its kind.
- Score: 6.110856077714895
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Foundation models have excelled in various tasks but are often evaluated on
general benchmarks. The adaptation of these models for specific domains, such
as remote sensing imagery, remains an underexplored area. In remote sensing,
precise building instance segmentation is vital for applications like urban
planning. While Convolutional Neural Networks (CNNs) perform well, their
generalization can be limited. To this end, we present a novel approach that
adapts foundation models to address the limited generalization of existing
models. Among several models, our focus centers on the Segment Anything Model
(SAM), a potent foundation model renowned for its class-agnostic image
segmentation capabilities. We start by identifying the limitations of SAM,
revealing its suboptimal performance when applied to remote sensing imagery.
Moreover, SAM does not offer recognition abilities and thus fails to classify
and tag localized objects. To address these limitations, we introduce different
prompting strategies, including integrating a pre-trained CNN as a prompt
generator. This novel approach augments SAM with recognition abilities, a first
of its kind. We evaluated our method on three remote sensing datasets,
including the WHU Buildings dataset, the Massachusetts Buildings dataset, and
the AICrowd Mapping Challenge. For out-of-distribution performance on the WHU
dataset, we achieve a 5.47% increase in IoU and a 4.81% improvement in
F1-score. For in-distribution performance on the WHU dataset, we observe a
2.72% and 1.58% increase in True-Positive-IoU and True-Positive-F1 score,
respectively. Our code is publicly available at
https://github.com/geoaigroup/GEOAI-ECRS2023; we hope it inspires further
exploration of foundation models for domain-specific tasks within the remote
sensing community.
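As a rough illustration of the prompting strategy described above (a pre-trained CNN acting as a prompt generator for SAM), the sketch below converts per-building masks predicted by a CNN into box prompts. The function name, the toy data, and the assumption that the CNN yields one mask per building instance are all illustrative; this is not the authors' actual implementation.

```python
# Illustrative sketch (not the paper's code): turning a CNN's per-building
# masks into box prompts that a promptable segmenter such as SAM can consume.
import numpy as np

def mask_to_box_prompt(mask: np.ndarray) -> list:
    """Bounding box (x_min, y_min, x_max, y_max) of a binary instance mask."""
    ys, xs = np.nonzero(mask)
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]

# Toy example: two "buildings" predicted by the CNN as separate instance masks.
h, w = 8, 8
building_a = np.zeros((h, w), dtype=np.uint8)
building_a[1:3, 1:4] = 1
building_b = np.zeros((h, w), dtype=np.uint8)
building_b[5:7, 5:8] = 1

box_prompts = [mask_to_box_prompt(m) for m in (building_a, building_b)]
print(box_prompts)  # [[1, 1, 3, 2], [5, 5, 7, 6]]

# With the real `segment_anything` package, each box prompt would then be
# passed to SamPredictor.predict(box=..., multimask_output=False) so SAM can
# refine the coarse CNN mask into a sharper instance mask.
```

Box prompts are only one of the strategies the abstract mentions; point prompts sampled from the CNN mask interior would work analogously.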
Related papers
- iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning [22.14627083675405]
We propose incremental neural mesh models that can be extended with new meshes over time.
We demonstrate the effectiveness of our method through extensive experiments on the Pascal3D and ObjectNet3D datasets.
Our work also presents the first incremental learning approach for pose estimation.
arXiv Detail & Related papers (2024-07-12T13:57:49Z)
- Offshore Wind Plant Instance Segmentation Using Sentinel-1 Time Series, GIS, and Semantic Segmentation Models [0.3413711585591077]
This study aims to detect offshore wind plants at an instance level using semantic segmentation models and Sentinel-1 time series.
LinkNet was the top-performing model, followed by U-Net++ and U-Net, while FPN and DeepLabv3+ presented the worst results.
arXiv Detail & Related papers (2023-12-14T09:49:15Z)
- Open World Object Detection in the Era of Foundation Models [53.683963161370585]
We introduce a new benchmark that includes five real-world application-driven datasets.
We introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects.
arXiv Detail & Related papers (2023-12-10T03:56:06Z)
- Optimization Efficient Open-World Visual Region Recognition [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.
Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
- GEO-Bench: Toward Foundation Models for Earth Monitoring [139.77907168809085]
We propose a benchmark comprised of six classification and six segmentation tasks.
This benchmark will be a driver of progress across a variety of Earth monitoring tasks.
arXiv Detail & Related papers (2023-06-06T16:16:05Z)
- A Billion-scale Foundation Model for Remote Sensing Images [5.065947993017157]
Three key factors in pretraining foundation models are the pretraining method, the size of the pretraining dataset, and the number of model parameters.
This paper examines the effect of increasing the number of model parameters on the performance of foundation models in downstream tasks.
To the best of our knowledge, this is the first billion-scale foundation model in the remote sensing field.
arXiv Detail & Related papers (2023-04-11T13:33:45Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on the source dataset and unavailable on the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Decoupled Self Attention for Accurate One Stage Object Detection [4.791635488070342]
A decoupled self attention (DSA) module is proposed for one-stage object detection models in this paper.
Although the DSA module's network is simple, it effectively improves object detection performance and can be easily embedded in many detection models.
arXiv Detail & Related papers (2020-12-14T15:19:30Z)
- Dynamic Refinement Network for Oriented and Densely Packed Object Detection [75.29088991850958]
We present a dynamic refinement network that consists of two novel components, i.e., a feature selection module (FSM) and a dynamic refinement head (DRH).
Our FSM enables neurons to adjust receptive fields in accordance with the shapes and orientations of target objects, whereas the DRH empowers our model to refine the prediction dynamically in an object-aware manner.
We perform quantitative evaluations on several publicly available benchmarks including DOTA, HRSC2016, SKU110K, and our own SKU110K-R dataset.
arXiv Detail & Related papers (2020-05-20T11:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.