Related papers: Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models

Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models

URL: http://arxiv.org/abs/2304.10597v1
Date: Thu, 20 Apr 2023 18:39:41 GMT
Title: Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models
Authors: Jielu Zhang, Zhongliang Zhou, Gengchen Mai, Lan Mu, Mengxuan Hu, Sheng Li
Abstract summary: This study focuses on the remote sensing domain, where the images are notably dissimilar from those in conventional scenarios. We developed a pipeline that leverages multiple foundation models to facilitate remote sensing image semantic segmentation tasks guided by text prompt. The pipeline is benchmarked on several widely-used remote sensing datasets, and we present preliminary results to demonstrate its effectiveness.
Score: 5.360103006279672
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent advancements in foundation models (FMs), such as GPT-4 and LLaMA, have attracted significant attention due to their exceptional performance in zero-shot learning scenarios. Similarly, in the field of visual learning, models like Grounding DINO and the Segment Anything Model (SAM) have exhibited remarkable progress in open-set detection and instance segmentation tasks. It is undeniable that these FMs will profoundly impact a wide range of real-world visual learning tasks, ushering in a new paradigm shift for developing such models. In this study, we concentrate on the remote sensing domain, where the images are notably dissimilar from those in conventional scenarios. We developed a pipeline that leverages multiple FMs to facilitate remote sensing image semantic segmentation tasks guided by text prompt, which we denote as Text2Seg. The pipeline is benchmarked on several widely-used remote sensing datasets, and we present preliminary results to demonstrate its effectiveness. Through this work, we aim to provide insights into maximizing the applicability of visual FMs in specific contexts with minimal model tuning. The code is available at https://github.com/Douglas2Code/Text2Seg.

Related papers

RemoteSAM: Towards Segment Anything for Earth Observation [29.707796048411705]
We aim to develop a robust yet flexible visual foundation model for Earth observation.<n>It should possess strong capabilities in recognizing and localizing diverse visual targets.<n>We present RemoteSAM, a foundation model that establishes new SoTA on several earth observation perception benchmarks.
arXiv Detail & Related papers (2025-05-23T15:27:57Z)
AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images [21.294581646546124]
AerOSeg is a novel Open-Vocabulary (OVS) approach for remote sensing data. We compute robust image-text correlation features using rotated versions of the input image and domain-specific prompts. Inspired by the success of the Segment Anything Model (SAM) in diverse domains, we leverage SAM features to guide the spatial refinement of correlation features. We enhance the refined correlation features using a multi-scale attention-aware composition to produce the final segmentation map.
arXiv Detail & Related papers (2025-04-12T13:06:46Z)
Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation [16.58381088280145]
We introduce Open-Vocabulary Remote Sensing Image Semantic (OVRSISS) to segment arbitrary semantic classes in remote sensing images. To address the lack of OVRSISS datasets, we develop LandDiscover50K, a comprehensive dataset of 51,846 images covering 40 diverse semantic classes. In addition, we propose a novel framework named GSNet that integrates domain priors from special remote sensing models and versatile capabilities of general vision-language models.
arXiv Detail & Related papers (2024-12-27T07:20:30Z)
Learn from Foundation Model: Fruit Detection Model without Manual Annotation [8.486694650855949]
Agriculture is one of the domains that lacks sufficient data. This study proposes a framework to train effective, domain-specific, small models from foundation models without manual annotation.
arXiv Detail & Related papers (2024-11-25T08:52:46Z)
Generalized Few-Shot Semantic Segmentation in Remote Sensing: Challenge and Benchmark [18.636210870172675]
Few-shot semantic segmentation can encourage deep learning models to learn from few labelled examples for novel classes not seen during the training. The generalized few-shot segmentation setting has an additional challenge which encourages models not only to adapt to the novel classes but also to maintain strong performance on the training base classes. We release the dataset augmenting OpenEarthMap with additional classes labelled for the generalized few-shot evaluation setting.
arXiv Detail & Related papers (2024-09-17T14:20:47Z)
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation [69.42764583465508]
We explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks. To the best of our knowledge, we are the first to generate both images and corresponding masks for satellite segmentation.
arXiv Detail & Related papers (2024-03-25T10:30:22Z)
Zero-Shot Refinement of Buildings' Segmentation Models using SAM [6.110856077714895]
We present a novel approach to adapt foundation models to address existing models' generalization dropback. Among several models, our focus centers on the Segment Anything Model (SAM) SAM does not offer recognition abilities and thus fails to classify and tag localized objects. This novel approach augments SAM with recognition abilities, a first of its kind.
arXiv Detail & Related papers (2023-10-03T07:19:59Z)
DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations. Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z)
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains. We reconcile the generalization and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images. A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
Remote Sensing Images Semantic Segmentation with General Remote Sensing Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation. Specifically, the global style contrastive module is used to learn an image-level representation better. The local features matching contrastive module is designed to learn representations of local regions which is beneficial for semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z)
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. In this paper, we first study how biases in the dataset affect existing methods. We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
Multi-dataset Pretraining: A Unified Model for Semantic Segmentation [97.61605021985062]
We propose a unified framework, termed as Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets. This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets. In order to better model the relationship among images and classes from different datasets, we extend the pixel level embeddings via cross dataset mixing.
arXiv Detail & Related papers (2021-06-08T06:13:11Z)
Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.