Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts
- URL: http://arxiv.org/abs/2511.10300v1
- Date: Fri, 14 Nov 2025 01:44:21 GMT
- Title: Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts
- Authors: Sumin Lee, Sungwon Park, Jeasurk Yang, Jihee Kim, Meeyoung Cha
- Abstract summary: GRAM is a two-phase test-time adaptation framework that enables robust slum segmentation without requiring labeled data from target regions. We use a million-scale satellite imagery dataset from 12 cities across four continents for source training. During adaptation, prediction consistency across experts filters out unreliable pseudo-labels, allowing the model to generalize effectively to previously unseen regions.
- Score: 20.100765943688454
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Satellite-based slum segmentation holds significant promise in generating global estimates of urban poverty. However, the morphological heterogeneity of informal settlements presents a major challenge, hindering the ability of models trained on specific regions to generalize effectively to unseen locations. To address this, we introduce a large-scale high-resolution dataset and propose GRAM (Generalized Region-Aware Mixture-of-Experts), a two-phase test-time adaptation framework that enables robust slum segmentation without requiring labeled data from target regions. We compile a million-scale satellite imagery dataset from 12 cities across four continents for source training. Using this dataset, the model employs a Mixture-of-Experts architecture to capture region-specific slum characteristics while learning universal features through a shared backbone. During adaptation, prediction consistency across experts filters out unreliable pseudo-labels, allowing the model to generalize effectively to previously unseen regions. GRAM outperforms state-of-the-art baselines in low-resource settings such as African cities, offering a scalable and label-efficient solution for global slum mapping and data-driven urban planning.
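The abstract says that prediction consistency across experts filters out unreliable pseudo-labels, but it does not give the exact rule. The following is a minimal sketch of one plausible reading, not the authors' implementation: each expert votes per pixel, and a pixel keeps its pseudo-label only when enough experts agree. The function name `consistent_pseudo_labels`, the 0.5 vote threshold, the `min_agreement` parameter, and the `ignore_index` value of 255 are all assumptions made for illustration.

```python
import numpy as np


def consistent_pseudo_labels(expert_probs, min_agreement=1.0, ignore_index=255):
    """Keep a pixel's pseudo-label only where experts agree (illustrative sketch).

    expert_probs: array of shape (E, H, W) holding each expert's predicted
    slum probability per pixel. All names and thresholds are hypothetical.
    """
    expert_probs = np.asarray(expert_probs, dtype=float)
    votes = expert_probs >= 0.5              # (E, H, W) binary vote per expert
    frac_pos = votes.mean(axis=0)            # fraction of experts voting "slum"
    majority = frac_pos >= 0.5               # majority class per pixel
    # Agreement is the share of experts that side with the majority class.
    agreement = np.where(majority, frac_pos, 1.0 - frac_pos)
    # Pixels below the agreement threshold are marked with ignore_index.
    return np.where(agreement >= min_agreement, majority.astype(np.int64), ignore_index)


# Three hypothetical experts scoring a 1x3 image patch.
probs = np.array([
    [[0.9, 0.1, 0.9]],
    [[0.8, 0.2, 0.2]],
    [[0.7, 0.3, 0.6]],
])
strict = consistent_pseudo_labels(probs)                     # unanimous agreement required
loose = consistent_pseudo_labels(probs, min_agreement=0.6)   # two of three suffices
```

Under the strict setting the third pixel, where one expert disagrees, is marked with `ignore_index` so a training loss can skip it; lowering `min_agreement` trades pseudo-label coverage against noise.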
Related papers
- AINet: Anchor Instances Learning for Regional Heterogeneity in Whole Slide Image [61.54860340942449]
We introduce a novel concept of anchor instance (AI), a compact subset of instances that are representative within their regions (local) and discriminative at the bag (global) level. These AIs act as semantic references to guide interactions across regions, correcting non-discriminative patterns while preserving regional diversity. We develop a concise yet effective framework, AINet, which employs a simple predictor and surpasses state-of-the-art methods with substantially fewer FLOPs and parameters.
arXiv Detail & Related papers (2026-02-21T09:36:27Z)
- Urban-R1: Reinforced MLLMs Mitigate Geospatial Biases for Urban General Intelligence [64.36291202666212]
Urban General Intelligence (UGI) refers to AI systems that can understand and reason about complex urban environments. Recent studies have built urban foundation models using supervised fine-tuning (SFT) of LLMs and MLLMs. We propose Urban-R1, a reinforcement learning-based post-training framework that aligns MLLMs with the objectives of UGI.
arXiv Detail & Related papers (2025-10-18T15:59:09Z)
- DeepC4: Deep Conditional Census-Constrained Clustering for Large-scale Multitask Spatial Disaggregation of Urban Morphology [0.7237068561453082]
We present a novel deep learning-based spatial disaggregation approach that incorporates local census statistics as cluster-level constraints. Our work has offered a new deep learning-based mapping technique towards a spatial auditing of our existing coarse-grained derived information at large scales.
arXiv Detail & Related papers (2025-07-30T10:25:39Z)
- Synthetic Data Matters: Re-training with Geo-typical Synthetic Labels for Building Detection [13.550020274133866]
We propose re-training models at test time using synthetic data tailored to the target region's city layout. This method generates geo-typical synthetic data that closely replicates the urban structure of a target area. Experiments demonstrate significant performance enhancements, with median improvements of up to 12%, depending on the domain gap.
arXiv Detail & Related papers (2025-07-22T14:53:13Z)
- TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism. TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z)
- EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation. We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities. Experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z)
- Geographical Context Matters: Bridging Fine and Coarse Spatial Information to Enhance Continental Land Cover Mapping [2.9212099078191756]
BRIDGE-LC is a novel deep learning framework that integrates multi-scale geospatial information into the land cover classification process. Our results demonstrate that integrating geospatial information improves land cover mapping performance.
arXiv Detail & Related papers (2025-04-16T17:42:46Z)
- CV-Cities: Advancing Cross-View Geo-Localization in Global Cities [3.074201632920997]
Cross-view geo-localization (CVGL) involves matching and retrieving satellite images to determine the geographic location of a ground image.
This task faces significant challenges due to substantial viewpoint discrepancies, the complexity of localization scenarios, and the need for global localization.
We propose a novel CVGL framework that integrates the foundational model DINOv2 with an advanced feature mixer.
arXiv Detail & Related papers (2024-11-19T11:41:22Z)
- Cross Pseudo Supervision Framework for Sparsely Labelled Geospatial Images [0.0]
Land Use Land Cover (LULC) mapping is a vital tool for urban and resource planning.
This study introduces a semi-supervised segmentation model for LULC prediction using high-resolution satellite images.
We propose a modified Cross Pseudo Supervision framework to train image segmentation models on sparsely labelled data.
arXiv Detail & Related papers (2024-08-05T11:14:23Z)
- Enhanced Urban Region Profiling with Adversarial Self-Supervised Learning for Robust Forecasting and Security [12.8405655328298]
Existing methods often struggle with issues such as noise, data incompleteness, and security vulnerabilities. This paper proposes a novel framework, Enhanced Urban Region Profiling with Adversarial Self-Supervised Learning (EUPAS). EUPAS ensures robust performance across various forecasting tasks such as crime prediction, check-in prediction, and land use classification.
arXiv Detail & Related papers (2024-02-02T06:06:45Z)
- Recognize Any Regions [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model. Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
- Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks [82.82866901799565]
We build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, SAR) for the study purpose of the cross-city semantic segmentation task.
Beyond the single city, we propose a high-resolution domain adaptation network, HighDAN, to promote the AI model's generalization ability from the multi-city environments.
HighDAN is capable of retaining the spatially topological structure of the studied urban scene well in a parallel high-to-low resolution fusion fashion.
arXiv Detail & Related papers (2023-09-26T23:55:39Z)
- Activation Regression for Continuous Domain Generalization with Applications to Crop Classification [48.795866501365694]
Geographic variance in satellite imagery impacts the ability of machine learning models to generalise to new regions.
We model geographic generalisation in medium resolution Landsat-8 satellite imagery as a continuous domain adaptation problem.
We develop a dataset spatially distributed across the entire continental United States.
arXiv Detail & Related papers (2022-04-14T15:41:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.