Related papers: CGEarthEye:A High-Resolution Remote Sensing Vision Foundation Model Based on the Jilin-1 Satellite Constellation

CGEarthEye:A High-Resolution Remote Sensing Vision Foundation Model Based on the Jilin-1 Satellite Constellation

URL: http://arxiv.org/abs/2507.00356v1
Date: Tue, 01 Jul 2025 01:05:18 GMT
Title: CGEarthEye:A High-Resolution Remote Sensing Vision Foundation Model Based on the Jilin-1 Satellite Constellation
Authors: Zhiwei Yi, Xin Cheng, Jingyu Ma, Ruifei Zhu, Junwei Tian, Yuanxiu Zhou, Xinge Zhao, Hongzhe Li,
Abstract summary: Jilin-1 is the world's largest sub-meter-level commercial RS satellite constellation.<n>This study proposes CGEarthEye, a RSVFM framework specifically designed for Jilin-1 satellite characteristics.
Score: 3.5464435279468907
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep learning methods have significantly advanced the development of intelligent rinterpretation in remote sensing (RS), with foundational model research based on large-scale pre-training paradigms rapidly reshaping various domains of Earth Observation (EO). However, compared to the open accessibility and high spatiotemporal coverage of medium-resolution data, the limited acquisition channels for ultra-high-resolution optical RS imagery have constrained the progress of high-resolution remote sensing vision foundation models (RSVFM). As the world's largest sub-meter-level commercial RS satellite constellation, the Jilin-1 constellation possesses abundant sub-meter-level image resources. This study proposes CGEarthEye, a RSVFM framework specifically designed for Jilin-1 satellite characteristics, comprising five backbones with different parameter scales with totaling 2.1 billion parameters. To enhance the representational capacity of the foundation model, we developed JLSSD, the first 15-million-scale multi-temporal self-supervised learning (SSL) dataset featuring global coverage with quarterly temporal sampling within a single year, constructed through multi-level representation clustering and sampling strategies. The framework integrates seasonal contrast, augmentation-based contrast, and masked patch token contrastive strategies for pre-training. Comprehensive evaluations across 10 benchmark datasets covering four typical RS tasks demonstrate that the CGEarthEye consistently achieves state-of-the-art (SOTA) performance. Further analysis reveals CGEarthEye's superior characteristics in feature visualization, model convergence, parameter efficiency, and practical mapping applications. This study anticipates that the exceptional representation capabilities of CGEarthEye will facilitate broader and more efficient applications of Jilin-1 data in traditional EO application.

Related papers

FUSAR-GPT : A Spatiotemporal Feature-Embedded and Two-Stage Decoupled Visual Language Model for SAR Imagery [8.62554606349568]
FUSAR-GPT is a VLM specifically for Synthetic Aperture Radar (SAR) applications.<n>It embeds multi-source remote-sensing temporal features into the model's visual backbone via'spatiotemporal anchors'<n>It achieves state-of-the-art performance across several typical remote sensing visual-language benchmark tests.
arXiv Detail & Related papers (2026-02-22T13:40:17Z)
SOMA-1M: A Large-Scale SAR-Optical Multi-resolution Alignment Dataset for Multi-Task Remote Sensing [11.908437730011899]
SOMA-1M is a pixel-level precisely aligned dataset containing over 1.3 million pairs of georeferenced images with a specification of 512 x 512 pixels.<n>This dataset integrates imagery from Sentinel-1, PIESAT-1, Capella Space, and Google Earth, achieving global multi-scale coverage from 0.5 m to 10 m.<n>Based on this dataset, we established comprehensive evaluation benchmarks for four hierarchical vision tasks, including image matching, image fusion, SAR-assisted cloud removal, and cross-modal translation.
arXiv Detail & Related papers (2026-02-05T09:39:49Z)
FUSAR-KLIP: Towards Multimodal Foundation Models for Remote Sensing [16.948824707021412]
Cross-modal artificial intelligence has garnered widespread attention in recent years, achieving significant progress in the study of natural images.<n>Existing methods are mostly designed for RGB imagery, leaving a significant gap in modeling synthetic aperture radar (SAR) imagery.<n>This paper proposes FUSAR-KLIP, the first universal SAR multimodal foundational model, along with reusable data and evaluation baselines.
arXiv Detail & Related papers (2025-09-28T15:03:25Z)
Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images [51.74614065919118]
This paper introduces SegEarth-OV, the first framework for annotation-free open-vocabulary segmentation of RS images.<n>We propose SimFeatUp, a universal upsampler that robustly restores high-resolution spatial details from coarse features.<n>We also present a simple yet effective Global Bias Alleviation operation to subtract the inherent global context from patch features.
arXiv Detail & Related papers (2025-08-25T14:22:57Z)
STAR: A Benchmark for Astronomical Star Fields Super-Resolution [51.79340280382437]
We propose STAR, a large-scale astronomical SR dataset containing 54,738 flux-consistent star field image pairs.<n>We propose a Flux-Invariant Super Resolution (FISR) model that could accurately infer the flux-consistent high-resolution images from input photometry.
arXiv Detail & Related papers (2025-07-22T09:28:28Z)
TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis [0.24199968850337347]
We present TESSERA, a novel Remote Sensing Foundation Model (RSFM) that uses Self-Supervised Learning (SSL) to generate global, robust representations at 10m scale from pixel-level satellite time series data.<n>Our results show that TESSERA outperforms both traditional RS baselines and the leading geospatial foundation models in diverse downstream tasks.
arXiv Detail & Related papers (2025-06-25T12:46:26Z)
Towards Scalable and Generalizable Earth Observation Data Mining via Foundation Model Composition [0.0]
We investigate whether foundation models pretrained on remote sensing and general vision datasets can be effectively combined to improve performance.<n>The results show that feature-level ensembling of smaller pretrained models can match or exceed the performance of much larger models.<n>The study highlights the potential of applying knowledge distillation to transfer the strengths of ensembles into more compact models.
arXiv Detail & Related papers (2025-06-25T07:02:42Z)
TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery.<n>Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism.<n>TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z)
EarthMind: Leveraging Cross-Sensor Data for Advanced Earth Observation Interpretation with a Unified Multimodal LLM [103.7537991413311]
Earth Observation (EO) data analysis is vital for monitoring environmental and human dynamics.<n>Recent Multimodal Large Language Models (MLLMs) show potential in EO understanding but remain restricted to single-sensor inputs.<n>We propose EarthMind, a unified vision-language framework that handles both single- and cross-sensor inputs.
arXiv Detail & Related papers (2025-06-02T13:36:05Z)
From Pixels to Images: Deep Learning Advances in Remote Sensing Image Semantic Segmentation [14.556499156486066]
Remote sensing images (RSIs) capture both natural and human-induced changes on the Earth's surface.<n>Semantic segmentation (SS) of RSIs enables the fine-grained interpretation of surface features.<n>Deep learning (DL) has emerged as a transformative approach, enabling substantial advances in remote sensing image semantic segmentation.
arXiv Detail & Related papers (2025-05-21T06:02:57Z)
Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO)<n>In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery.<n>We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
arXiv Detail & Related papers (2025-04-09T15:13:26Z)
SpectralEarth: Training Hyperspectral Foundation Models at Scale [47.93167977587301]
We introduce SpectralEarth, a large-scale multi-temporal dataset designed to pretrain hyperspectral foundation models. We pretrain a series of foundation models on SpectralEarth using state-of-the-art self-supervised learning (SSL) algorithms. We construct four downstream datasets for land-cover and crop-type mapping, providing benchmarks for model evaluation.
arXiv Detail & Related papers (2024-08-15T22:55:59Z)
SARATR-X: Toward Building A Foundation Model for SAR Target Recognition [22.770010893572973]
We make the first attempt towards building a foundation model for SAR ATR, termed SARATR-X.<n>SARATR-X learns generalizable representations via self-supervised learning (SSL) and provides a cornerstone for label-efficient model adaptation to generic SAR target detection and classification tasks.<n>SARATR-X is trained on 0.18 M unlabelled SAR target samples, which are curated by combining contemporary benchmarks and constitute the largest publicly available dataset till now.
arXiv Detail & Related papers (2024-05-15T14:17:44Z)
Exploiting Self-Supervised Constraints in Image Super-Resolution [72.35265021054471]
This paper introduces a novel self-supervised constraint for single image super-resolution, termed SSC-SR. SSC-SR uniquely addresses the divergence in image complexity by employing a dual asymmetric paradigm and a target model updated via exponential moving average to enhance stability. Empirical evaluations reveal that our SSC-SR framework delivers substantial enhancements on a variety of benchmark datasets, achieving an average increase of 0.1 dB over EDSR and 0.06 dB over SwinIR.
arXiv Detail & Related papers (2024-03-30T06:18:50Z)
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head. The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement. This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
Recognize Any Regions [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.<n>Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
Fine-grained building roof instance segmentation based on domain adapted pretraining and composite dual-backbone [13.09940764764909]
We propose a framework to fulfill semantic interpretation of individual buildings with high-resolution optical satellite imagery. Specifically, the leveraged domain adapted pretraining strategy and composite dual-backbone greatly facilitates the discnative feature learning. Experiment results show that our approach ranks in the first place of the 2023 IEEE GRSS Data Fusion Contest.
arXiv Detail & Related papers (2023-08-10T05:54:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.