MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation
- URL: http://arxiv.org/abs/2405.13570v3
- Date: Tue, 15 Oct 2024 07:42:36 GMT
- Title: MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation
- Authors: Zhiping Yu, Chenyang Liu, Liqin Liu, Zhenwei Shi, Zhengxia Zou
- Abstract summary: We present MetaEarth, a generative foundation model that breaks the barrier by scaling image generation to a global level.
In MetaEarth, we propose a resolution-guided self-cascading generative framework, which enables generating images of any region at a wide range of geographical resolutions.
Our model opens up new possibilities for constructing generative world models by simulating Earth visuals from an innovative overhead perspective.
- Score: 24.193486441413803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent advancement of generative foundation models has ushered in a new era of image generation in the realm of natural images, revolutionizing art design, entertainment, environment simulation, and beyond. Despite producing high-quality samples, existing methods are constrained to generating images of scenes at a limited scale. In this paper, we present MetaEarth, a generative foundation model that breaks the barrier by scaling image generation to a global level, exploring the creation of worldwide, multi-resolution, unbounded, and virtually limitless remote sensing images. In MetaEarth, we propose a resolution-guided self-cascading generative framework, which enables generating images of any region at a wide range of geographical resolutions. To achieve unbounded and arbitrary-sized image generation, we design a novel noise sampling strategy for denoising diffusion models by analyzing the generation conditions and initial noise. To train MetaEarth, we construct a large dataset comprising multi-resolution optical remote sensing images with geographical information. Experiments have demonstrated the powerful capabilities of our method in generating global-scale images. Additionally, MetaEarth serves as a data engine that can provide high-quality and rich training data for downstream tasks. Our model opens up new possibilities for constructing generative world models by simulating Earth visuals from an innovative overhead perspective.
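The self-cascading idea in the abstract (each level's output conditions the generation of the next, higher-resolution level) can be sketched as follows. This is an illustrative toy, not the authors' code: a hypothetical `denoise` stub using nearest-neighbour upsampling stands in for the conditional denoising diffusion model, and images are plain 2D lists.

```python
# Illustrative sketch of a resolution-guided self-cascading loop in the
# spirit of MetaEarth. In the real model, `denoise` would be a diffusion
# denoiser conditioned on the lower-resolution parent image; here a
# nearest-neighbour upsampler stands in for it.

def denoise(tile, factor):
    """Stand-in for a conditional diffusion step: upsample a 2D list
    by `factor` along each axis via nearest-neighbour replication."""
    return [
        [tile[r // factor][c // factor]
         for c in range(len(tile[0]) * factor)]
        for r in range(len(tile) * factor)
    ]

def self_cascade(seed_image, levels, factor=2):
    """Each level's output becomes the condition for the next level,
    so resolution grows by `factor` per cascade step."""
    image = seed_image
    for _ in range(levels):
        image = denoise(image, factor)
    return image

# Usage: a 4x4 coarse "global" seed cascaded through 3 levels -> 32x32.
seed = [[(r + c) % 2 for c in range(4)] for r in range(4)]
hi_res = self_cascade(seed, levels=3)
print(len(hi_res), len(hi_res[0]))  # 32 32
```

The key design point the paper describes, which this sketch omits, is the noise sampling strategy that keeps adjacent generated tiles consistent, enabling unbounded, arbitrary-sized outputs.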
Related papers
- SounDiT: Geo-Contextual Soundscape-to-Landscape Generation [28.099729084181092]
We present a novel problem: Geo-Contextual Soundscape-to-Landscape (GeoS2L) generation. GeoS2L aims to synthesize geographically realistic landscape images from environmental soundscapes.
arXiv Detail & Related papers (2025-05-19T05:47:13Z)
- SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model [1.3700170633913733]
This paper proposes a simulator-conditioned scene generation engine based on a world model.
By constructing a simulation system consistent with real-world scenes, simulation data and labels for any scene can be collected, and these serve as the conditions for data generation in the world model.
Results show that these generated images significantly improve the performance of downstream perception models.
arXiv Detail & Related papers (2025-03-18T06:41:02Z)
- EarthGen: Generating the World from Top-Down Views [23.66194982885544]
We present a novel method for extensive multi-scale generative terrain modeling.
At the core of our model is a cascade of superresolution diffusion models that can be combined to produce consistent images across multiple resolutions.
We evaluate our method on a dataset collected from Bing Maps and show that it outperforms super-resolution baselines on the extreme super-resolution task of 1024x zoom.
arXiv Detail & Related papers (2024-09-02T23:17:56Z)
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce Generalizable Model-based Neural Radiance Fields, an effective framework for synthesizing free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection [29.118321046339656]
We propose a framework to learn rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches for AI synthesized image detection.
GLFF fuses information from two branches: the global branch to extract multi-scale semantic features and the local branch to select informative patches for detailed local artifacts extraction.
arXiv Detail & Related papers (2022-11-16T02:03:20Z)
- InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z)
- Sci-Net: a Scale Invariant Model for Building Detection from Aerial Images [0.0]
We propose a scale-invariant neural network (Sci-Net) that is able to segment buildings present in aerial images at different spatial resolutions.
Specifically, we modified the U-Net architecture and fused it with dense Atrous Spatial Pyramid Pooling (ASPP) to extract fine-grained multi-scale representations.
arXiv Detail & Related papers (2021-11-12T16:45:20Z)
- Generating Physically-Consistent Satellite Imagery for Climate Visualizations [53.61991820941501]
We train a generative adversarial network to create synthetic satellite imagery of future flooding and reforestation events.
A pure deep learning-based model can generate flood visualizations but hallucinates floods at locations that were not susceptible to flooding.
We publish our code and dataset for segmentation guided image-to-image translation in Earth observation.
arXiv Detail & Related papers (2021-04-10T15:00:15Z)
- Boundary Regularized Building Footprint Extraction From Satellite Images Using Deep Neural Network [6.371173732947292]
We propose a novel deep neural network that jointly detects building instances and regularizes noisy building boundary shapes from a single satellite image.
Our model can accomplish multi-tasks of object localization, recognition, semantic labelling and geometric shape extraction simultaneously.
arXiv Detail & Related papers (2020-06-23T17:24:09Z)
- A U-Net Based Discriminator for Generative Adversarial Networks [86.67102929147592]
We propose an alternative U-Net based discriminator architecture for generative adversarial networks (GANs).
The proposed architecture provides detailed per-pixel feedback to the generator while maintaining the global coherence of synthesized images.
The novel discriminator improves over the state of the art in terms of the standard distribution and image quality metrics.
arXiv Detail & Related papers (2020-02-28T11:16:54Z)
- Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation [135.4660201856059]
We consider learning the scene generation in a local context, and design a local class-specific generative network with semantic maps as a guidance.
To learn more discriminative class-specific feature representations for the local generation, a novel classification module is also proposed.
Experiments on two scene image generation tasks show superior generation performance of the proposed model.
arXiv Detail & Related papers (2019-12-27T16:14:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.