EarthGen: Generating the World from Top-Down Views
- URL: http://arxiv.org/abs/2409.01491v2
- Date: Sat, 7 Sep 2024 21:49:56 GMT
- Title: EarthGen: Generating the World from Top-Down Views
- Authors: Ansh Sharma, Albert Xiao, Praneet Rathi, Rohit Kundu, Albert Zhai, Yuan Shen, Shenlong Wang
- Abstract summary: We present a novel method for extensive multi-scale generative terrain modeling.
At the core of our model is a cascade of super-resolution diffusion models that can be combined to produce consistent images across multiple resolutions.
We evaluate our method on a dataset collected from Bing Maps and show that it outperforms super-resolution baselines on the extreme super-resolution task of 1024x zoom.
- Score: 23.66194982885544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present a novel method for extensive multi-scale generative terrain modeling. At the core of our model is a cascade of super-resolution diffusion models that can be combined to produce consistent images across multiple resolutions. Pairing this concept with a tiled generation method yields a scalable system that can generate thousands of square kilometers of realistic Earth surfaces at high resolution. We evaluate our method on a dataset collected from Bing Maps and show that it outperforms super-resolution baselines on the extreme super-resolution task of 1024x zoom. We also demonstrate its ability to create diverse and coherent scenes via an interactive gigapixel-scale generated map. Finally, we demonstrate how our system can be extended to enable novel content creation applications including controllable world generation and 3D scene generation.
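The pipeline the abstract describes, a cascade of super-resolution stages applied tile-by-tile so that arbitrarily large maps stay memory-bounded, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sr_stage` is a hypothetical stand-in (plain nearest-neighbour upsampling) for one super-resolution diffusion model, and all names and parameters are illustrative.

```python
import numpy as np

def sr_stage(tile: np.ndarray, factor: int = 2) -> np.ndarray:
    """Stand-in for one super-resolution diffusion model in the cascade.
    Here: plain nearest-neighbour upsampling by `factor`."""
    return tile.repeat(factor, axis=0).repeat(factor, axis=1)

def generate_terrain(base: np.ndarray, n_stages: int, tile: int) -> np.ndarray:
    """Run the cascade stage by stage, super-resolving the map tile-by-tile
    so peak memory depends on the tile size, not the full map size."""
    img = base
    for _ in range(n_stages):
        h, w = img.shape
        out = np.zeros((h * 2, w * 2), dtype=img.dtype)
        for y in range(0, h, tile):
            for x in range(0, w, tile):
                # Each tile is upsampled independently and written back
                # into the corresponding region of the doubled-size output.
                out[2 * y:2 * (y + tile), 2 * x:2 * (x + tile)] = \
                    sr_stage(img[y:y + tile, x:x + tile])
        img = out
    return img

coarse = np.random.rand(16, 16)                    # coarse top-down view
fine = generate_terrain(coarse, n_stages=5, tile=8)
print(fine.shape)                                  # (512, 512): 32x zoom
```

In the real system, each stage would be a conditional diffusion model and tile boundaries would need overlap or shared conditioning to stay seamless; this sketch only shows the scaling structure.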
Related papers
- EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion [23.3834795181211]
We introduce Aerial-Earth3D, the largest 3D aerial dataset to date, consisting of 50k curated scenes (each measuring 600m x 600m) captured across the U.S. mainland.
Each scene provides pose-annotated multi-view images, depth maps, normals, semantic segmentation, and camera poses, with explicit quality control to ensure terrain diversity.
We propose EarthCrafter, a tailored framework for large-scale 3D Earth generation via sparse-decoupled latent diffusion.
arXiv Detail & Related papers (2025-07-22T12:46:48Z) - Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion [27.836518920611557]
We introduce MVGD, a diffusion-based architecture capable of direct pixel-level generation of images and depth maps from novel viewpoints.
We train this model on a collection of more than 60 million multi-view samples from publicly available datasets.
We report state-of-the-art results in multiple novel view synthesis benchmarks, as well as multi-view stereo and video depth estimation.
arXiv Detail & Related papers (2025-01-30T23:43:06Z) - CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [59.257513664564996]
We introduce a novel method for generating 360° panoramas from text prompts or images.
We employ multi-view diffusion models to jointly synthesize the six faces of a cubemap.
Our model allows for fine-grained text control, generates high resolution panorama images and generalizes well beyond its training set.
arXiv Detail & Related papers (2025-01-28T18:59:49Z) - Can Location Embeddings Enhance Super-Resolution of Satellite Imagery? [2.3020018305241337]
Publicly available satellite imagery, such as Sentinel-2, often lacks the spatial resolution required for accurate analysis of remote sensing tasks.
We propose a novel super-resolution framework that enhances generalization by incorporating geographic context through location embeddings.
We demonstrate the effectiveness of our method on the building segmentation task, showing significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2025-01-27T08:16:54Z) - InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models [75.03495065452955]
We present InfiniCube, a scalable method for generating dynamic 3D driving scenes with high fidelity and controllability.
Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.
arXiv Detail & Related papers (2024-12-05T07:32:20Z) - Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation [12.588962705218103]
We introduce the Multi-Scale Diffusion (MSD) framework, a plug-and-play module that extends the existing panoramic image generation framework to multiple resolution levels.
By utilizing gradient descent techniques, our method effectively incorporates structural information from low-resolution images into high-resolution outputs.
arXiv Detail & Related papers (2024-10-24T15:18:51Z) - CityX: Controllable Procedural Content Generation for Unbounded 3D Cities [55.737060358043536]
We propose a novel multi-modal controllable procedural content generation method, named CityX.
It enhances realistic, unbounded 3D city generation guided by multiple layout conditions, including OSM, semantic maps, and satellite images.
Through this effective framework, CityX shows the potential to build an innovative ecosystem for 3D scene generation.
arXiv Detail & Related papers (2024-07-24T18:05:13Z) - MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation [24.193486441413803]
We present MetaEarth, a generative foundation model that breaks the barrier by scaling image generation to a global level.
In MetaEarth, we propose a resolution-guided self-cascading generative framework, which enables generating images of any region across a wide range of geographical resolutions.
Our model opens up new possibilities for constructing generative world models by simulating Earth visuals from an innovative overhead perspective.
arXiv Detail & Related papers (2024-05-22T12:07:47Z) - Generative Powers of Ten [60.6740997942711]
We present a method that uses a text-to-image model to generate consistent content across multiple image scales.
We achieve this through a joint multi-scale diffusion sampling approach.
Our method enables deeper levels of zoom than traditional super-resolution methods.
arXiv Detail & Related papers (2023-12-04T18:59:25Z) - Pushing the Limits of 3D Shape Generation at Scale [65.24420181727615]
We present a significant breakthrough in 3D shape generation by scaling it to unprecedented dimensions.
We have developed Argus-3D, a model with 3.6 billion trainable parameters, making it the largest 3D shape generation model to date.
arXiv Detail & Related papers (2023-06-20T13:01:19Z) - T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising a view-wise sampling algorithm to focus on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z) - Any-resolution Training for High-resolution Image Synthesis [55.19874755679901]
Generative models operate at fixed resolution, even though natural images come in a variety of sizes.
We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions.
We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions.
arXiv Detail & Related papers (2022-04-14T17:59:31Z) - InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images.
We show how it trains and infers patch-by-patch seamlessly with low computational resources.
arXiv Detail & Related papers (2021-04-08T17:59:30Z) - Procedural 3D Terrain Generation using Generative Adversarial Networks [0.0]
We use Generative Adversarial Networks (GAN) to yield realistic 3D environments based on the distribution of remotely sensed images of landscapes, captured by satellites or drones.
We are able to construct 3D scenery with a plausible height distribution and colorization, consistent with the remotely sensed landscapes provided during training.
arXiv Detail & Related papers (2020-10-13T14:15:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.