COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data -- Generation Stochastic by Design
- URL: http://arxiv.org/abs/2603.03239v1
- Date: Tue, 03 Mar 2026 18:31:46 GMT
- Title: COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data -- Generation Stochastic by Design
- Authors: Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci, Elliot J. Crowley, Mikolaj Czerkawski,
- Abstract summary: Earth observation applications increasingly rely on data from multiple sensors, including optical, radar, elevation, and land-cover products. We introduce COP-GEN, a latent diffusion transformer that models the joint distribution of heterogeneous Earth Observation modalities at their native spatial resolutions. Experiments on a large-scale global multimodal dataset show that COP-GEN generates diverse yet physically consistent realisations while maintaining strong peak fidelity across optical, radar, and elevation modalities.
- Score: 9.278432103577925
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Earth observation applications increasingly rely on data from multiple sensors, including optical, radar, elevation, and land-cover products. Relationships between these modalities are fundamental for data integration but are inherently non-injective: identical conditioning information can correspond to multiple physically plausible observations. Thus, such conditional mappings should be parametrised as data distributions. As a result, deterministic models tend to collapse toward conditional means and fail to represent the uncertainty and variability required for tasks such as data completion and cross-sensor translation. We introduce COP-GEN, a multimodal latent diffusion transformer that models the joint distribution of heterogeneous Earth Observation modalities at their native spatial resolutions. By parameterising cross-modal mappings as conditional distributions, COP-GEN enables flexible any-to-any conditional generation, including zero-shot modality translation, spectral band infilling, and generation under partial or missing inputs, without task-specific retraining. Experiments on a large-scale global multimodal dataset show that COP-GEN generates diverse yet physically consistent realisations while maintaining strong peak fidelity across optical, radar, and elevation modalities. Qualitative and quantitative analyses demonstrate that the model captures meaningful cross-modal structure and systematically adapts its output uncertainty as conditioning information increases. These results highlight the practical importance of stochastic generative modeling for Earth observation and motivate evaluation protocols that move beyond single-reference, pointwise metrics. Website: https://miquel-espinosa.github.io/cop-gen
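The abstract's claim that deterministic models collapse toward conditional means can be illustrated with a toy case. The sketch below is not COP-GEN and uses no Earth observation data: it assumes a single fixed conditioning input whose plausible targets cluster around two modes, -1 and +1. All names (`make_bimodal_targets`, `dist_to_nearest_mode`, and so on) are hypothetical and chosen only for this illustration.

```python
import random
import statistics

def make_bimodal_targets(n, seed=0):
    """Toy targets for ONE fixed conditioning input: two plausible modes near -1 and +1."""
    rng = random.Random(seed)
    return [rng.choice([-1.0, 1.0]) + rng.gauss(0.0, 0.05) for _ in range(n)]

def mse(pred, targets):
    """Mean squared error of a single constant prediction against all targets."""
    return sum((t - pred) ** 2 for t in targets) / len(targets)

def dist_to_nearest_mode(y):
    """Distance from y to the closer of the two assumed modes (-1 and +1)."""
    return min(abs(y - 1.0), abs(y + 1.0))

targets = make_bimodal_targets(10_000)

# A deterministic predictor trained with squared error converges to the
# conditional mean, which here lies between the modes and matches neither.
mean_pred = statistics.fmean(targets)

# A stochastic predictor is stood in for by resampling the empirical
# conditional distribution (an assumption of this sketch, not a diffusion model).
rng = random.Random(1)
samples = [rng.choice(targets) for _ in range(1000)]

print("conditional mean prediction:", round(mean_pred, 3))
print("MSE of mean prediction:", round(mse(mean_pred, targets), 3))
print("MSE of always predicting the +1 mode:", round(mse(1.0, targets), 3))
print("mean prediction distance to nearest mode:", round(dist_to_nearest_mode(mean_pred), 3))
print("max sample distance to nearest mode:", round(max(map(dist_to_nearest_mode, samples)), 3))
```

The mean prediction attains the lowest squared error yet sits far from every plausible observation, while each resampled value stays close to one of the modes. This is the sense in which a pointwise metric can favour an output that no sensor would actually record.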
Related papers
- GeodesicNVS: Probability Density Geodesic Flow Matching for Novel View Synthesis [54.39598154430305]
We propose a Data-to-Data Flow Matching framework that learns deterministic transformations directly between paired views. PDG-FM constrains flow trajectories using geodesic interpolants derived from probability density metrics of pretrained diffusion models. These results highlight the advantages of incorporating data-dependent geometric regularization into deterministic flow matching for consistent novel view generation.
arXiv Detail & Related papers (2026-03-01T09:30:11Z)
- Leveraging generative adversarial networks with spatially adaptive denormalization for multivariate stochastic seismic data inversion [0.0]
We propose an iterative geostatistical inversion algorithm, SPADE-GANInv, for the prediction of facies and multiple correlated continuous properties from seismic data. The SPADE-GAN is trained to reproduce realistic geometries, while sequential co-simulation predicts the spatial variability of the facies-dependent continuous properties. The method is demonstrated on both 2-D synthetic scenarios and field data, targeting the prediction of facies, porosity, and acoustic impedance from full-stack seismic data.
arXiv Detail & Related papers (2025-12-02T15:25:22Z)
- Spatial-Temporal-Spectral Unified Modeling for Remote Sensing Dense Prediction [20.1863553357121]
Current deep learning architectures for remote sensing are fundamentally rigid. We introduce the Spatial-Temporal-Spectral Unified Network (STSUN) for unified modeling. STSUN can adapt to input and output data with arbitrary spatial sizes, temporal lengths, and spectral bands. It unifies various dense prediction tasks and diverse semantic class predictions.
arXiv Detail & Related papers (2025-05-18T07:39:17Z)
- Language-Informed Hyperspectral Image Synthesis for Imbalanced-Small Sample Classification via Semi-Supervised Conditional Diffusion Model [1.9746060146273674]
This paper proposes Txt2HSI-LDM(VAE), a novel language-informed hyperspectral image synthesis method. To address the high-dimensionality of hyperspectral data, a universal variational autoencoder (VAE) is designed to map the data into a low-dimensional latent space. VAE decodes HSI from latent space generated by the diffusion model with the language conditions as input.
arXiv Detail & Related papers (2025-02-27T02:35:49Z)
- PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model [83.35198885088093]
PolSAR data presents unique challenges due to its rich and complex characteristics. Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used. Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively. We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
arXiv Detail & Related papers (2024-12-17T09:59:53Z)
- Energy-Based Modelling for Discrete and Mixed Data via Heat Equations on Structured Spaces [19.92604781654767]
Energy-based models (EBMs) offer a flexible framework for probabilistic modelling across various data domains. We propose to train discrete EBMs with Energy Discrepancy, a loss function which only requires the evaluation of the energy function at data points.
arXiv Detail & Related papers (2024-12-02T00:35:29Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs), which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- Learning transformer-based heterogeneously salient graph representation for multimodal remote sensing image classification [42.15709954199397]
A transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper.
First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data.
A self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling.
arXiv Detail & Related papers (2023-11-17T04:06:20Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- OR-Net: Pointwise Relational Inference for Data Completion under Partial Observation [51.083573770706636]
This work uses relational inference to fill in the incomplete data.
We propose Omni-Relational Network (OR-Net) to model the pointwise relativity in two aspects.
arXiv Detail & Related papers (2021-05-02T06:05:54Z)