Related papers: The Transparent Earth: A Multimodal Foundation Model for the Earth's Subsurface

URL: http://arxiv.org/abs/2509.02783v2
Date: Tue, 23 Sep 2025 16:43:24 GMT
Title: The Transparent Earth: A Multimodal Foundation Model for the Earth's Subsurface
Authors: Arnab Mazumder, Javier E. Santos, Noah Hobbs, Mohamed Mehana, Daniel O'Malley
Abstract summary: We present a transformer-based architecture for reconstructing subsurface properties from heterogeneous datasets. The model incorporates positional encodings of observations together with modality encodings, derived from a text embedding model applied to a description of each modality. We include eight modalities spanning directional angles, categorical classes, and continuous properties such as temperature and thickness.
Score: 2.0912612079111814
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present the Transparent Earth, a transformer-based architecture for reconstructing subsurface properties from heterogeneous datasets that vary in sparsity, resolution, and modality, where each modality represents a distinct type of observation (e.g., stress angle, mantle temperature, tectonic plate type). The model incorporates positional encodings of observations together with modality encodings, derived from a text embedding model applied to a description of each modality. This design enables the model to scale to an arbitrary number of modalities, making it straightforward to add new ones not considered in the initial design. We currently include eight modalities spanning directional angles, categorical classes, and continuous properties such as temperature and thickness. These capabilities support in-context learning, enabling the model to generate predictions either with no inputs or with an arbitrary number of additional observations from any subset of modalities. On validation data, this reduces errors in predicting stress angle by more than a factor of three. The proposed architecture is scalable and demonstrates improved performance with increased parameters. Together, these advances make the Transparent Earth an initial foundation model for the Earth's subsurface that ultimately aims to predict any subsurface property anywhere on Earth.
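The tokenization idea described in the abstract (a positional encoding of each observation combined with a modality encoding derived from a text description) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the names `modality_encoding`, `positional_encoding`, and `make_token`, the width `DIM`, the sinusoidal position features, the choice to combine encodings by summation, and the hash-based stand-in used in place of a real text embedding model.

```python
import hashlib
import math

DIM = 8  # hypothetical token width, chosen only for this sketch


def modality_encoding(description, dim=DIM):
    """Stand-in for a text embedding model (assumption): maps a modality
    description to a deterministic vector with entries in [-1, 1]."""
    digest = hashlib.sha256(description.encode("utf-8")).digest()
    return [digest[i] / 255.0 * 2.0 - 1.0 for i in range(dim)]


def positional_encoding(lat_deg, lon_deg, depth_km, dim=DIM):
    """Sinusoidal features of an observation's location (assumed scheme)."""
    coords = [math.radians(lat_deg), math.radians(lon_deg), depth_km / 1000.0]
    feats = []
    for i in range(dim):
        coord = coords[i % 3]          # cycle through lat, lon, depth
        freq = 2.0 ** (i // 3)         # double the frequency every cycle
        fn = math.sin if (i // 3) % 2 == 0 else math.cos
        feats.append(fn(freq * coord))
    return feats


def make_token(description, lat_deg, lon_deg, depth_km):
    """Combine 'where' (position) and 'what kind' (modality) by summation
    (the combination rule is an assumption of this sketch)."""
    pos = positional_encoding(lat_deg, lon_deg, depth_km)
    mod = modality_encoding(description)
    return [p + m for p, m in zip(pos, mod)]


# Adding a modality only requires a new description string.
stress = make_token("stress angle in degrees", 35.9, -106.3, 2.0)
temperature = make_token("mantle temperature in kelvin", 35.9, -106.3, 2.0)
```

Because the modality is identified by text rather than by a fixed index, a sketch like this needs no change to its token layout when a new modality is introduced, which is the property the abstract credits for scaling to an arbitrary number of modalities.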

Related papers

Adaptive Point-Prompt Tuning: Fine-Tuning Heterogeneous Foundation Models for 3D Point Cloud Analysis [51.37795317716487]
We propose the Adaptive Point-Prompt Tuning (APPT) method, which fine-tunes pre-trained models with a modest number of parameters. We convert raw point clouds into point embeddings by aggregating local geometry to capture spatial features followed by linear layers. To calibrate self-attention across source domains of any modality to 3D, we introduce a prompt generator that shares weights with the point embedding module.
arXiv Detail & Related papers (2025-08-30T06:02:21Z)
Sub-graph Based Diffusion Model for Link Prediction [43.15741675617231]
Denoising Diffusion Probabilistic Models (DDPMs) represent a contemporary class of generative models with exceptional qualities. We build a novel generative model for link prediction using a dedicated design to decompose the likelihood estimation process via the Bayesian formula. Our proposed method presents numerous advantages: (1) transferability across datasets without retraining, (2) promising generalization on limited training data, and (3) robustness against graph adversarial attacks.
arXiv Detail & Related papers (2024-09-13T02:23:55Z)
(Deep) Generative Geodesics [57.635187092922976]
We introduce a new Riemannian metric to assess the similarity between any two data points. Our metric leads to the conceptual definition of generative distances and generative geodesics. Their approximations are proven to converge to their true values under mild conditions.
arXiv Detail & Related papers (2024-07-15T21:14:02Z)
Learning transformer-based heterogeneously salient graph representation for multimodal remote sensing image classification [42.15709954199397]
A transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper. First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data. A self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling.
arXiv Detail & Related papers (2023-11-17T04:06:20Z)
T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces. We propose a new model comprising a view-wise sampling algorithm to focus on local structure learning. The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z)
VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables. The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning. We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
A machine learning and feature engineering approach for the prediction of the uncontrolled re-entry of space objects [1.0205541448656992]
We present the development of a deep learning model for the re-entry prediction of uncontrolled objects in Low Earth Orbit (LEO). The model is based on a modified version of the Sequence-to-Sequence architecture and is trained on the average altitude profile as derived from a set of Two-Line Element (TLE) data of over 400 bodies. The novelty of the work consists in introducing in the deep learning model, alongside the average altitude, three new input features: a drag-like coefficient (B*), the average solar index, and the area-to-mass ratio of the object.
arXiv Detail & Related papers (2023-03-17T13:53:59Z)
DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes. It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes. We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis [8.20832544370228]
We introduce a domain-agnostic architecture to study any surface data projected onto a spherical manifold. A vision transformer model encodes the sequence of patches via successive multi-head self-attention layers. Experiments show that the SiT generally outperforms surface CNNs, while performing comparably on registered and unregistered data.
arXiv Detail & Related papers (2022-03-30T15:56:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.