Improving Full Waveform Inversion in Large Model Era
- URL: http://arxiv.org/abs/2603.00377v1
- Date: Fri, 27 Feb 2026 23:33:06 GMT
- Title: Improving Full Waveform Inversion in Large Model Era
- Authors: Yinan Feng, Peng Jin, Yuzhe Guo, Yinpeng Chen, Youzuo Lin,
- Abstract summary: We show that a model trained entirely on simulated and relatively simple data can generalize remarkably well to challenging geological benchmarks.<n>Our model achieves state-of-the-art performance on OpenFWI and significantly narrows the generalization gap in data-driven FWI.
- Score: 25.26004497243484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Full Waveform Inversion (FWI) is a highly nonlinear and ill-posed problem that aims to recover subsurface velocity maps from surface-recorded seismic waveforms data. Existing data-driven FWI typically uses small models, as available datasets have limited volume, geological diversity, and spatial extent, leading to substantial concerns about overfitting. Although they perform well on synthetic datasets, current methods fail to generalize to more realistic geological structures. In this work, we show that a model trained entirely on simulated and relatively simple data can generalize remarkably well to challenging and unseen geological benchmarks. We provide a working recipe that tames a billion-parameter model for FWI through coordinated scaling across three axes: model capacity, data diversity, and training strategy. Our model achieves state-of-the-art performance on OpenFWI and significantly narrows the generalization gap in data-driven FWI. Across six challenging geophysical benchmarks, including Marmousi, 2D SEG/EAGE Salt and Overthrust, 2004 BP, Sigsbee, and SEAM Phase I, it infers complex structures absent from the training set and delivers significant performance improvements (SSIM from 0.5844 to 0.7669). Overall, our results demonstrate that with an appropriate scaling strategy, large models trained on simple synthetic data can achieve substantial generalization to more complex and realistic geological structures.
Related papers
- UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass [83.7071371474926]
UniSH is a unified, feed-forward framework for joint metric-scale 3D scene and human reconstruction.<n>Our framework bridges strong, disparate priors from scene reconstruction and HMR.<n>Our model achieves state-of-the-art performance on human-centric scene reconstruction.
arXiv Detail & Related papers (2026-01-03T16:06:27Z) - HD-GEN: A High-Performance Software System for Human Mobility Data Generation Based on Patterns of Life [1.9739979974462676]
We introduce a comprehensive software pipeline for calibrating, generating, processing, and visualizing large-scale individual-level human mobility datasets.<n>A data generation engine constructs geographically grounded simulations using OpenStreetMap data.<n>A genetic algorithm-based calibration module fine-tunes simulation parameters to align with real-world mobility characteristics.<n>A data processing suite transforms raw simulation logs into structured formats suitable for downstream applications.
arXiv Detail & Related papers (2026-01-03T16:01:00Z) - ReconMOST: Multi-Layer Sea Temperature Reconstruction with Observations-Guided Diffusion [48.540756751934836]
ReconMOST is a data-driven guided diffusion model framework for multi-layer sea temperature reconstruction.<n>Our method extends ML-based SST reconstruction to a global, multi-layer setting, handling over 92.5% missing data.
arXiv Detail & Related papers (2025-06-12T06:27:22Z) - Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models.<n>We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients.<n>We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z) - A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior [11.859145373647474]
We present the first large-scale benchmarking study designed to provide guidelines for domain shift strategies in seismic interpretation.<n>Our benchmark spans over 200 combinations of model architectures, datasets and training strategies, across three datasets.<n>Our analysis shows that common fine-tuning practices can lead to catastrophic forgetting when source and target datasets are disjoint.
arXiv Detail & Related papers (2025-05-13T13:56:43Z) - World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks [53.98633183204453]
In this paper, a novel world model-based learning framework is proposed to minimize packet-completeness-aware age of information (CAoI) in a vehicular network.<n>A world model framework is proposed to jointly learn a dynamic model of the mmWave V2X environment and use it to imagine trajectories for learning how to perform link scheduling.<n>In particular, the long-term policy is learned in differentiable imagined trajectories instead of environment interactions.
arXiv Detail & Related papers (2025-05-03T06:23:18Z) - SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.<n>Current state-of-the-art methods focus on training innovative architectural designs on confined datasets.<n>We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z) - TopoFR: A Closer Look at Topology Alignment on Face Recognition [58.45515807380505]
We propose TopoFR, a novel FR model that leverages a topological structure alignment strategy called PTSA and a hard sample mining strategy named SDE.<n> PTSA uses persistent homology to align the topological structures of the input and latent spaces, effectively preserving the structure information and improving the generalization performance of FR model.<n> Experimental results on popular face benchmarks demonstrate the superiority of our TopoFR over the state-of-the-art methods.
arXiv Detail & Related papers (2024-10-14T14:58:30Z) - Seismic Fault SAM: Adapting SAM with Lightweight Modules and 2.5D Strategy for Fault Detection [11.868792440783054]
This paper proposes Seismic Fault SAM, which applies the general pre-training foundation model-Segment Anything Model (SAM)-to seismic fault interpretation.
Our innovative points include designing lightweight Adapter modules, freezing most of the pre-training weights, and only updating a small number of parameters.
Experimental results on the largest publicly available seismic dataset, Thebe, show that our method surpasses existing 3D models on both OIS and ODS metrics.
arXiv Detail & Related papers (2024-07-19T08:38:48Z) - Physics-Consistent Data-driven Waveform Inversion with Adaptive Data
Augmentation [12.564534712461331]
We develop a new hybrid computational approach to solve full-waveform inversion (FWI)
We develop a data augmentation strategy that can improve the representativity of the training set.
We apply our method to synthetic elastic seismic waveform data generated from a subsurface geologic model built on a carbon sequestration site at Kimberlina, California.
arXiv Detail & Related papers (2020-09-03T17:12:55Z) - MapLUR: Exploring a new Paradigm for Estimating Air Pollution using Deep
Learning on Map Images [4.7791671364702575]
Land-use regression models are important for the assessment of air pollution concentrations in areas without measurement stations.
We propose the Data-driven, Open, Global (DOG) paradigm that entails models based on purely data-driven approaches using only openly and globally available data.
arXiv Detail & Related papers (2020-02-18T11:21:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.