RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data
- URL: http://arxiv.org/abs/2601.01829v1
- Date: Mon, 05 Jan 2026 06:49:13 GMT
- Title: RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data
- Authors: Peiyan Hu, Haodong Feng, Hongyuan Liu, Tongtong Yan, Wenhao Deng, Tianrun Gao, Rong Zheng, Haoren Zheng, Chenglei Yu, Chuanrui Wang, Kaiwen Li, Zhi-Ming Ma, Dezhi Zhou, Xingcai Lu, Dixia Fan, Tailin Wu
- Abstract summary: We introduce RealPDEBench, the first benchmark for scientific Machine Learning (ML) that integrates real-world measurements with paired numerical simulations. RealPDEBench consists of five datasets, three tasks, eight metrics, and ten baselines. Experiments reveal significant discrepancies between simulated and real-world data, while showing that pretraining with simulated data consistently improves both accuracy and convergence.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting the evolution of complex physical systems remains a central problem in science and engineering. Despite rapid progress in scientific Machine Learning (ML) models, a critical bottleneck is the scarcity of real-world data, which is expensive to acquire; as a result, most current models are trained and validated on simulated data. Beyond limiting the development and evaluation of scientific ML, this gap also hinders research into essential tasks such as sim-to-real transfer. We introduce RealPDEBench, the first benchmark for scientific ML that integrates real-world measurements with paired numerical simulations. RealPDEBench consists of five datasets, three tasks, eight metrics, and ten baselines. We first present five real-world measured datasets with paired simulated datasets across different complex physical systems. We further define three tasks, which allow comparisons between real-world and simulated data, and facilitate the development of methods to bridge the two. Moreover, we design eight evaluation metrics, spanning data-oriented and physics-oriented metrics, and finally benchmark ten representative baselines, including state-of-the-art models, pretrained PDE foundation models, and a traditional method. Experiments reveal significant discrepancies between simulated and real-world data, while showing that pretraining with simulated data consistently improves both accuracy and convergence. In this work, we hope to provide insights from real-world data, advancing scientific ML toward bridging the sim-to-real gap and real-world deployment. Our benchmark, datasets, and instructions are available at https://realpdebench.github.io/.
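The abstract names data-oriented and physics-oriented metrics without listing them. As a minimal sketch, assuming two common choices that may or may not be among RealPDEBench's eight metrics, the Python below computes RMSE and relative L2 error (data-oriented) and a relative error between Fourier power spectra (an assumed example of a physics-oriented, scale-by-scale comparison) for a pair of 2D fields. The function names and the "simulated"/"measured" fields are hypothetical illustrations, not the benchmark's API or data.

```python
import numpy as np


def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root-mean-square error over all grid points (data-oriented)."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def relative_l2(pred: np.ndarray, target: np.ndarray) -> float:
    """||pred - target||_2 / ||target||_2 over all grid points (data-oriented)."""
    return float(np.linalg.norm(pred - target) / np.linalg.norm(target))


def spectral_energy_error(pred: np.ndarray, target: np.ndarray) -> float:
    """Relative L2 error between the 2D Fourier power spectra of two (H, W) fields.

    Compares how energy is distributed over spatial wavenumbers instead of
    pointwise values (an assumed example of a physics-oriented metric).
    """
    pred_power = np.abs(np.fft.fft2(pred)) ** 2
    target_power = np.abs(np.fft.fft2(target)) ** 2
    return float(np.linalg.norm(pred_power - target_power) / np.linalg.norm(target_power))


# Hypothetical paired fields on a periodic 64 x 64 grid: the "measured" field is
# the "simulated" one shifted by 0.1 rad in x (a stand-in for a sim-to-real offset).
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
X, Y = np.meshgrid(x, x)
simulated = np.sin(X) * np.cos(Y)
measured = np.sin(X + 0.1) * np.cos(Y)

print(f"RMSE:           {rmse(simulated, measured):.4f}")
print(f"Relative L2:    {relative_l2(simulated, measured):.4f}")
print(f"Spectral error: {spectral_energy_error(simulated, measured):.2e}")
```

Because the hypothetical "measured" field is only a spatially shifted copy of the "simulated" one, the pointwise metrics report a nonzero error while the spectral error is essentially zero. The two metric families can disagree about how far apart simulated and real data are, which is one plausible reason for a benchmark to report both.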
Related papers
- D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping
We introduce a real-to-sim-to-real engine that leverages Gaussian Splat representations to build a differentiable engine. We show that our engine achieves accurate and robust performance in mass identification across various object geometries and mass values. Those optimized mass values facilitate force-aware policy learning, achieving high performance in object grasping.
Date: 2026-03-01T15:32:04Z
- SimuScene: Training and Benchmarking Code Generation to Simulate Physical Scenarios
Large language models (LLMs) have been extensively studied for tasks like math competitions, complex coding, and scientific reasoning. We propose SimuScene, the first systematic study that trains and evaluates LLMs on simulating physical scenarios. We build an automatic pipeline to collect data, with human verification to ensure quality.
Date: 2026-02-11T13:26:02Z
- Mirage2Matter: A Physically Grounded Gaussian World Model from Video
We present Simulate Anything, a graphics-driven world modeling and simulation framework. Our approach reconstructs real-world environments into a photorealistic scene representation using 3D Gaussian Splatting (3DGS). We then leverage generative models to recover a physically realistic representation and integrate it into a simulation environment via a precision calibration target.
Date: 2026-01-24T07:43:57Z
- HD-GEN: A High-Performance Software System for Human Mobility Data Generation Based on Patterns of Life
We introduce a comprehensive software pipeline for calibrating, generating, processing, and visualizing large-scale individual-level human mobility datasets. A data generation engine constructs geographically grounded simulations using OpenStreetMap data. A genetic algorithm-based calibration module fine-tunes simulation parameters to align with real-world mobility characteristics. A data processing suite transforms raw simulation logs into structured formats suitable for downstream applications.
Date: 2026-01-03T16:01:00Z
- SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
We introduce SimBench, the first large-scale, standardized benchmark for a robust, reproducible science of LLM simulation. We show that even the best LLMs today have limited simulation ability (score: 40.80/100) and that performance scales log-linearly with model size. We demonstrate that simulation ability correlates most strongly with deep, knowledge-intensive reasoning.
Date: 2025-10-20T13:14:38Z
- High-Fidelity Digital Twins for Bridging the Sim2Real Gap in LiDAR-Based ITS Perception
This paper proposes a high-fidelity digital twin (HiFi DT) framework that incorporates real-world background geometry, lane-level road topology, and sensor-specific specifications and placement. Experiments show that the DT-trained model outperforms the equivalent model trained on real data by 4.8%.
Date: 2025-09-03T00:12:58Z
- Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning
PLAID is a framework for representing and sharing datasets of physics simulations. PLAID defines a unified standard for describing simulation data. We release six datasets under the PLAID standard, covering structural mechanics and computational fluid dynamics.
Date: 2025-05-05T18:59:17Z
- GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels. We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter. In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
Date: 2024-12-23T18:58:17Z
- MBDS: A Multi-Body Dynamics Simulation Dataset for Graph Networks Simulators
Graph Network Simulators (GNS) have emerged as the leading method for modeling physical phenomena.
We have constructed a high-quality physical simulation dataset encompassing 1D, 2D, and 3D scenes.
A key feature of our dataset is the inclusion of precise multi-body dynamics, facilitating a more realistic simulation of the physical world.
Date: 2024-10-04T03:03:06Z
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
Date: 2023-10-12T20:49:15Z
- Quantifying the LiDAR Sim-to-Real Domain Shift: A Detailed Investigation Using Object Detectors and Analyzing Point Clouds at Target-Level
LiDAR object detection algorithms based on neural networks for autonomous driving require large amounts of data for training, validation, and testing.
We show that using simulated data for the training of neural networks leads to a domain shift between training and testing data due to differences in scenes, scenarios, and distributions.
Date: 2023-03-03T12:52:01Z
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
Date: 2022-08-16T20:46:08Z