Machine Learning Workflows in Climate Modeling: Design Patterns and Insights from Case Studies
- URL: http://arxiv.org/abs/2510.03305v1
- Date: Tue, 30 Sep 2025 18:26:18 GMT
- Title: Machine Learning Workflows in Climate Modeling: Design Patterns and Insights from Case Studies
- Authors: Tian Zheng, Subashree Venkatasubramanian, Shuolin Li, Amy Braverman, Xinyi Ke, Zhewen Hou, Peter Jin, Samarth Sanjay Agrawal,
- Abstract summary: This paper analyzes a series of case studies from applied machine learning research in climate modeling. Rather than reviewing technical details, we aim to synthesize design patterns across diverse projects in ML-enabled climate modeling.
- Score: 4.2291102971578844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning has been increasingly applied in climate modeling on system emulation acceleration, data-driven parameter inference, forecasting, and knowledge discovery, addressing challenges such as physical consistency, multi-scale coupling, data sparsity, robust generalization, and integration with scientific workflows. This paper analyzes a series of case studies from applied machine learning research in climate modeling, with a focus on design choices and workflow structure. Rather than reviewing technical details, we aim to synthesize workflow design patterns across diverse projects in ML-enabled climate modeling: from surrogate modeling, ML parameterization, probabilistic programming, to simulation-based inference, and physics-informed transfer learning. We unpack how these workflows are grounded in physical knowledge, informed by simulation data, and designed to integrate observations. We aim to offer a framework for ensuring rigor in scientific machine learning through more transparent model development, critical evaluation, informed adaptation, and reproducibility, and to contribute to lowering the barrier for interdisciplinary collaboration at the interface of data science and climate modeling.
Related papers
- A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers [221.34650992288505]
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research. This survey reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge.
arXiv Detail & Related papers (2025-08-28T18:30:52Z) - Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries [23.111935712144277]
Rapid and accurate simulations of fluid dynamics around complicated geometric bodies are critical in a variety of engineering and scientific applications. While scientific machine learning (SciML) has shown considerable promise, most studies in this field are limited to simple geometries. This paper addresses this gap by benchmarking diverse SciML models for fluid flow prediction over intricate geometries.
arXiv Detail & Related papers (2024-12-31T00:23:15Z) - Recent Advances on Machine Learning for Computational Fluid Dynamics: A Survey [51.87875066383221]
This paper introduces fundamental concepts, traditional methods, and benchmark datasets, then examines the various roles Machine Learning plays in improving CFD.
We highlight real-world applications of ML for CFD in critical scientific and engineering disciplines, including aerodynamics, combustion, atmosphere & ocean science, biology fluid, plasma, symbolic regression, and reduced order modeling.
We draw the conclusion that ML is poised to significantly transform CFD research by enhancing simulation accuracy, reducing computational time, and enabling more complex analyses of fluid dynamics.
arXiv Detail & Related papers (2024-08-22T07:33:11Z) - Evaluating the transferability potential of deep learning models for climate downscaling [16.30722178785489]
We evaluate the efficacy of training deep learning downscaling models on multiple climate datasets to learn more robust and transferable representations.
We assess the spatial, variable, and product transferability of downscaling models experimentally, to understand the generalizability of these different architecture types.
arXiv Detail & Related papers (2024-07-17T12:10:24Z) - Code Generation for Machine Learning using Model-Driven Engineering and SysML [0.0]
This work aims to facilitate the implementation of data-driven engineering in practice by extending the previous work of formalizing machine learning tasks.
The presented method is evaluated for feasibility in a case study to predict weather forecasts.
Results demonstrate the flexibility and simplicity of the method, which reduces implementation effort.
arXiv Detail & Related papers (2023-07-10T15:00:20Z) - Addressing computational challenges in physical system simulations with machine learning [0.0]
We present a machine learning-based data generator framework tailored to aid researchers who utilize simulations to examine various physical systems or processes.
Our approach involves a two-step process: first, we train a supervised predictive model using a limited simulated dataset to predict simulation outcomes.
Subsequently, a reinforcement learning agent is trained to generate accurate, simulation-like data by leveraging the supervised model.
arXiv Detail & Related papers (2023-05-16T17:31:50Z) - Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z) - Simulation Intelligence: Towards a New Generation of Scientific Methods [81.75565391122751]
"Nine Motifs of Simulation Intelligence" is a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence.
We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system.
We believe coordinated efforts between motifs offer immense opportunity to accelerate scientific discovery.
arXiv Detail & Related papers (2021-12-06T18:45:31Z) - Knowledge-Guided Dynamic Systems Modeling: A Case Study on Modeling River Water Quality [8.110949636804774]
Modeling real-world phenomena is a focus of many science and engineering efforts, such as ecological modeling and financial forecasting.
Building an accurate model for complex and dynamic systems improves understanding of underlying processes and leads to resource efficiency.
At the opposite extreme, data-driven modeling learns a model directly from data, requiring extensive data and potentially generating overfitting.
We focus on an intermediate approach, model revision, in which prior knowledge and data are combined to achieve the best of both worlds.
arXiv Detail & Related papers (2021-03-01T06:31:38Z) - Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling [86.9726984929758]
We focus on the integration of incomplete physics models into deep generative models.
We propose a VAE architecture in which a part of the latent space is grounded by physics.
We demonstrate generative performance improvements over a set of synthetic and real-world datasets.
arXiv Detail & Related papers (2021-02-25T20:28:52Z) - Using Data Assimilation to Train a Hybrid Forecast System that Combines Machine-Learning and Knowledge-Based Components [52.77024349608834]
We consider the problem of data-assisted forecasting of chaotic dynamical systems when the available data is noisy partial measurements.
We show that by using partial measurements of the state of the dynamical system, we can train a machine learning model to improve predictions made by an imperfect knowledge-based model.
arXiv Detail & Related papers (2021-02-15T19:56:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.