Hybrid machine learning data assimilation for marine biogeochemistry
- URL: http://arxiv.org/abs/2504.05218v1
- Date: Mon, 07 Apr 2025 16:04:10 GMT
- Title: Hybrid machine learning data assimilation for marine biogeochemistry
- Authors: Ieuan Higgs, Ross Bannister, Jozef Skákala, Alberto Carrassi, Stefano Ciavatta,
- Abstract summary: Marine biogeochemistry models are critical for forecasting, as well as estimating ecosystem responses to climate change and human activities.<n>Existing DA methods struggle to update unobserved variables effectively, while ensemble-based methods are computationally too expensive for high-complexity models.<n>This study demonstrates how machine learning can improve marine biogeochemistry DA by learning statistical relationships between observed and unobserved variables.
- Score: 0.2383122657918106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Marine biogeochemistry models are critical for forecasting, as well as estimating ecosystem responses to climate change and human activities. Data assimilation (DA) improves these models by aligning them with real-world observations, but marine biogeochemistry DA faces challenges due to model complexity, strong nonlinearity, and sparse, uncertain observations. Existing DA methods applied to marine biogeochemistry struggle to update unobserved variables effectively, while ensemble-based methods are computationally too expensive for high-complexity marine biogeochemistry models. This study demonstrates how machine learning (ML) can improve marine biogeochemistry DA by learning statistical relationships between observed and unobserved variables. We integrate ML-driven balancing schemes into a 1D prototype of a system used to forecast marine biogeochemistry in the North-West European Shelf seas. ML is applied to predict (i) state-dependent correlations from free-run ensembles and (ii), in an ``end-to-end'' fashion, analysis increments from an Ensemble Kalman Filter. Our results show that ML significantly enhances updates for previously not-updated variables when compared to univariate schemes akin to those used operationally. Furthermore, ML models exhibit moderate transferability to new locations, a crucial step toward scaling these methods to 3D operational systems. We conclude that ML offers a clear pathway to overcome current computational bottlenecks in marine biogeochemistry DA and that refining transferability, optimizing training data sampling, and evaluating scalability for large-scale marine forecasting, should be future research priorities.
Related papers
- Machine Learning for Methane Detection and Quantification from Space -- A survey [49.7996292123687]
Methane (CH_4) is a potent anthropogenic greenhouse gas, contributing 86 times more to global warming than Carbon Dioxide (CO_2) over 20 years.
This work expands existing information on operational methane point source detection sensors in the Short-Wave Infrared (SWIR) bands.
It reviews the state-of-the-art for traditional as well as Machine Learning (ML) approaches.
arXiv Detail & Related papers (2024-08-27T15:03:20Z) - AI enhanced data assimilation and uncertainty quantification applied to
Geological Carbon Storage [0.0]
We introduce the Surrogate-based hybrid ESMDA (SH-ESMDA), an adaptation of the traditional Ensemble Smoother with Multiple Data Assimilation (ESMDA)
We also introduce Surrogate-based Hybrid RML (SH-RML), a variational data assimilation approach that relies on the randomized maximum likelihood (RML)
Our comparative analyses show that SH-RML offers better uncertainty compared to conventional ESMDA for the case study.
arXiv Detail & Related papers (2024-02-09T00:24:46Z) - SSL-SoilNet: A Hybrid Transformer-based Framework with Self-Supervised Learning for Large-scale Soil Organic Carbon Prediction [2.554658234030785]
This study introduces a novel approach that aims to learn the geographical link between multimodal features via self-supervised contrastive learning.
The proposed approach has undergone rigorous testing on two distinct large-scale datasets.
arXiv Detail & Related papers (2023-08-07T13:44:44Z) - Towards Machine Learning-based Fish Stock Assessment [0.0]
In this paper, we investigate the use of machine learning models to improve the estimation and forecast of relevant stock parameters.
We propose a hybrid model that combines classical statistical stock assessment models with supervised ML, specifically gradient boosted trees.
arXiv Detail & Related papers (2023-08-07T08:44:15Z) - Using evolutionary machine learning to characterize and optimize
co-pyrolysis of biomass feedstocks and polymeric wastes [14.894507238371768]
Co-pyrolysis is a promising strategy for improving the quantity and quality parameters of the resulting liquid fuel.
Machine learning (ML) provides capabilities to cope with such issues by leveraging on existing data.
This work aims to introduce an evolutionary ML approach to quantify the (by)products of the biomass-polymer co-pyrolysis process.
arXiv Detail & Related papers (2023-05-24T19:59:21Z) - Dynamic Basis Function Interpolation for Adaptive In Situ Data Integration in Ocean Modeling [1.4549461207028445]
We propose a new method for combining in situ buoy measurements with Earth system models (ESMs) to improve the accuracy of temperature predictions in the ocean.
The technique utilizes the dynamics textitand modes identified in ESMs alongside buoy measurements to improve accuracy while preserving features such as seasonality.
arXiv Detail & Related papers (2023-01-11T21:21:02Z) - Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence
Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that some simulation-based approaches are more robust (and accurate) than others for specific embedding methods to certain adversarial attacks to the input sequences.
arXiv Detail & Related papers (2022-07-18T19:16:56Z) - Using Explainable Boosting Machine to Compare Idiographic and Nomothetic
Approaches for Ecological Momentary Assessment Data [2.0824228840987447]
This paper explores the use of non-linear interpretable machine learning (ML) models in classification problems.
Various ensembles of trees are compared to linear models using imbalanced synthetic and real-world datasets.
In one of the two real-world datasets, knowledge distillation method achieves improved AUC scores.
arXiv Detail & Related papers (2022-04-04T17:56:37Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Improving Molecular Representation Learning with Metric
Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
arXiv Detail & Related papers (2022-02-13T04:56:18Z) - Prediction of liquid fuel properties using machine learning models with
Gaussian processes and probabilistic conditional generative learning [56.67751936864119]
The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels.
Those models can be trained using the database from MD simulations and/or experimental measurements in a data-fusion-fidelity approach.
The results show that ML models can predict accurately the fuel properties of a wide range of pressure and temperature conditions.
arXiv Detail & Related papers (2021-10-18T14:43:50Z) - Using Data Assimilation to Train a Hybrid Forecast System that Combines
Machine-Learning and Knowledge-Based Components [52.77024349608834]
We consider the problem of data-assisted forecasting of chaotic dynamical systems when the available data is noisy partial measurements.
We show that by using partial measurements of the state of the dynamical system, we can train a machine learning model to improve predictions made by an imperfect knowledge-based model.
arXiv Detail & Related papers (2021-02-15T19:56:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.