Related papers: Coupling Machine Learning and Crop Modeling Improves Crop Yield Prediction in the US Corn Belt

URL: http://arxiv.org/abs/2008.04060v2
Date: Mon, 1 Mar 2021 19:50:58 GMT
Title: Coupling Machine Learning and Crop Modeling Improves Crop Yield Prediction in the US Corn Belt
Authors: Mohsen Shahhosseini, Guiping Hu, Sotirios V. Archontoulis, Isaiah Huber
Abstract summary: This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt. The main objectives are to explore whether a hybrid approach (crop modeling + ML) results in better predictions and to determine which crop-model features are most effective to integrate with ML for corn yield prediction.
Score: 2.580765958706854
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt. The main objectives are to explore whether a hybrid approach (crop modeling + ML) results in better predictions, to investigate which combinations of hybrid models provide the most accurate predictions, and to determine which crop-model features are most effective to integrate with ML for corn yield prediction. Five ML models (linear regression, LASSO, LightGBM, random forest, and XGBoost) and six ensemble models were designed to address the research question. The results suggest that adding variables from the APSIM crop simulation model as input features to the ML models can decrease the yield prediction root mean squared error (RMSE) by 7% to 20%. Furthermore, we investigated partial inclusion of APSIM features in the ML prediction models and found that soil-moisture-related APSIM variables are the most influential on the ML predictions, followed by crop-related and phenology-related variables. Finally, based on feature importance measures, simulated APSIM average drought stress and average water table depth during the growing season were observed to be the most important APSIM inputs to ML. This result indicates that weather information alone is not sufficient and that ML models need more hydrological inputs to make improved yield predictions.
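A minimal sketch of how a relative RMSE decrease such as the reported 7% to 20% can be computed, assuming the percentage is measured relative to a weather-only ML baseline. The yield values below are hypothetical and are not data or results from the study; the function and variable names are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_rmse_reduction(rmse_baseline, rmse_hybrid):
    """Percent decrease in RMSE when moving from the baseline model to the hybrid model."""
    return 100.0 * (rmse_baseline - rmse_hybrid) / rmse_baseline


# Hypothetical yields, for illustration only (not results from the paper).
observed = [10.0, 11.0, 9.5, 12.0]
weather_only = [9.0, 12.0, 10.5, 11.0]  # hypothetical baseline: every error is 1.0
with_apsim = [9.2, 11.8, 10.3, 11.2]    # hypothetical hybrid: every error is 0.8

baseline = rmse(observed, weather_only)
hybrid = rmse(observed, with_apsim)
print(round(relative_rmse_reduction(baseline, hybrid), 6))
```

With these made-up numbers the hybrid RMSE is 0.8 against 1.0 for the baseline, a 20% decrease.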

Related papers

Hybrid machine learning data assimilation for marine biogeochemistry [0.2383122657918106]
Marine biogeochemistry models are critical for forecasting, as well as for estimating ecosystem responses to climate change and human activities. Existing data assimilation (DA) methods struggle to update unobserved variables effectively, while ensemble-based methods are computationally too expensive for high-complexity models. This study demonstrates how machine learning can improve marine biogeochemistry DA by learning statistical relationships between observed and unobserved variables.
arXiv (2025-04-07T16:04:10Z)
Knowledge-guided machine learning for county-level corn yield prediction under drought [7.75600387348283]
Remote sensing (RS), which enables non-contact acquisition of extensive ground observations, is a valuable tool for crop yield prediction. Traditional process-based models struggle to incorporate large volumes of RS data. Machine learning (ML) models are often criticized as "black boxes" due to their limited interpretability.
arXiv (2025-03-20T16:52:25Z)
Loss Landscape Analysis for Reliable Quantized ML Models for Scientific Sensing [41.89148096989836]
We propose a method to perform empirical analysis of the loss landscape of machine learning (ML) models. Our method allows assessing the robustness of ML models to such effects as a function of quantization precision and under different regularization techniques.
arXiv (2025-02-12T12:30:49Z)
Training Compute-Optimal Protein Language Models [48.79416103951816]
Most protein language models are trained with extensive compute resources until performance gains plateau. Our investigation is grounded in a massive dataset consisting of 939 million protein sequences. We trained over 300 models ranging from 3.5 million to 10.7 billion parameters on 5 to 200 billion unique tokens.
arXiv (2024-11-04T14:58:37Z)
Flow Matching for Atmospheric Retrieval of Exoplanets: Where Reliability meets Adaptive Noise Levels [38.84835238599221]
Flow matching posterior estimation (FMPE) is a new machine learning approach to atmospheric retrieval. FMPE trains about 3 times faster than neural posterior estimation (NPE) and yields higher importance sampling (IS) efficiencies. IS successfully corrects inaccurate ML results, identifies model failures via low efficiencies, and provides accurate estimates of the Bayesian evidence.
arXiv (2024-10-28T19:28:07Z)
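For reference, a hedged sketch of the standard importance-sampling quantities mentioned above: log importance weights, sampling efficiency computed as the Kish effective sample size divided by the number of samples, and a Monte Carlo estimate of the Bayesian evidence. That the paper uses exactly these definitions is an assumption, and the function names are hypothetical.

```python
import math


def log_importance_weights(log_likelihood, log_prior, log_proposal):
    """log w_i = log p(d | theta_i) + log p(theta_i) - log q(theta_i | d), for samples drawn from q."""
    return [ll + lp - lq for ll, lp, lq in zip(log_likelihood, log_prior, log_proposal)]


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a non-empty list."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sampling_efficiency(log_w):
    """(sum w)^2 / (n * sum w^2): 1 when all weights are equal, about 1/n when one weight dominates."""
    n = len(log_w)
    log_eff = 2 * logsumexp(log_w) - logsumexp([2 * x for x in log_w]) - math.log(n)
    return math.exp(log_eff)


def log_evidence(log_w):
    """Estimate log Z as the log of the mean importance weight."""
    return logsumexp(log_w) - math.log(len(log_w))
```

Working in log space avoids overflow when likelihoods are very peaked; a low efficiency means a few samples carry most of the weight, which is how low efficiencies can flag a poor proposal.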
Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning. By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv (2024-09-10T07:34:19Z)
Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the information retrieval (IR) community for its ability to leverage massive logged user interaction data to train ranking models. This paper investigates the robustness of existing CLTR models in complex and diverse situations. We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv (2024-04-04T10:54:38Z)
Simulation-Enhanced Data Augmentation for Machine Learning Pathloss Prediction [9.664420734674088]
This paper introduces a novel simulation-enhanced data augmentation method for machine learning pathloss prediction. Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets. The integration of synthetic data significantly improves the generalizability of the model in different environments.
arXiv (2024-02-03T00:38:08Z)
Ensemble models outperform single model uncertainties and predictions for operator-learning of hypersonic flows [43.148818844265236]
Training scientific machine learning (SciML) models on limited high-fidelity data offers one approach to rapidly predict behaviors for situations that have not been seen before. However, high-fidelity data are themselves too limited to validate all outputs of the SciML model in unexplored input space. We extend a DeepONet using three different uncertainty mechanisms: mean-variance estimation, evidential uncertainty, and ensembling.
arXiv (2023-10-31T18:07:29Z)
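One common way to merge an ensemble of mean-variance networks into a single predictive Gaussian, used in the deep-ensembles literature, is to moment-match the uniform mixture of the members' Gaussians. Whether the paper combines its ensemble members this way is an assumption, and the function name is hypothetical.

```python
def combine_gaussian_ensemble(means, variances):
    """Moment-match a uniform mixture of N(mu_i, var_i) with a single Gaussian.

    Returns (mean, total_variance, aleatoric, epistemic), where total = aleatoric + epistemic.
    """
    if not means or len(means) != len(variances):
        raise ValueError("need equal-length, non-empty lists")
    m = len(means)
    mu = sum(means) / m
    aleatoric = sum(variances) / m                     # average noise variance predicted by members
    epistemic = sum((x - mu) ** 2 for x in means) / m  # spread (disagreement) of member means
    return mu, aleatoric + epistemic, aleatoric, epistemic
```

Splitting the total variance this way separates the noise each member predicts from the disagreement between members, which is the extra uncertainty signal that ensembling contributes.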
Towards Machine Learning-based Fish Stock Assessment [0.0]
In this paper, we investigate the use of machine learning models to improve the estimation and forecast of relevant stock parameters. We propose a hybrid model that combines classical statistical stock assessment models with supervised ML, specifically gradient boosted trees.
arXiv (2023-08-07T08:44:15Z)
A Deep Learning Model for Heterogeneous Dataset Analysis -- Application to Winter Wheat Crop Yield Prediction [0.6595290783361958]
Time-series deep learning models, such as Long Short-Term Memory (LSTM), have already been explored and applied to yield prediction. Existing LSTM models cannot handle heterogeneous datasets. We propose an efficient deep learning model that can deal with heterogeneous datasets.
arXiv (2023-06-20T23:39:06Z)
A Comprehensive Modeling Approach for Crop Yield Forecasts using AI-based Methods and Crop Simulation Models [0.21094707683348418]
We propose a comprehensive approach for yield forecasting that combines data-driven solutions, crop simulation models, and model surrogates. Our data-driven modeling approach outperforms previous works with yield correlation predictions close to 91%.
arXiv (2023-06-16T18:13:24Z)
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs. We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting. We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv (2023-05-22T17:02:15Z)
Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of models: ensembles of neural networks and sparse mixture of experts (sparse MoEs). We present Efficient Ensemble of Experts (E^3), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv (2021-10-07T11:58:35Z)
A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning [37.01683478234978]
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models.
arXiv (2021-09-06T10:48:40Z)
Forecasting Corn Yield with Machine Learning Ensembles [2.9005223064604078]
This paper provides a machine learning based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa). Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. Results show that ensemble models based on a weighted average of the base learners outperform individual models.
arXiv (2020-01-18T03:55:20Z)
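The blocked sequential procedure and a weighted-average ensemble can be sketched as follows, assuming the blocks are consecutive years and each block is predicted only by models fit on earlier years. The inverse-MSE weighting is a simple stand-in chosen for illustration, not necessarily the weight optimization used in the paper; all names are hypothetical.

```python
def blocked_sequential_splits(years, min_train, block_size):
    """Yield (train_years, test_years) pairs; every test block comes after all of its training years."""
    years = sorted(set(years))
    for start in range(min_train, len(years), block_size):
        yield years[:start], years[start:start + block_size]


def inverse_mse_weights(oob_preds, y_true):
    """Weight each base learner by 1 / MSE of its out-of-sample predictions, normalized to sum to 1."""
    inv = {}
    for name, preds in oob_preds.items():
        mse = sum((p - t) ** 2 for p, t in zip(preds, y_true)) / len(y_true)
        inv[name] = 1.0 / max(mse, 1e-12)  # guard against division by zero for a perfect fit
    total = sum(inv.values())
    return {name: w / total for name, w in inv.items()}


def weighted_average(preds_by_model, weights):
    """Combine aligned per-model prediction lists using the given per-model weights."""
    n = len(next(iter(preds_by_model.values())))
    return [sum(w * preds_by_model[name][i] for name, w in weights.items()) for i in range(n)]
```

In this sketch the base learners would be fit on each split's training years, their predictions on the following block collected as out-of-sample predictions, and those predictions used to choose the ensemble weights.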

This list is automatically generated from the titles and abstracts of the papers on this site.