Coupling Machine Learning and Crop Modeling Improves Crop Yield
Prediction in the US Corn Belt
- URL: http://arxiv.org/abs/2008.04060v2
- Date: Mon, 1 Mar 2021 19:50:58 GMT
- Title: Coupling Machine Learning and Crop Modeling Improves Crop Yield
Prediction in the US Corn Belt
- Authors: Mohsen Shahhosseini, Guiping Hu, Sotirios V. Archontoulis, Isaiah
Huber
- Abstract summary: This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt.
The main objectives are to explore whether a hybrid approach (crop modeling + ML) would result in better predictions, and determine the features from the crop modeling that are most effective to be integrated with ML for corn yield prediction.
- Score: 2.580765958706854
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This study investigates whether coupling crop modeling and machine learning
(ML) improves corn yield predictions in the US Corn Belt. The main objectives
are to explore whether a hybrid approach (crop modeling + ML) would result in
better predictions, investigate which combinations of hybrid models provide the
most accurate predictions, and determine the features from the crop modeling
that are most effective to be integrated with ML for corn yield prediction.
Five ML models (linear regression, LASSO, LightGBM, random forest, and XGBoost)
and six ensemble models have been designed to address the research question.
The results suggest that adding simulation crop model variables (APSIM) as
input features to ML models can decrease yield prediction root mean squared
error (RMSE) by 7 to 20%. Furthermore, we investigated partial inclusion of
APSIM features in the ML prediction models and found that soil-moisture-related
APSIM variables are the most influential on the ML predictions, followed by
crop-related and phenology-related variables. Finally, based on a feature
importance measure, simulated APSIM average drought
stress and average water table depth during the growing season are the most
important APSIM inputs to ML. This result indicates that weather information
alone is not sufficient and ML models need more hydrological inputs to make
improved yield predictions.
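To make the reported metric concrete, here is a minimal Python sketch of how RMSE and a relative RMSE decrease between an ML-only model and a hybrid (ML + APSIM features) model could be computed. The yield values and both prediction lists are made-up illustrative numbers, not data or results from the study, and the function names `rmse` and `relative_rmse_decrease` are hypothetical helpers introduced only for this example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_rmse_decrease(rmse_baseline, rmse_hybrid):
    """Percentage by which the hybrid RMSE is lower than the baseline RMSE."""
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (rmse_baseline - rmse_hybrid) / rmse_baseline


# Illustrative numbers only (assumption: NOT taken from the paper).
observed = [10.0, 12.0, 11.0, 9.0]   # observed yields, arbitrary units
ml_only = [11.0, 11.0, 12.0, 10.0]   # every error has magnitude 1.0
hybrid = [10.8, 11.2, 11.8, 9.8]     # every error has magnitude 0.8

baseline_rmse = rmse(observed, ml_only)
hybrid_rmse = rmse(observed, hybrid)
decrease = relative_rmse_decrease(baseline_rmse, hybrid_rmse)

print(f"ML-only RMSE: {baseline_rmse:.3f}")
print(f"Hybrid RMSE:  {hybrid_rmse:.3f}")
print(f"Relative RMSE decrease: {decrease:.1f}%")
```

Under this reading, a "7 to 20%" decrease means the hybrid model's RMSE is 7 to 20% lower than that of the corresponding ML-only model, not that the RMSE itself equals those percentages.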
Related papers
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
- Simulation-Enhanced Data Augmentation for Machine Learning Pathloss Prediction [9.664420734674088]
This paper introduces a novel simulation-enhanced data augmentation method for machine learning pathloss prediction.
Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets.
The integration of synthetic data significantly improves the generalizability of the model in different environments.
arXiv Detail & Related papers (2024-02-03T00:38:08Z)
- A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics [74.93549765488103]
In drug discovery, molecular dynamics simulation provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites.
We propose NeuralMD, the first machine learning surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding.
We show the efficiency and effectiveness of NeuralMD, with a 2000$\times$ speedup over standard numerical MD simulation and outperforming all other ML approaches by up to 80% under the stability metric.
arXiv Detail & Related papers (2024-01-26T09:35:17Z)
- Towards Machine Learning-based Fish Stock Assessment [0.0]
In this paper, we investigate the use of machine learning models to improve the estimation and forecast of relevant stock parameters.
We propose a hybrid model that combines classical statistical stock assessment models with supervised ML, specifically gradient boosted trees.
arXiv Detail & Related papers (2023-08-07T08:44:15Z)
- A Deep Learning Model for Heterogeneous Dataset Analysis -- Application to Winter Wheat Crop Yield Prediction [0.6595290783361958]
Time-series deep learning models, such as Long Short Term Memory (LSTM), have already been explored and applied to yield prediction.
The existing LSTM cannot handle heterogeneous datasets.
We propose an efficient deep learning model that can deal with heterogeneous datasets.
arXiv Detail & Related papers (2023-06-20T23:39:06Z)
- A Comprehensive Modeling Approach for Crop Yield Forecasts using AI-based Methods and Crop Simulation Models [0.21094707683348418]
We propose a comprehensive approach for yield forecasting that combines data-driven solutions, crop simulation models, and model surrogates.
Our data-driven modeling approach outperforms previous works with yield correlation predictions close to 91%.
arXiv Detail & Related papers (2023-06-16T18:13:24Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E$^3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z)
- Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)
- Forecasting Corn Yield with Machine Learning Ensembles [2.9005223064604078]
This paper provides a machine-learning-based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa).
Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions.
Results show that ensemble models based on a weighted average of the base learners outperform individual models.
arXiv Detail & Related papers (2020-01-18T03:55:20Z)