Coupling Machine Learning and Crop Modeling Improves Crop Yield
Prediction in the US Corn Belt
- URL: http://arxiv.org/abs/2008.04060v2
- Date: Mon, 1 Mar 2021 19:50:58 GMT
- Title: Coupling Machine Learning and Crop Modeling Improves Crop Yield
Prediction in the US Corn Belt
- Authors: Mohsen Shahhosseini, Guiping Hu, Sotirios V. Archontoulis, Isaiah
Huber
- Abstract summary: This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt.
The main objectives are to explore whether a hybrid approach (crop modeling + ML) would result in better predictions, and to determine which crop modeling features are most effective when integrated with ML for corn yield prediction.
- Score: 2.580765958706854
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This study investigates whether coupling crop modeling and machine learning
(ML) improves corn yield predictions in the US Corn Belt. The main objectives
are to explore whether a hybrid approach (crop modeling + ML) would result in
better predictions, investigate which combinations of hybrid models provide the
most accurate predictions, and determine which crop modeling features are
most effective when integrated with ML for corn yield prediction.
Five ML models (linear regression, LASSO, LightGBM, random forest, and XGBoost)
and six ensemble models were designed to address these research questions.
The results suggest that adding simulated crop model (APSIM) variables as
input features to ML models can decrease yield prediction root mean squared
error (RMSE) by 7% to 20%. Furthermore, we investigated partial inclusion of
APSIM features in the ML prediction models and found that soil-moisture-related
APSIM variables are the most influential on the ML predictions, followed by
crop-related and phenology-related variables. Finally, based on feature
importance measures, we observed that simulated APSIM average drought stress
and average water table depth during the growing season are the most
important APSIM inputs to ML. This result indicates that weather information
alone is not sufficient and that ML models need more hydrological inputs to
make improved yield predictions.
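To make the coupling concrete, below is a minimal sketch of the hybrid idea on synthetic data: a random forest trained on weather-style features alone is compared against one that also receives crop-model-style simulated features. All variable names and data here are illustrative assumptions, not the paper's actual APSIM outputs or yield dataset.

```python
# Minimal sketch of the hybrid (crop modeling + ML) idea on synthetic
# data; feature names and signal structure are assumptions for
# illustration, not the paper's APSIM variables or dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical weather inputs (e.g., seasonal rainfall, temperature).
weather = rng.normal(size=(n, 4))

# Hypothetical simulated crop-model outputs (e.g., a drought stress
# index, water table depth); here they simply carry extra signal.
simulated = rng.normal(size=(n, 2))

# Synthetic yield driven by both feature groups plus noise.
yield_t = (weather @ np.array([1.0, 0.5, -0.7, 0.3])
           + simulated @ np.array([2.0, 1.5])
           + rng.normal(scale=0.5, size=n))

for name, X in [("weather only", weather),
                ("weather + simulated", np.hstack([weather, simulated]))]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, yield_t, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: test RMSE = {rmse:.3f}")
```

In the same spirit as the paper's feature importance analysis, the fitted model's feature_importances_ attribute can be inspected to see how much weight the simulated variables receive relative to the weather inputs.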
Related papers
- Training Compute-Optimal Protein Language Models [48.79416103951816]
Most protein language models are trained with extensive compute resources until performance gains plateau.
Our investigation is grounded in a massive dataset consisting of 939 million protein sequences.
We trained over 300 models ranging from 3.5 million to 10.7 billion parameters on 5 to 200 billion unique tokens.
arXiv Detail & Related papers (2024-11-04T14:58:37Z) - Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z) - Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z) - Simulation-Enhanced Data Augmentation for Machine Learning Pathloss
Prediction [9.664420734674088]
This paper introduces a novel simulation-enhanced data augmentation method for machine learning pathloss prediction.
Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets.
The integration of synthetic data significantly improves the generalizability of the model in different environments.
arXiv Detail & Related papers (2024-02-03T00:38:08Z) - Ensemble models outperform single model uncertainties and predictions
for operator-learning of hypersonic flows [43.148818844265236]
Training scientific machine learning (SciML) models on limited high-fidelity data offers one approach to rapidly predict behaviors for situations that have not been seen before.
High-fidelity data is itself too limited in quantity to validate all outputs of the SciML model across the unexplored input space.
We extend a DeepONet using three different uncertainty mechanisms: mean-variance estimation, evidential uncertainty, and ensembling.
arXiv Detail & Related papers (2023-10-31T18:07:29Z) - Towards Machine Learning-based Fish Stock Assessment [0.0]
In this paper, we investigate the use of machine learning models to improve the estimation and forecast of relevant stock parameters.
We propose a hybrid model that combines classical statistical stock assessment models with supervised ML, specifically gradient boosted trees.
arXiv Detail & Related papers (2023-08-07T08:44:15Z) - A Deep Learning Model for Heterogeneous Dataset Analysis -- Application
to Winter Wheat Crop Yield Prediction [0.6595290783361958]
Time-series deep learning models, such as Long Short Term Memory (LSTM), have already been explored and applied to yield prediction.
Existing LSTM models, however, cannot handle heterogeneous datasets.
We propose an efficient deep learning model that can deal with heterogeneous datasets.
arXiv Detail & Related papers (2023-06-20T23:39:06Z) - A Comprehensive Modeling Approach for Crop Yield Forecasts using
AI-based Methods and Crop Simulation Models [0.21094707683348418]
We propose a comprehensive approach for yield forecasting that combines data-driven solutions, crop simulation models, and model surrogates.
Our data-driven modeling approach outperforms previous works, with predicted yield correlations close to 91%.
arXiv Detail & Related papers (2023-06-16T18:13:24Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixtures of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E$^3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z) - Forecasting Corn Yield with Machine Learning Ensembles [2.9005223064604078]
This paper provides a machine learning based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa).
Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions.
Results show that ensemble models based on weighted average of the base learners outperform individual models.
arXiv Detail & Related papers (2020-01-18T03:55:20Z)