California Crop Yield Benchmark: Combining Satellite Image, Climate, Evapotranspiration, and Soil Data Layers for County-Level Yield Forecasting of Over 70 Crops
- URL: http://arxiv.org/abs/2506.10228v1
- Date: Wed, 11 Jun 2025 23:12:22 GMT
- Title: California Crop Yield Benchmark: Combining Satellite Image, Climate, Evapotranspiration, and Soil Data Layers for County-Level Yield Forecasting of Over 70 Crops
- Authors: Hamid Kamangir, Mona Hajiesmaeeli, Mason Earles
- Abstract summary: We introduce a crop yield benchmark dataset covering over 70 crops across all California counties from 2008 to 2022. The benchmark integrates diverse data sources, including Landsat satellite imagery, daily climate records, monthly evapotranspiration, and high-resolution soil properties. We develop a multi-modal deep learning model tailored for county-level, crop-specific yield forecasting. Our approach achieves an overall R² score of 0.76 across all crops on the unseen test dataset, highlighting strong predictive performance across California's diverse agricultural regions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: California is a global leader in agricultural production, contributing 12.5% of total U.S. agricultural output and ranking as the fifth-largest food and cotton supplier in the world. Despite the availability of extensive historical yield data from the USDA National Agricultural Statistics Service, accurate and timely crop yield forecasting remains a challenge due to the complex interplay of environmental, climatic, and soil-related factors. In this study, we introduce a comprehensive crop yield benchmark dataset covering over 70 crops across all California counties from 2008 to 2022. The benchmark integrates diverse data sources, including Landsat satellite imagery, daily climate records, monthly evapotranspiration, and high-resolution soil properties. To learn effectively from these heterogeneous inputs, we develop a multi-modal deep learning model tailored for county-level, crop-specific yield forecasting. The model employs stratified feature extraction and a time-series encoder to capture spatial and temporal dynamics during the growing season. Static inputs such as soil characteristics and crop identity inform long-term variability. Our approach achieves an overall R² score of 0.76 across all crops on the unseen test dataset, highlighting strong predictive performance across California's diverse agricultural regions. This benchmark and modeling framework offer a valuable foundation for advancing agricultural forecasting, climate adaptation, and precision farming. The full dataset and codebase are publicly available in our GitHub repository.
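The abstract reports a single overall R² of 0.76 across all crops but does not state whether that figure is pooled over every county-crop-year prediction or averaged per crop. As a minimal sketch (not the authors' evaluation code), the plain-Python example below applies the standard definition R² = 1 − SS_res / SS_tot and computes both a pooled score and per-crop scores. The function names `r2_score` and `pooled_and_per_crop_r2` and the toy yields are hypothetical illustrations, not taken from the paper or its repository.

```python
from collections import defaultdict
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) < 2:
        raise ValueError("need at least two observations")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the observed mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        return math.nan  # R^2 is undefined when every target is identical
    return 1.0 - ss_res / ss_tot


def pooled_and_per_crop_r2(records):
    """records: iterable of (crop, observed_yield, predicted_yield) tuples.

    Returns (pooled R^2 over all rows, dict of per-crop R^2).
    Crops with fewer than two rows are skipped in the per-crop dict.
    """
    by_crop = defaultdict(lambda: ([], []))
    all_true, all_pred = [], []
    for crop, observed, predicted in records:
        by_crop[crop][0].append(observed)
        by_crop[crop][1].append(predicted)
        all_true.append(observed)
        all_pred.append(predicted)
    pooled = r2_score(all_true, all_pred)
    per_crop = {
        crop: r2_score(obs, pred)
        for crop, (obs, pred) in by_crop.items()
        if len(obs) >= 2
    }
    return pooled, per_crop


if __name__ == "__main__":
    # Hypothetical toy yields on very different scales (not from the paper).
    toy = [
        ("almond", 1.0, 1.1),
        ("almond", 1.2, 1.1),
        ("tomato", 40.0, 41.0),
        ("tomato", 45.0, 44.0),
    ]
    pooled, per_crop = pooled_and_per_crop_r2(toy)
    print(round(pooled, 3))  # 0.999
    print({c: round(v, 3) for c, v in per_crop.items()})  # {'almond': 0.0, 'tomato': 0.84}
```

In this toy case the pooled score is close to 1 even though the almond predictions explain none of the within-crop variation. When crops have very different yield scales, between-crop spread inflates SS_tot, so reporting per-crop R² alongside a pooled figure can be informative.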
Related papers
- Machine Learning Models for Soil Parameter Prediction Based on Satellite, Weather, Clay and Yield Data [1.546169961420396]
The AgroLens project endeavors to develop Machine Learning-based methodologies to predict soil nutrient levels without reliance on laboratory tests. The approach begins with the development of a robust European model using the LUCAS Soil dataset and Sentinel-2 satellite imagery. Advanced algorithms, including Random Forests, Extreme Gradient Boosting (XGBoost), and Fully Connected Neural Networks (FCNN), were implemented and fine-tuned for precise nutrient prediction.
Date: 2025-03-28T09:44:32Z
- A novel fusion of Sentinel-1 and Sentinel-2 with climate data for crop phenology estimation using Machine Learning [0.0]
We train a Machine Learning (ML) LightGBM model to predict 13 phenological stages for eight major crops across Germany at 20 m scale. At national scale, predicted phenology achieved reasonable accuracy (R² > 0.43) and a low Mean Absolute Error of 6 days.
Date: 2024-08-16T13:44:35Z
- Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
The proposed model can simulate distinct plant growth stages, diverse soil conditions, and randomized field arrangements under varying lighting conditions.
Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
Date: 2024-03-27T08:42:47Z
- Cotton Yield Prediction Using Random Forest [1.8887119618534647]
Climate-smart agricultural technologies are being developed to boost yields while decreasing operating expenses.
Crop yield prediction is difficult because of the complex and nonlinear impacts of cultivar, soil type, management, pest and disease, climate, and weather patterns on crops.
We employ machine learning (ML) to forecast production while considering climate change, soil diversity, cultivar, and inorganic nitrogen levels.
Date: 2023-12-04T19:33:29Z
- HarvestNet: A Dataset for Detecting Smallholder Farming Activity Using Harvest Piles and Remote Sensing [50.4506590177605]
HarvestNet is a dataset for mapping the presence of farms in the Ethiopian regions of Tigray and Amhara during 2020-2023.
We introduce a new approach based on the detection of harvest piles characteristic of many smallholder systems.
We conclude that remote sensing of harvest piles can contribute to more timely and accurate cropland assessments in food insecure regions.
Date: 2023-08-23T11:03:28Z
- The Canadian Cropland Dataset: A New Land Cover Dataset for Multitemporal Deep Learning Classification in Agriculture [0.8602553195689513]
A temporal patch-based dataset of Canadian croplands, enriched with labels retrieved from the Canadian Annual Crop Inventory.
The dataset contains 78,536 manually verified high-resolution spatial images from 10 crop classes collected over four crop production years.
As a benchmark, we provide models and source code that allow a user to predict the crop class using a single image (ResNet, DenseNet, EfficientNet) or a sequence of images (LRCN, 3D-CNN) from the same location.
Date: 2023-05-31T18:40:15Z
- Agave crop segmentation and maturity classification with deep learning data-centric strategies using very high-resolution satellite imagery [101.18253437732933]
We present an Agave tequilana Weber azul crop segmentation and maturity classification using very high resolution satellite imagery.
We solve real-world deep learning problems in the very specific context of agave crop segmentation.
With the resulting accurate models, agave production forecasting can be made available for large regions.
Date: 2023-03-21T03:15:29Z
- Jalisco's multiclass land cover analysis and classification using a novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present a novel lightweight (only 89k parameters) Convolutional Neural Network (ConvNet) for land cover (LC) classification and analysis.
In this work, we combine three real-world open data sources to obtain 13 channels.
Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar classes.
Date: 2022-01-26T14:58:51Z
- High-resolution global irrigation prediction with Sentinel-2 30m data [0.8137198664755597]
An accurate and precise understanding of global irrigation usage is crucial for a variety of climate science efforts.
We have developed a novel irrigation model and Python package ("Irrigation30") to generate 30m resolution irrigation predictions of cropland worldwide.
Our model achieved consistency scores in excess of 97% and an accuracy of 92% on a small, geographically diverse, randomly sampled test set.
Date: 2020-12-09T17:26:43Z
- Estimating Crop Primary Productivity with Sentinel-2 and Landsat 8 using Machine Learning Methods Trained with Radiative Transfer Simulations [58.17039841385472]
We take advantage of parallel developments in mechanistic modeling and satellite data availability for advanced monitoring of crop productivity.
Our model successfully estimates gross primary productivity across a variety of C3 crop types and environmental conditions even though it does not use any local information from the corresponding sites.
This highlights its potential to map crop productivity from new satellite sensors at a global scale with the help of current Earth observation cloud computing platforms.
Date: 2020-12-07T16:23:13Z
- Learning from Data to Optimize Control in Precision Farming [77.34726150561087]
This special issue presents the latest developments in statistical inference, machine learning, and optimal control for precision farming.
Satellite positioning and navigation, together with the Internet of Things, generate vast amounts of information that can be used to optimize farming processes in real time.
Date: 2020-07-07T12:44:17Z