Related papers: Generalization and Feature Attribution in Machine Learning Models for Crop Yield and Anomaly Prediction in Germany

URL: http://arxiv.org/abs/2512.15140v1
Date: Wed, 17 Dec 2025 07:01:47 GMT
Title: Generalization and Feature Attribution in Machine Learning Models for Crop Yield and Anomaly Prediction in Germany
Authors: Roland Baatz
Abstract summary: This study examines the generalization performance and interpretability of machine learning (ML) models used for predicting crop yield and yield anomalies in Germany's NUTS-3 regions. Using a high-quality, long-term dataset, the study systematically compares the evaluation and temporal validation behavior of ensemble tree-based models and deep learning approaches. Models with strong test-set accuracy but weak temporal validation performance can still produce seemingly credible SHAP feature importance values.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study examines the generalization performance and interpretability of machine learning (ML) models used for predicting crop yield and yield anomalies in Germany's NUTS-3 regions. Using a high-quality, long-term dataset, the study systematically compares the evaluation and temporal validation behavior of ensemble tree-based models (XGBoost, Random Forest) and deep learning approaches (LSTM, TCN). While all models perform well on spatially split, conventional test sets, their performance degrades substantially on temporally independent validation years, revealing persistent limitations in generalization. Notably, models with strong test-set accuracy but weak temporal validation performance can still produce seemingly credible SHAP feature importance values. This exposes a critical vulnerability in post hoc explainability methods: interpretability may appear reliable even when the underlying model fails to generalize. These findings underscore the need for validation-aware interpretation of ML predictions in agricultural and environmental systems. Feature importance should not be accepted at face value unless models are explicitly shown to generalize to unseen temporal and spatial conditions. The study advocates for domain-aware validation, hybrid modeling strategies, and more rigorous scrutiny of explainability methods in data-driven agriculture. Ultimately, this work addresses a growing challenge in environmental data science: how can we evaluate generalization robustly enough to trust model explanations?
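
The contrast the abstract draws between a spatially split test set and temporally independent validation years can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy region-by-year panel, the region labels, the held-out years, and the helper names `split_by_group` and `shared_values` are hypothetical and are not taken from the paper.

```python
from itertools import product


def split_by_group(records, key, holdout_values):
    """Split records into (train, test) by holding out whole groups of `key`."""
    holdout = set(holdout_values)
    train = [r for r in records if r[key] not in holdout]
    test = [r for r in records if r[key] in holdout]
    return train, test


def shared_values(train, test, key):
    """Return the values of `key` that occur in both train and test."""
    return {r[key] for r in train} & {r[key] for r in test}


# Hypothetical toy panel: 4 regions x 6 years (not the paper's dataset).
regions = ["R1", "R2", "R3", "R4"]
years = range(2015, 2021)
records = [{"region": reg, "year": yr} for reg, yr in product(regions, years)]

# Spatial split: hold out one region; every year still appears in training.
spatial_train, spatial_test = split_by_group(records, "region", ["R4"])

# Temporal hold-out: hold out whole years; no test year appears in training.
temporal_train, temporal_test = split_by_group(records, "year", [2019, 2020])

spatial_shared_years = shared_values(spatial_train, spatial_test, "year")
temporal_shared_years = shared_values(temporal_train, temporal_test, "year")

print(sorted(spatial_shared_years))   # years seen by both train and test
print(sorted(temporal_shared_years))  # empty for a temporal hold-out
```

In this sketch the spatial split leaves every year visible during training, so year-specific conditions can leak into the test score, while the temporal hold-out shares no years between training and test data.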

Related papers

STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction [78.0692157478247]
We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning. We show that STAR consistently outperforms all baselines on both score-based and rank-based metrics.
arXiv Detail & Related papers (2026-02-12T16:30:07Z)
A Comparative Analysis of Interpretable Machine Learning Methods [0.13854111346209866]
In recent years, Machine Learning has seen widespread adoption across a broad range of sectors, including high-stakes domains such as healthcare, finance, and law. Growing reliance has raised increasing concerns regarding model interpretability and accountability.
arXiv Detail & Related papers (2026-01-01T18:39:05Z)
Uncalibrated Reasoning: GRPO Induces Overconfidence for Stochastic Outcomes [55.2480439325792]
Reinforcement learning (RL) has proven remarkably effective at improving the accuracy of language models in verifiable and deterministic domains like mathematics. Here, we examine if current RL methods are also effective at optimizing language models in verifiable domains with stochastic outcomes, like scientific experiments.
arXiv Detail & Related papers (2025-08-15T20:50:53Z)
Robust Molecular Property Prediction via Densifying Scarce Labeled Data [53.24886143129006]
In drug discovery, compounds most critical for advancing research often lie beyond the training set. We propose a novel bilevel optimization approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.
arXiv Detail & Related papers (2025-06-13T15:27:40Z)
Prediction Models That Learn to Avoid Missing Values [7.302408149992981]
Missingness-avoiding (MA) machine learning is a framework for training models to rarely require the values of missing features at test time. We create tailored MA learning algorithms for decision trees, tree ensembles, and sparse linear models. We show that our framework gives practitioners a powerful tool to maintain interpretability in predictions with test-time missing values.
arXiv Detail & Related papers (2025-05-06T10:16:35Z)
A Temporally Disentangled Contrastive Diffusion Model for Spatiotemporal Imputation [35.46631415365955]
We introduce a conditional diffusion framework called C$^2$TSD, which incorporates disentangled temporal (trend and seasonality) representations as conditional information. Our experiments on three real-world datasets demonstrate the superior performance of our approach compared to a number of state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-18T11:59:04Z)
A roadmap to fair and trustworthy prediction model validation in healthcare [2.476158303361112]
A prediction model is most useful if it generalizes beyond the development data. We propose a roadmap that facilitates the development and application of reliable, fair, and trustworthy artificial intelligence prediction models.
arXiv Detail & Related papers (2023-04-07T04:24:19Z)
Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores. We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder based models (AE). We develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
arXiv Detail & Related papers (2022-06-17T16:18:28Z)
The Lifecycle of a Statistical Model: Model Failure Detection, Identification, and Refitting [26.351782287953267]
We develop tools and theory for detecting and identifying regions of the covariate space (subpopulations) where model performance has begun to degrade. We present empirical results with three real-world data sets. We complement these empirical results with theory proving that our methodology is minimax optimal for recovering anomalous subpopulations.
arXiv Detail & Related papers (2022-02-08T22:02:31Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
A comprehensive study on the prediction reliability of graph neural networks for virtual screening [0.0]
We investigate the effects of model architectures, regularization methods, and loss functions on the prediction performance and reliability of classification results. Our results highlight that the correct choice of regularization and inference methods is important for achieving a high success rate.
arXiv Detail & Related papers (2020-03-17T10:13:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.