Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction
- URL: http://arxiv.org/abs/2508.06939v1
- Date: Sat, 09 Aug 2025 11:09:10 GMT
- Title: Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction
- Authors: Hiba Najjar, Deepak Pathak, Marlon Nuske, Andreas Dengel
- Abstract summary: We leverage the intrinsic explainability of Transformer-based models to explain multimodal learning networks. This study focuses on the task of crop yield prediction at the subfield level.
- Score: 36.766406330345525
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multimodal learning enables various machine learning tasks to benefit from diverse data sources, effectively mimicking the interplay of different factors in real-world applications, particularly in agriculture. While the heterogeneous nature of involved data modalities may necessitate the design of complex architectures, the model interpretability is often overlooked. In this study, we leverage the intrinsic explainability of Transformer-based models to explain multimodal learning networks, focusing on the task of crop yield prediction at the subfield level. The large datasets used cover various crops, regions, and years, and include four different input modalities: multispectral satellite and weather time series, terrain elevation maps and soil properties. Based on the self-attention mechanism, we estimate feature attributions using two methods, namely the Attention Rollout (AR) and Generic Attention (GA), and evaluate their performance against Shapley-based model-agnostic estimations, Shapley Value Sampling (SVS). Additionally, we propose the Weighted Modality Activation (WMA) method to assess modality attributions and compare it with SVS attributions. Our findings indicate that Transformer-based models outperform other architectures, specifically convolutional and recurrent networks, achieving R2 scores that are higher by 0.10 and 0.04 at the subfield and field levels, respectively. AR is shown to provide more robust and reliable temporal attributions, as confirmed through qualitative and quantitative evaluation, compared to GA and SVS values. Information about crop phenology stages was leveraged to interpret the explanation results in the light of established agronomic knowledge. Furthermore, modality attributions revealed varying patterns across the two methods compared.[...]
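As context for the attention-based attribution methods named in the abstract, Attention Rollout (Abnar & Zuidema, 2020) combines the per-layer attention maps of a Transformer by mixing each map with the identity matrix (to account for residual connections), row-normalizing, and multiplying across layers. A minimal, dependency-free sketch, assuming head-averaged attention matrices given as nested Python lists (all function names here are illustrative, not from the paper's code):

```python
def matmul(a, b):
    """Plain-Python matrix product of two square nested-list matrices."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def attention_rollout(layer_attentions):
    """Combine attention maps across layers (Attention Rollout).

    Each layer's head-averaged attention A is replaced by 0.5*A + 0.5*I
    to model the residual stream, row-normalized, then multiplied into
    the running rollout matrix.
    """
    n = len(layer_attentions[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layer_attentions:
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        row_sums = [sum(row) for row in mixed]
        mixed = [[v / row_sums[i] for v in mixed[i]] for i in range(n)]
        rollout = matmul(mixed, rollout)
    return rollout
```

Each row of the result sums to 1 and can be read as the attribution of that token's output to every input token, which is what allows the paper to derive temporal attributions directly from the trained model.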
Related papers
- Consistency of Feature Attribution in Deep Learning Architectures for Multi-Omics [0.36646002427839136]
We investigate the use of Shapley Additive Explanations (SHAP) on a multi-view deep learning model applied to multi-omics data. Rankings of features via SHAP are compared across various architectures to evaluate the consistency of the method. We present an alternative, simple method to assess the robustness of the identification of important biomolecules.
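The Shapley-based attributions that both this paper and the main paper use as a model-agnostic reference can be estimated by Monte-Carlo sampling over random feature orderings (the Shapley Value Sampling scheme of Castro et al., 2009). A minimal sketch, assuming `model` is any callable that scores a feature vector and `baseline` supplies the "feature absent" values (names are illustrative, not from either paper's code):

```python
import random

def shapley_value_sampling(model, x, baseline, n_samples=200, seed=0):
    """Monte-Carlo Shapley value estimate.

    For each sampled permutation of features, switch features on one by
    one (baseline value -> actual value) and credit each feature with its
    marginal change in the model output; average over samples.
    """
    rng = random.Random(seed)
    d = len(x)
    phi = [0.0] * d
    for _ in range(n_samples):
        order = list(range(d))
        rng.shuffle(order)
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]        # feature j switches from absent to present
            val = model(current)
            phi[j] += val - prev     # marginal contribution of feature j
            prev = val
    return [p / n_samples for p in phi]
```

For a linear model the estimate is exact for any number of samples; for nonlinear models, more samples reduce the variance of the estimate.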
arXiv Detail & Related papers (2025-07-30T17:53:42Z) - A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw [2.906027992527643]
Retrogressive Thaw Slumps (RTS) in Arctic regions are distinct permafrost landforms with significant environmental impacts. This paper employed a state-of-the-art deep learning model, the Mask R-CNN, to delineate RTS features across the Arctic. Two new strategies were introduced to optimize multimodal learning and enhance the model's predictive performance.
arXiv Detail & Related papers (2025-04-23T22:18:10Z) - Exploring the Efficacy of Meta-Learning: Unveiling Superior Data Diversity Utilization of MAML Over Pre-training [1.3980986259786223]
We show that dataset diversity can impact the performance of vision models. Our study shows positive correlations between test set accuracy and data diversity. These findings support our hypothesis and point to a promising direction for deeper exploration of how formal data diversity influences model performance.
arXiv Detail & Related papers (2025-01-15T00:56:59Z) - Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities. Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation. Experiments show that a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - Explainable AI in Grassland Monitoring: Enhancing Model Performance and Domain Adaptability [0.6131022957085438]
Grasslands are known for their high biodiversity and ability to provide multiple ecosystem services.
Challenges in automating the identification of indicator plants are key obstacles to large-scale grassland monitoring.
This paper delves into the latter two challenges, with a specific focus on transfer learning and XAI approaches to grassland monitoring.
arXiv Detail & Related papers (2023-12-13T10:17:48Z) - Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z) - IMACS: Image Model Attribution Comparison Summaries [16.80986701058596]
We introduce IMACS, a method that combines gradient-based model attributions with aggregation and visualization techniques.
IMACS extracts salient input features from an evaluation dataset, clusters them based on similarity, then visualizes differences in model attributions for similar input features.
We show how our technique can uncover behavioral differences caused by domain shift between two models trained on satellite images.
arXiv Detail & Related papers (2022-01-26T21:35:14Z) - Accuracy on the Line: On the Strong Correlation Between
Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z) - Semantic Change Detection with Asymmetric Siamese Networks [71.28665116793138]
Given two aerial images, semantic change detection aims to locate the land-cover variations and identify their change types with pixel-wise boundaries.
This problem is vital in many earth vision related tasks, such as precise urban planning and natural resource management.
We present an asymmetric siamese network (ASN) to locate and identify semantic changes through feature pairs obtained from modules of widely different structures.
arXiv Detail & Related papers (2020-10-12T13:26:30Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.