Grouping Shapley Value Feature Importances of Random Forests for
explainable Yield Prediction
- URL: http://arxiv.org/abs/2304.07111v1
- Date: Fri, 14 Apr 2023 13:03:33 GMT
- Title: Grouping Shapley Value Feature Importances of Random Forests for
explainable Yield Prediction
- Authors: Florian Huber, Hannes Engler, Anna Kicherer, Katja Herzog, Reinhard
Töpfer, Volker Steinhage
- Abstract summary: We explain the concept of Shapley values directly computed for groups of features and introduce an algorithm to compute them efficiently on tree structures.
We provide a blueprint for designing swarm plots that combine many local explanations for global understanding.
- Score: 0.8543936047647136
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Explainability in yield prediction helps us fully explore the potential of
machine learning models that are already able to achieve high accuracy for a
variety of yield prediction scenarios. The data included for the prediction of
yields are intricate and the models are often difficult to understand. However,
understanding the models can be simplified by using natural groupings of the
input features. Grouping can be achieved, for example, by the time the features
are captured or by the sensor used to do so. The state-of-the-art for
interpreting machine learning models is currently defined by the game-theoretic
approach of Shapley values. To handle groups of features, the calculated
Shapley values are typically added together, ignoring the theoretical
limitations of this approach. We explain the concept of Shapley values directly
computed for predefined groups of features and introduce an algorithm to
compute them efficiently on tree structures. We provide a blueprint for
designing swarm plots that combine many local explanations for global
understanding. Extensive evaluation of two different yield prediction problems
shows the worth of our approach and demonstrates how we can enable a better
understanding of yield prediction models in the future, ultimately leading to
mutual enrichment of research and application.
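The distinction the abstract draws (adding per-feature Shapley values together versus computing Shapley values with the predefined groups themselves as players) can be made concrete with a small brute-force sketch. Everything below is a hypothetical illustration, not the paper's algorithm: the toy model, the group names `sensor_a`/`sensor_b`, and the single-baseline value function stand in for the trained random forest and the efficient tree-structure computation the paper proposes. With an interaction that crosses a group boundary, the two attributions disagree even though both satisfy efficiency:

```python
from itertools import combinations
from math import factorial

# Hypothetical stand-in for a trained model: includes a three-way
# interaction (x1 * x2 * x3) that crosses the two groups defined below.
def model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[1] * x[2] * x[3]

baseline = [0.0, 0.0, 0.0, 0.0]   # reference point, read as "feature absent"
instance = [1.0, 2.0, 3.0, 4.0]   # the local prediction to explain

def exact_shapley(partition):
    """Exact Shapley values for a game whose players are the blocks of
    `partition` (a mapping of player name -> list of feature indices)."""
    names = list(partition)
    n = len(names)

    def value(coalition):
        # Features of "present" players take the instance value, the
        # rest keep the baseline value (a crude single-baseline game).
        x = list(baseline)
        for name in coalition:
            for i in partition[name]:
                x[i] = instance[i]
        return model(x)

    phi = {}
    for player in names:
        others = [p for p in names if p != player]
        total = 0.0
        for k in range(n):
            for coal in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(coal + (player,)) - value(coal))
        phi[player] = total
    return phi

# Shapley values computed directly for the groups as players ...
group_phi = exact_shapley({"sensor_a": [0, 1], "sensor_b": [2, 3]})
# ... versus per-feature Shapley values summed per group afterwards.
feat_phi = exact_shapley({f"x{i}": [i] for i in range(4)})
summed = {"sensor_a": feat_phi["x0"] + feat_phi["x1"],
          "sensor_b": feat_phi["x2"] + feat_phi["x3"]}
```

For this toy game the group-level attribution gives `sensor_a` 10 and `sensor_b` 6, while summing per-feature values gives 8 and 8; both decompositions sum to the full prediction difference of 16, which is why the discrepancy is easy to overlook.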
Related papers
- Variational Shapley Network: A Probabilistic Approach to Self-Explaining
Shapley values with Uncertainty Quantification [2.6699011287124366]
Shapley values have emerged as a foundational tool in machine learning (ML) for elucidating model decision-making processes.
We introduce a novel, self-explaining method that simplifies the computation of Shapley values significantly, requiring only a single forward pass.
arXiv Detail & Related papers (2024-02-06T18:09:05Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- TsSHAP: Robust model agnostic feature-based explainability for time series forecasting [6.004928390125367]
We propose a feature-based explainability algorithm, TsSHAP, that can explain the forecast of any black-box forecasting model.
We formalize the notion of local, semi-local, and global explanations in the context of time series forecasting.
arXiv Detail & Related papers (2023-03-22T05:14:36Z)
- Distributional Gradient Boosting Machines [77.34726150561087]
Our framework is based on XGBoost and LightGBM.
We show that our framework achieves state-of-the-art forecast accuracy.
arXiv Detail & Related papers (2022-04-02T06:32:19Z)
- Exact Shapley Values for Local and Model-True Explanations of Decision Tree Ensembles [0.0]
We consider the application of Shapley values for explaining decision tree ensembles.
We present a novel approach to Shapley value-based feature attribution that can be applied to random forests and boosted decision trees.
arXiv Detail & Related papers (2021-12-16T20:16:02Z)
- Instance-Based Neural Dependency Parsing [56.63500180843504]
We develop neural models that possess an interpretable inference process for dependency parsing.
Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set.
arXiv Detail & Related papers (2021-09-28T05:30:52Z)
- Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events.
There is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine.
We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z)
- An Interpretable Probabilistic Model for Short-Term Solar Power Forecasting Using Natural Gradient Boosting [0.0]
We propose a two-stage probabilistic forecasting framework able to generate highly accurate, reliable, and sharp forecasts.
The framework offers full transparency on both the point forecasts and the prediction intervals (PIs).
To highlight the performance and the applicability of the proposed framework, real data from two PV parks located in Southern Germany are employed.
arXiv Detail & Related papers (2021-08-05T12:59:38Z)
- Explaining a Series of Models by Propagating Local Feature Attributions [9.66840768820136]
Pipelines involving several machine learning models improve performance in many domains but are difficult to understand.
We introduce a framework to propagate local feature attributions through complex pipelines of models based on a connection to the Shapley value.
Our framework enables us to draw higher-level conclusions based on groups of gene expression features for Alzheimer's and breast cancer histologic grade prediction.
arXiv Detail & Related papers (2021-04-30T22:20:58Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions that evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
This article presents a new kind of interpretable machine learning method.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
In essence, real data points (or specific points of interest) are used, and the changes in the prediction after slightly raising or lowering specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.