Leveraging Model-based Trees as Interpretable Surrogate Models for Model
Distillation
- URL: http://arxiv.org/abs/2310.03112v1
- Date: Wed, 4 Oct 2023 19:06:52 GMT
- Title: Leveraging Model-based Trees as Interpretable Surrogate Models for Model
Distillation
- Authors: Julia Herbinger, Susanne Dandl, Fiona K. Ewald, Sofia Loibl, Giuseppe
Casalicchio
- Abstract summary: Surrogate models play a crucial role in retrospectively interpreting complex and powerful black box machine learning models.
This paper focuses on using model-based trees as surrogate models which partition the feature space into interpretable regions via decision rules.
Four model-based tree algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their ability to generate such surrogate models.
- Score: 3.5437916561263694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Surrogate models play a crucial role in retrospectively interpreting complex
and powerful black box machine learning models via model distillation. This
paper focuses on using model-based trees as surrogate models which partition
the feature space into interpretable regions via decision rules. Within each
region, interpretable models based on additive main effects are used to
approximate the behavior of the black box model, striving for an optimal
balance between interpretability and performance. Four model-based tree
algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their
ability to generate such surrogate models. We investigate fidelity,
interpretability, stability, and the algorithms' capability to capture
interaction effects through appropriate splits. Based on our comprehensive
analyses, we finally provide an overview of user-specific recommendations.
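The surrogate idea described in the abstract (split the feature space with a decision rule, then fit an additive main-effects model inside each region) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `black_box` is a hypothetical stand-in for a trained model, the surrogate is a single-split tree with linear leaf models, and the split is found by a plain exhaustive search over observed feature values. It does not reproduce SLIM, GUIDE, MOB, or CTree. Fidelity is measured as the R² between surrogate and black box predictions.

```python
import random

def fit_linear(X, y):
    """Least-squares fit of y ~ b0 + sum_j b_j * x_j via the normal equations."""
    p = len(X[0]) + 1
    rows = [[1.0] + list(x) for x in X]
    # Augmented system (A^T A | A^T y).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def sse(beta, X, y):
    """Sum of squared errors of the linear model beta on (X, y)."""
    return sum((t - beta[0] - sum(b * v for b, v in zip(beta[1:], x))) ** 2
               for x, t in zip(X, y))

def best_split(X, y, min_leaf=10):
    """Exhaustive search for the split minimizing the summed leaf-model SSE."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [i for i, x in enumerate(X) if x[j] <= thr]
            right = [i for i, x in enumerate(X) if x[j] > thr]
            if min(len(left), len(right)) < min_leaf:
                continue
            loss = 0.0
            for idx in (left, right):
                Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
                loss += sse(fit_linear(Xs, ys), Xs, ys)
            if best is None or loss < best[0]:
                best = (loss, j, thr)
    return best

def black_box(x):
    # Hypothetical black box: the effect of x[0] flips sign with x[1] (an interaction).
    sign = 1.0 if x[1] > 0 else -1.0
    return sign * x[0] + 0.5 * x[1]

random.seed(0)
X = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
y_hat = [black_box(x) for x in X]  # the surrogate is trained on black box predictions

mean = sum(y_hat) / len(y_hat)
sst = sum((t - mean) ** 2 for t in y_hat)

# Baseline: one global main-effects model, no split.
fidelity_linear = 1 - sse(fit_linear(X, y_hat), X, y_hat) / sst
# Surrogate: one split plus a main-effects model in each region.
tree_loss, split_feature, split_value = best_split(X, y_hat)
fidelity_tree = 1 - tree_loss / sst

print(f"global linear fidelity R^2: {fidelity_linear:.3f}")
print(f"split on feature {split_feature} at {split_value:.3f}, "
      f"tree fidelity R^2: {fidelity_tree:.3f}")
```

In this toy setting the interaction between the two features cannot be expressed by a single additive model, so the global baseline has low fidelity, while a split on the interacting feature lets each region be described by main effects alone. Deeper trees would apply the same search recursively within each region.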
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM) which can be viewed as a gradient boosting algorithm combining score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
- Exploring Model Kinship for Merging Large Language Models [52.01652098827454]
We introduce model kinship, the degree of similarity or relatedness between Large Language Models.
We find that there is a certain relationship between model kinship and the performance gains after model merging.
We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets.
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
- Characterizing Disparity Between Edge Models and High-Accuracy Base Models for Vision Tasks [5.081175754775484]
We introduce XDELTA, a novel explainable AI tool that explains differences between a high-accuracy base model and a computationally efficient but lower-accuracy edge model.
We conduct a comprehensive evaluation to test XDELTA's ability to explain model discrepancies, using over 1.2 million images and 24 models, and assessing real-world deployments with six participants.
arXiv Detail & Related papers (2024-07-13T22:05:58Z)
- SynthTree: Co-supervised Local Model Synthesis for Explainable Prediction [15.832975722301011]
We propose a novel method to enhance explainability with minimal accuracy loss.
We have developed novel methods for estimating nodes by leveraging AI techniques.
Our findings highlight the critical role that statistical methodologies can play in advancing explainable AI.
arXiv Detail & Related papers (2024-06-16T14:43:01Z)
- The Bayesian Context Trees State Space Model for time series modelling and forecasting [8.37609145576126]
A hierarchical Bayesian framework is introduced for developing rich mixture models for real-valued time series.
At the top level, meaningful discrete states are identified as appropriately quantised values of some of the most recent samples.
At the bottom level, a different, arbitrary model for real-valued time series - a base model - is associated with each state.
arXiv Detail & Related papers (2023-08-02T02:40:42Z)
- GAM(e) changer or not? An evaluation of interpretable machine learning models based on additive model constraints [5.783415024516947]
This paper investigates a series of intrinsically interpretable machine learning models.
We evaluate the prediction qualities of five GAMs as compared to six traditional ML models.
arXiv Detail & Related papers (2022-04-19T20:37:31Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
arXiv Detail & Related papers (2021-01-21T01:46:36Z)
- Robustness of Model Predictions under Extension [3.766702945560518]
A caveat to using models for analysis is that predicted causal effects and conditional independences may not be robust under model extensions.
We show how to use the technique of causal ordering to efficiently assess the robustness of qualitative model predictions.
For dynamical systems at equilibrium, we demonstrate how novel insights help to select appropriate model extensions.
arXiv Detail & Related papers (2020-12-08T20:21:03Z)
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.