Information-theoretic Evolution of Model Agnostic Global Explanations
- URL: http://arxiv.org/abs/2105.06956v1
- Date: Fri, 14 May 2021 16:52:16 GMT
- Title: Information-theoretic Evolution of Model Agnostic Global Explanations
- Authors: Sukriti Verma, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy
- Abstract summary: We present a novel model-agnostic approach that derives rules to globally explain the behavior of classification models trained on numerical and/or categorical data.
Our approach has been deployed in a leading digital marketing suite of products.
- Score: 10.921146104622972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining the behavior of black box machine learning models through human
interpretable rules is an important research area. Recent work has focused on
explaining model behavior locally, i.e. for specific predictions, as well as
globally, across the fields of vision, natural language, reinforcement learning
and data science. We present a novel model-agnostic approach that derives rules
to globally explain the behavior of classification models trained on numerical
and/or categorical data. Our approach builds on top of existing local model
explanation methods to extract conditions important for explaining model
behavior for specific instances, followed by an evolutionary algorithm that
optimizes an information-theory-based fitness function to construct rules that
explain global model behavior. We show how our approach outperforms existing
approaches on a variety of datasets. Further, we introduce a parameter to
evaluate the quality of interpretation under the scenario of distributional
shift. This parameter evaluates how well the interpretation can predict model
behavior for previously unseen data distributions. We show how existing
approaches for interpreting models globally lack distributional robustness.
Finally, we show how the quality of the interpretation can be improved under
the scenario of distributional shift by adding out-of-distribution samples to
the dataset used to learn the interpretation, thereby increasing robustness.
All of the datasets used in our paper are open and publicly available. Our
approach has been deployed in a leading digital marketing suite of products.
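The abstract describes the pipeline only at a high level: local explanation methods surface conditions (feature/threshold pairs) that matter for individual predictions, and an evolutionary search then assembles those conditions into global rules scored by an information-theoretic fitness. A minimal sketch of that kind of loop follows; the rule encoding, the information-gain fitness, and the mutation operator are illustrative assumptions, not the authors' published algorithm, and `candidate_conditions` stands in for whatever conditions a local explainer (e.g. LIME- or SHAP-derived thresholds) would produce.

```python
# Illustrative sketch (not the authors' algorithm): evolve conjunctive rules that
# mimic a black-box classifier, scored by information gain over its predictions.
import random
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Condition:
    feature: int       # column index in X
    op: str            # "<=" or ">"
    threshold: float


def rule_fires(rule, X):
    """Boolean mask of rows satisfying every condition in the rule (a conjunction)."""
    mask = np.ones(len(X), dtype=bool)
    for c in rule:
        col = X[:, c.feature]
        mask &= (col <= c.threshold) if c.op == "<=" else (col > c.threshold)
    return mask


def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def info_gain_fitness(rule, X, y_model):
    """Information gain of the model's predicted labels given the rule's firing mask
    (one plausible 'information theory based' fitness, assumed for illustration)."""
    mask = rule_fires(rule, X)
    frac = mask.mean()
    if frac in (0.0, 1.0):
        return 0.0
    return entropy(y_model) - (frac * entropy(y_model[mask])
                               + (1.0 - frac) * entropy(y_model[~mask]))


def mutate(rule, candidate_conditions):
    """Randomly drop one condition or add a new one drawn from the local explanations."""
    rule = list(rule)
    if rule and random.random() < 0.5:
        rule.pop(random.randrange(len(rule)))
    else:
        rule.append(random.choice(candidate_conditions))
    return tuple(rule)


def evolve_rules(X, y_model, candidate_conditions,
                 pop_size=50, generations=100, keep=10):
    """Keep the fittest rules each generation and refill the population by mutation."""
    population = [(random.choice(candidate_conditions),) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda r: info_gain_fitness(r, X, y_model), reverse=True)
        elite = population[:keep]
        population = elite + [mutate(random.choice(elite), candidate_conditions)
                              for _ in range(pop_size - keep)]
    return population[:keep]
```

In practice the fitness would also trade off fidelity to the model's predictions against rule coverage and length; the sketch keeps only the information-gain term for brevity.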
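For the robustness evaluation, the abstract introduces a parameter measuring how well the interpretation predicts model behavior on previously unseen data distributions. A plausible reading is a fidelity score computed on an out-of-distribution split, sketched below under the assumption that the learned interpretation is a list of (rule, label) pairs; `rule` here is any callable returning a boolean mask, so the sketch is independent of the exact rule encoding.

```python
# Illustrative sketch (an assumption, not the paper's exact definition): fidelity of
# the learned rules to the model, evaluated on an out-of-distribution split.
import numpy as np


def rule_set_predict(rules, X, default_label):
    """`rules` is a list of (mask_fn, label) pairs; predict with the first rule that
    fires for each row, falling back to a default label (e.g. the majority class)."""
    preds = np.full(len(X), default_label)
    assigned = np.zeros(len(X), dtype=bool)
    for mask_fn, label in rules:
        mask = mask_fn(X) & ~assigned
        preds[mask] = label
        assigned |= mask
    return preds


def interpretation_fidelity(rules, X_ood, model, default_label):
    """Fraction of out-of-distribution samples on which the rules agree with the model;
    this plays the role of the robustness parameter discussed in the abstract."""
    y_model = model.predict(X_ood)
    y_rules = rule_set_predict(rules, X_ood, default_label)
    return float(np.mean(y_rules == y_model))
```

The mitigation described in the abstract, adding out-of-distribution samples to the data used to learn the rules, would then show up as an increase in this fidelity when measured before and after augmentation.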
Related papers
- The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining [27.144616560712493]
We investigate whether better sample efficiency and the better generalization capability of models pretrained with masked language modeling can be attributed to the semantic similarity encoded in the pretraining data's distributional property.
Our results illustrate our limited understanding of model pretraining and provide future research directions.
arXiv Detail & Related papers (2023-10-25T00:31:29Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Globally Interpretable Graph Learning via Distribution Matching [12.885580925389352]
We aim to answer an important question that is not yet well studied: how to provide a global interpretation for the graph learning procedure?
We formulate this problem as globally interpretable graph learning, which aims to distill the high-level, human-intelligible patterns that dominate the learning procedure.
We propose a novel model fidelity metric, tailored for evaluating the fidelity of the resulting model trained on interpretations.
arXiv Detail & Related papers (2023-06-18T00:50:36Z)
- Are Data-driven Explanations Robust against Out-of-distribution Data? [18.760475318852375]
We propose an end-to-end, model-agnostic learning framework called Distributionally Robust Explanations (DRE).
The key idea is to fully utilize inter-distribution information to provide supervisory signals for learning explanations without human annotation.
Our results demonstrate that the proposed method significantly improves the model's performance in terms of explanation and prediction robustness against distributional shifts.
arXiv Detail & Related papers (2023-03-29T02:02:08Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set [50.67431815647126]
Post-hoc global/local feature attribution methods are being progressively employed to understand machine learning models.
We show that partial orders of local/global feature importance arise from this methodology.
We show that every relation among features present in these partial orders also holds in the rankings provided by existing approaches.
arXiv Detail & Related papers (2021-10-26T02:53:14Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- A Topological-Framework to Improve Analysis of Machine Learning Model Performance [5.3893373617126565]
We propose a framework for evaluating machine learning models in which a dataset is treated as a "space" on which a model operates.
We describe a topological data structure, the presheaf, which offers a convenient way to store and analyze model performance across different subpopulations.
arXiv Detail & Related papers (2021-07-09T23:11:13Z)
- An Information-theoretic Approach to Distribution Shifts [9.475039534437332]
Safely deploying machine learning models to the real world is often a challenging process.
Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere.
Similarly, neural networks that are fit to a subset of the population might carry selection bias into their decision process.
arXiv Detail & Related papers (2021-06-07T16:44:21Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article, a new kind of interpretable machine learning method is presented.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Basically, real data points (or specific points of interest) are used, and the changes in the prediction after slightly raising or decreasing specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
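The last entry above ("Deducing neighborhoods of classes from a fitted model") probes a fitted classifier by slightly raising or lowering individual features of real data points and observing whether the prediction changes. That paper frames this via quantile shifts; the sketch below is only a generic perturbation probe under that reading, with the per-feature step sizes `deltas` as an assumed parameter.

```python
# Illustrative sketch (an assumption, not the cited paper's exact quantile-shift
# procedure): nudge single features of a real data point and watch the prediction.
import numpy as np


def probe_feature(model, x, feature, delta):
    """Predictions at x, at x with the feature raised by delta, and lowered by delta."""
    x = np.asarray(x, dtype=float)
    up, down = x.copy(), x.copy()
    up[feature] += delta
    down[feature] -= delta
    base, raised, lowered = model.predict(np.stack([x, up, down]))
    return base, raised, lowered


def class_boundary_features(model, x, deltas):
    """Indices of features whose small perturbation flips the predicted class,
    i.e. directions in which x lies close to a decision boundary."""
    flips = []
    for feature, delta in enumerate(deltas):
        base, raised, lowered = probe_feature(model, x, feature, delta)
        if raised != base or lowered != base:
            flips.append(feature)
    return flips
```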