Auditing and Debugging Deep Learning Models via Decision Boundaries:
Individual-level and Group-level Analysis
- URL: http://arxiv.org/abs/2001.00682v1
- Date: Fri, 3 Jan 2020 01:45:36 GMT
- Title: Auditing and Debugging Deep Learning Models via Decision Boundaries:
Individual-level and Group-level Analysis
- Authors: Roozbeh Yousefzadeh and Dianne P. O'Leary
- Abstract summary: We use flip points to explain, audit, and debug deep learning models.
A flip point is any point that lies on the boundary between two output classes.
We demonstrate our methods by investigating several models trained on standard datasets used in social applications of machine learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have been criticized for their lack of easy
interpretation, which undermines confidence in their use for important
applications. Nevertheless, they are widely used in applications with
consequences for human lives, largely because of their superior
performance. Therefore, there is a great need for computational methods that
can explain, audit, and debug such models. Here, we use flip points to
accomplish these goals for deep learning models with continuous output scores
(e.g., computed by softmax), used in social applications. A flip point is any
point that lies on the boundary between two output classes: e.g., for a model
with a binary yes/no output, a flip point is any input that generates equal
scores for "yes" and "no". The flip point closest to a given input is of
particular importance because it reveals the smallest changes in the input that
would change a model's classification, and we show that it is the solution to a
well-posed optimization problem. Flip points also enable us to systematically
study the decision boundaries of a deep learning classifier. The resulting
insight into the decision boundaries of a deep model can clearly explain the
model's output at the individual level, via an explanation report that is
understandable by non-experts. We also develop a procedure to understand and
audit model behavior towards groups of people. Flip points can also be used to
alter the decision boundaries in order to correct undesirable behaviors. We
demonstrate our methods by investigating several models trained on standard
datasets used in social applications of machine learning. We also identify the
features that are most responsible for particular classifications and
misclassifications.
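To make the central idea concrete, the following Python snippet is a minimal sketch, not the authors' implementation: it poses the closest flip point as a constrained optimization problem for a binary classifier, minimizing the distance to a given input subject to the two softmax scores being equal. The toy two-layer network, its random weights, the input dimension, and the choice of SciPy's SLSQP solver are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the closest-flip-point computation for a binary classifier
# with continuous softmax scores. All model details here are stand-in assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Stand-in for a trained binary classifier (randomly initialized, for illustration).
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)

def scores(x):
    """Softmax scores of the toy model for a single input x of shape (4,)."""
    h = np.tanh(W1 @ x + b1)
    z = W2 @ h + b2
    e = np.exp(z - z.max())
    return e / e.sum()

def closest_flip_point(x0):
    """Point closest to x0 (squared Euclidean distance) on the yes/no boundary."""
    objective = lambda x: np.sum((x - x0) ** 2)          # least change to the input
    on_boundary = lambda x: scores(x)[0] - scores(x)[1]  # equals 0 exactly on the boundary
    res = minimize(objective, x0, method="SLSQP",
                   constraints=[{"type": "eq", "fun": on_boundary}])
    return res.x

x = rng.normal(size=4)
x_flip = closest_flip_point(x)
print("input scores     :", scores(x))
print("flip-point scores:", scores(x_flip))   # roughly [0.5, 0.5] at convergence
print("minimal change   :", x_flip - x)       # feature-wise edits that flip the decision
```

The difference x_flip - x plays the role of the "smallest changes in the input that would change a model's classification" described in the abstract; inspecting its largest components is one way to see which features drive a particular decision.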
Related papers
- Policy Trees for Prediction: Interpretable and Adaptive Model Selection for Machine Learning [5.877778007271621]
We introduce a tree-based approach, Optimal Predictive-Policy Trees (OP2T), that yields interpretable policies for adaptively selecting a predictive model or ensemble.
Our approach enables interpretable and adaptive model selection and rejection while only assuming access to model outputs.
We evaluate our approach on real-world datasets, including regression and classification tasks with both structured and unstructured data.
arXiv Detail & Related papers (2024-05-30T21:21:33Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Deep Explainable Learning with Graph Based Data Assessing and Rule Reasoning [4.369058206183195]
We propose an end-to-end deep explainable learning approach that combines the advantage of deep model in noise handling and expert rule-based interpretability.
The proposed method is tested in an industry production system, showing comparable prediction accuracy, much higher generalization stability and better interpretability.
arXiv Detail & Related papers (2022-11-09T05:58:56Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- An exact counterfactual-example-based approach to tree-ensemble models interpretability [0.0]
High-performance models do not exhibit the necessary transparency to make their decisions fully understandable.
We derive an exact geometrical characterisation of their decision regions in the form of a collection of multidimensional intervals.
An adaptation to reasoning on regression problems is also envisaged.
arXiv Detail & Related papers (2021-05-31T09:32:46Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
arXiv Detail & Related papers (2021-01-21T01:46:36Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article a new kind of interpretable machine learning method is presented.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Essentially, real data points (or specific points of interest) are used, and the changes in the prediction after slightly raising or lowering specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
- How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking [70.92463223410225]
DiffMask learns to mask-out subsets of the input while maintaining differentiability.
The decision to include or disregard an input token is made by a simple model based on intermediate hidden layers.
This lets us not only plot attribution heatmaps but also analyze how decisions are formed across network layers.
arXiv Detail & Related papers (2020-04-30T17:36:14Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.