Auditing and Debugging Deep Learning Models via Decision Boundaries:
Individual-level and Group-level Analysis
- URL: http://arxiv.org/abs/2001.00682v1
- Date: Fri, 3 Jan 2020 01:45:36 GMT
- Title: Auditing and Debugging Deep Learning Models via Decision Boundaries:
Individual-level and Group-level Analysis
- Authors: Roozbeh Yousefzadeh and Dianne P. O'Leary
- Abstract summary: We use flip points to explain, audit, and debug deep learning models.
A flip point is any point that lies on the boundary between two output classes.
We demonstrate our methods by investigating several models trained on standard datasets used in social applications of machine learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have been criticized for their lack of easy
interpretation, which undermines confidence in their use for important
applications. Nevertheless, they are widely used in applications with
consequences for human lives, largely because of their superior
performance. Therefore, there is a great need for computational methods that
can explain, audit, and debug such models. Here, we use flip points to
accomplish these goals for deep learning models with continuous output scores
(e.g., computed by softmax), used in social applications. A flip point is any
point that lies on the boundary between two output classes: e.g., for a model
with a binary yes/no output, a flip point is any input that generates equal
scores for "yes" and "no". The flip point closest to a given input is of
particular importance because it reveals the smallest changes in the input that
would change a model's classification, and we show that it is the solution to a
well-posed optimization problem. Flip points also enable us to systematically
study the decision boundaries of a deep learning classifier. The resulting
insight into the decision boundaries of a deep model can clearly explain the
model's output at the individual level, via an explanation report that is
understandable by non-experts. We also develop a procedure to understand and
audit model behavior towards groups of people. Flip points can also be used to
alter the decision boundaries in order to correct undesirable behaviors. We
demonstrate our methods by investigating several models trained on standard
datasets used in social applications of machine learning. We also identify the
features that are most responsible for particular classifications and
misclassifications.
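To make the central idea concrete, the following Python snippet is a minimal sketch, not the authors' implementation: it poses the closest flip point as a constrained optimization problem for a binary classifier, minimizing the distance to a given input subject to the two softmax scores being equal. The toy two-layer network, its random weights, the input dimension, and the choice of SciPy's SLSQP solver are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the closest-flip-point computation for a binary classifier
# with continuous softmax scores. All model details here are stand-in assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Stand-in for a trained binary classifier (randomly initialized, for illustration).
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)

def scores(x):
    """Softmax scores of the toy model for a single input x of shape (4,)."""
    h = np.tanh(W1 @ x + b1)
    z = W2 @ h + b2
    e = np.exp(z - z.max())
    return e / e.sum()

def closest_flip_point(x0):
    """Point closest to x0 (squared Euclidean distance) on the yes/no boundary."""
    objective = lambda x: np.sum((x - x0) ** 2)          # least change to the input
    on_boundary = lambda x: scores(x)[0] - scores(x)[1]  # equals 0 exactly on the boundary
    res = minimize(objective, x0, method="SLSQP",
                   constraints=[{"type": "eq", "fun": on_boundary}])
    return res.x

x = rng.normal(size=4)
x_flip = closest_flip_point(x)
print("input scores     :", scores(x))
print("flip-point scores:", scores(x_flip))   # roughly [0.5, 0.5] at convergence
print("minimal change   :", x_flip - x)       # feature-wise edits that flip the decision
```

The difference x_flip - x plays the role of the "smallest changes in the input that would change a model's classification" described in the abstract; inspecting its largest components is one way to see which features drive a particular decision.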
Related papers
- Policy Trees for Prediction: Interpretable and Adaptive Model Selection for Machine Learning [5.877778007271621]
We introduce a tree-based approach, Optimal Predictive-Policy Trees (OP2T), that yields interpretable policies for adaptively selecting a predictive model or ensemble.
Our approach enables interpretable and adaptive model selection and rejection while only assuming access to model outputs.
We evaluate our approach on real-world datasets, including regression and classification tasks with both structured and unstructured data.
arXiv Detail & Related papers (2024-05-30T21:21:33Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Deep Explainable Learning with Graph Based Data Assessing and Rule Reasoning [4.369058206183195]
We propose an end-to-end deep explainable learning approach that combines the advantage of deep model in noise handling and expert rule-based interpretability.
The proposed method is tested in an industry production system, showing comparable prediction accuracy, much higher generalization stability and better interpretability.
arXiv Detail & Related papers (2022-11-09T05:58:56Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- An exact counterfactual-example-based approach to tree-ensemble models interpretability [0.0]
High-performance models do not exhibit the necessary transparency to make their decisions fully understandable.
We derive an exact geometrical characterisation of their decision regions in the form of a collection of multidimensional intervals.
An adaptation to reasoning on regression problems is also envisaged.
arXiv Detail & Related papers (2021-05-31T09:32:46Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
arXiv Detail & Related papers (2021-01-21T01:46:36Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article a new kind of interpretable machine learning method is presented.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Essentially, real data points (or specific points of interest) are used, and the changes in the prediction after slightly raising or lowering specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
- How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking [70.92463223410225]
DiffMask learns to mask-out subsets of the input while maintaining differentiability.
The decision to include or disregard an input token is made by a simple model based on intermediate hidden layers.
This lets us not only plot attribution heatmaps but also analyze how decisions are formed across network layers.
arXiv Detail & Related papers (2020-04-30T17:36:14Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.