Example-based Explanations for Random Forests using Machine Unlearning
- URL: http://arxiv.org/abs/2402.05007v1
- Date: Wed, 7 Feb 2024 16:28:04 GMT
- Title: Example-based Explanations for Random Forests using Machine Unlearning
- Authors: Tanmay Surve and Romila Pradhan
- Abstract summary: Tree-based machine learning models, such as decision trees and random forests, have been hugely successful in classification tasks.
Despite their popularity and power, these models have been found to produce unexpected or discriminatory outcomes.
We introduce FairDebugger, a system to identify training data subsets responsible for instances of fairness violations in the outcomes of a random forest classifier.
- Score: 4.006745047019997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tree-based machine learning models, such as decision trees and random
forests, have been hugely successful in classification tasks primarily because
of their predictive power in supervised learning tasks and ease of
interpretation. Despite their popularity and power, these models have been
found to produce unexpected or discriminatory outcomes. Given their
overwhelming success for most tasks, it is of interest to identify sources of
their unexpected and discriminatory behavior. However, there has not been much
work on understanding and debugging tree-based classifiers in the context of
fairness.
We introduce FairDebugger, a system that utilizes recent advances in machine
unlearning research to identify training data subsets responsible for instances
of fairness violations in the outcomes of a random forest classifier.
FairDebugger generates top-$k$ explanations (in the form of coherent training
data subsets) for model unfairness. Toward this goal, FairDebugger first
utilizes machine unlearning to estimate the change in the tree structures of
the random forest when parts of the underlying training data are removed, and
then leverages the Apriori algorithm from frequent itemset mining to reduce the
subset search space. We empirically evaluate our approach on three real-world
datasets, and demonstrate that the explanations generated by FairDebugger are
consistent with insights from prior studies on these datasets.
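The abstract describes a two-stage pipeline: (1) use machine unlearning to estimate how the forest changes when a training subset is deleted, and (2) use Apriori to keep the subset search space tractable and the candidate subsets coherent. Below is a minimal sketch of that idea, not the authors' implementation: it assumes a pandas DataFrame with a binary 0/1 label and a binary protected attribute, uses statistical parity difference as the fairness metric, substitutes plain retraining for true unlearning, and borrows apriori from mlxtend; all column names and parameters are hypothetical.

```python
# Minimal sketch of the two-stage idea, NOT FairDebugger itself. Assumptions:
# binary 0/1 label, binary protected attribute, statistical parity difference
# as the fairness metric, plain retraining as a stand-in for unlearning.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from mlxtend.frequent_patterns import apriori

def parity_diff(model, X, s):
    """Statistical parity difference between the two protected groups."""
    pred = model.predict(X)
    return pred[(s == 1).to_numpy()].mean() - pred[(s == 0).to_numpy()].mean()

def fit_forest(X, y):
    return RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def top_k_explanations(df, feature_cols, label_col, prot_col, k=3, min_support=0.1):
    y, s = df[label_col], df[prot_col]
    # One-hot encoding serves double duty here: model input and Apriori input.
    X = pd.get_dummies(df[feature_cols].astype(str)).astype(bool)
    base_bias = parity_diff(fit_forest(X, y), X, s)

    # Candidate subsets are frequent patterns over the one-hot rows, so an
    # "explanation" is a coherent slice such as {sex=Male, degree=BSc}
    # rather than an arbitrary set of rows.
    patterns = apriori(X, min_support=min_support, use_colnames=True)

    results = []
    for itemset in patterns["itemsets"]:
        keep = ~X[list(itemset)].all(axis=1)  # drop rows matching the pattern
        # True unlearning would update the trained trees in place; retraining
        # from scratch is the naive stand-in used in this sketch.
        new_bias = parity_diff(fit_forest(X[keep], y[keep]), X, s)
        results.append((set(itemset), abs(base_bias) - abs(new_bias)))

    # Rank subsets by how much their deletion shrinks the bias.
    return sorted(results, key=lambda r: r[1], reverse=True)[:k]
```

The retraining loop is the expensive part; per the abstract, FairDebugger instead estimates the post-deletion tree structures via machine unlearning, which is what makes scanning many candidate subsets practical.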
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the influence of a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that existing machine unlearning techniques do not hold up in more challenging evaluation settings.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a new framework based on large language models (LLMs) and decision tree reasoning (OCTree).
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - Gaussian Mixture Models for Affordance Learning using Bayesian Networks [50.18477618198277]
Affordances are fundamental descriptors of relationships between actions, objects and effects.
This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences.
arXiv Detail & Related papers (2024-02-08T22:05:45Z) - XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z) - Machine Unlearning for Causal Inference [0.6621714555125157]
It is important to enable a model to forget some of the information it has learned about a given user (machine unlearning).
This paper introduces the concept of machine unlearning for causal inference, particularly propensity score matching and treatment effect estimation; a minimal illustrative sketch of propensity score matching appears after this list.
The dataset used in the study is the Lalonde dataset, a widely used dataset for evaluating the effectiveness of job training programs.
arXiv Detail & Related papers (2023-08-24T17:27:01Z) - Interpretable Data-Based Explanations for Fairness Debugging [7.266116143672294]
Gopher is a system that produces compact, interpretable, and causal explanations for bias or unexpected model behavior.
We introduce the concept of causal responsibility that quantifies the extent to which intervening on training data by removing or updating subsets of it can resolve the bias.
Building on this concept, we develop an efficient approach for generating the top-k patterns that explain model bias.
arXiv Detail & Related papers (2021-12-17T20:10:00Z) - Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values [4.973456986972679]
We investigate the fairness concerns of training a machine learning model using data with missing values.
We propose an integrated approach based on decision trees that does not require a separate process of imputation and learning.
We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset.
arXiv Detail & Related papers (2021-09-21T20:46:22Z) - Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z) - Discriminative, Generative and Self-Supervised Approaches for Target-Agnostic Learning [8.666667951130892]
Generative and self-supervised learning models are shown to perform well at target-agnostic learning.
Our derived theorem for pseudo-likelihood theory also shows that these approaches are related when inferring a joint distribution model.
arXiv Detail & Related papers (2020-11-12T15:03:40Z) - Towards Robust Classification with Deep Generative Forests [13.096855747795303]
Decision Trees and Random Forests are among the most widely used machine learning models.
Being primarily discriminative models, they lack principled methods to manipulate the uncertainty of their predictions.
We exploit Generative Forests (GeFs) to extend Random Forests to generative models representing the full joint distribution over the feature space.
arXiv Detail & Related papers (2020-07-11T08:57:52Z) - Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrastive examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
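The "Machine Unlearning for Causal Inference" entry above mentions propensity score matching. As background, here is a generic textbook sketch of that technique (not the cited paper's code): a logistic model estimates each unit's probability of receiving treatment, every treated unit is matched to the nearest-scoring control, and the matched outcome gaps are averaged; the function and variable names are hypothetical.

```python
# Generic propensity score matching sketch (illustrative, not the cited
# paper's implementation). Inputs are NumPy arrays: X holds pre-treatment
# covariates, treated is a 0/1 assignment flag, y is the observed outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def att_via_psm(X, treated, y):
    """Average treatment effect on the treated via 1-NN score matching."""
    # Propensity score: estimated probability of treatment given covariates.
    scores = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_scores = scores[treated == 1].reshape(-1, 1)
    c_scores = scores[treated == 0].reshape(-1, 1)

    # Match each treated unit to the control with the closest score.
    _, idx = NearestNeighbors(n_neighbors=1).fit(c_scores).kneighbors(t_scores)
    matched_control_y = y[treated == 0][idx.ravel()]
    return (y[treated == 1] - matched_control_y).mean()
```

On the Lalonde dataset mentioned in that summary, X would hold pre-treatment covariates such as age and prior earnings, treated the job-training participation flag, and y post-program earnings.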
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.