Example-based Explanations for Random Forests using Machine Unlearning
- URL: http://arxiv.org/abs/2402.05007v1
- Date: Wed, 7 Feb 2024 16:28:04 GMT
- Title: Example-based Explanations for Random Forests using Machine Unlearning
- Authors: Tanmay Surve and Romila Pradhan
- Abstract summary: Tree-based machine learning models, such as decision trees and random forests, have been hugely successful in classification tasks.
Despite their popularity and power, these models have been found to produce unexpected or discriminatory outcomes.
We introduce FairDebugger, a system to identify training data subsets responsible for instances of fairness violations in the outcomes of a random forest classifier.
- Score: 4.006745047019997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tree-based machine learning models, such as decision trees and random
forests, have been hugely successful in classification tasks primarily because
of their predictive power in supervised learning tasks and ease of
interpretation. Despite their popularity and power, these models have been
found to produce unexpected or discriminatory outcomes. Given their
overwhelming success for most tasks, it is of interest to identify sources of
their unexpected and discriminatory behavior. However, there has not been much
work on understanding and debugging tree-based classifiers in the context of
fairness.
We introduce FairDebugger, a system that utilizes recent advances in machine
unlearning research to identify training data subsets responsible for instances
of fairness violations in the outcomes of a random forest classifier.
FairDebugger generates top-$k$ explanations (in the form of coherent training
data subsets) for model unfairness. Toward this goal, FairDebugger first
utilizes machine unlearning to estimate the change in the tree structures of
the random forest when parts of the underlying training data are removed, and
then leverages the Apriori algorithm from frequent itemset mining to reduce the
subset search space. We empirically evaluate our approach on three real-world
datasets, and demonstrate that the explanations generated by FairDebugger are
consistent with insights from prior studies on these datasets.
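The abstract describes a two-stage pipeline: (1) use machine unlearning to estimate how the forest changes when a training subset is deleted, and (2) use Apriori to keep the subset search space tractable and the candidate subsets coherent. Below is a minimal sketch of that idea, not the authors' implementation: it assumes a pandas DataFrame with a binary 0/1 label and a binary protected attribute, uses statistical parity difference as the fairness metric, substitutes plain retraining for true unlearning, and borrows apriori from mlxtend; all column names and parameters are hypothetical.

```python
# Minimal sketch of the two-stage idea, NOT FairDebugger itself. Assumptions:
# binary 0/1 label, binary protected attribute, statistical parity difference
# as the fairness metric, plain retraining as a stand-in for unlearning.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from mlxtend.frequent_patterns import apriori

def parity_diff(model, X, s):
    """Statistical parity difference between the two protected groups."""
    pred = model.predict(X)
    return pred[(s == 1).to_numpy()].mean() - pred[(s == 0).to_numpy()].mean()

def fit_forest(X, y):
    return RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def top_k_explanations(df, feature_cols, label_col, prot_col, k=3, min_support=0.1):
    y, s = df[label_col], df[prot_col]
    # One-hot encoding serves double duty here: model input and Apriori input.
    X = pd.get_dummies(df[feature_cols].astype(str)).astype(bool)
    base_bias = parity_diff(fit_forest(X, y), X, s)

    # Candidate subsets are frequent patterns over the one-hot rows, so an
    # "explanation" is a coherent slice such as {sex=Male, degree=BSc}
    # rather than an arbitrary set of rows.
    patterns = apriori(X, min_support=min_support, use_colnames=True)

    results = []
    for itemset in patterns["itemsets"]:
        keep = ~X[list(itemset)].all(axis=1)  # drop rows matching the pattern
        # True unlearning would update the trained trees in place; retraining
        # from scratch is the naive stand-in used in this sketch.
        new_bias = parity_diff(fit_forest(X[keep], y[keep]), X, s)
        results.append((set(itemset), abs(base_bias) - abs(new_bias)))

    # Rank subsets by how much their deletion shrinks the bias.
    return sorted(results, key=lambda r: r[1], reverse=True)[:k]
```

The retraining loop is the expensive part; per the abstract, FairDebugger instead estimates the post-deletion tree structures via machine unlearning, which is what makes scanning many candidate subsets practical.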
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the influence of a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that existing machine unlearning techniques do not hold up in more challenging evaluation settings.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a new framework based on large language models (LLMs) and decision tree reasoning (OCTree).
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - Gaussian Mixture Models for Affordance Learning using Bayesian Networks [50.18477618198277]
Affordances are fundamental descriptors of relationships between actions, objects and effects.
This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences.
arXiv Detail & Related papers (2024-02-08T22:05:45Z) - XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z) - Machine Unlearning for Causal Inference [0.6621714555125157]
It is important to enable a model to forget some of the information it has learned about a given user (machine unlearning).
This paper introduces the concept of machine unlearning for causal inference, particularly propensity score matching and treatment effect estimation; a minimal illustrative sketch of propensity score matching appears after this list.
The dataset used in the study is the Lalonde dataset, a widely used dataset for evaluating the effectiveness of job training programs.
arXiv Detail & Related papers (2023-08-24T17:27:01Z) - Interpretable Data-Based Explanations for Fairness Debugging [7.266116143672294]
Gopher is a system that produces compact, interpretable, and causal explanations for bias or unexpected model behavior.
We introduce the concept of causal responsibility that quantifies the extent to which intervening on training data by removing or updating subsets of it can resolve the bias.
Building on this concept, we develop an efficient approach for generating the top-k patterns that explain model bias.
arXiv Detail & Related papers (2021-12-17T20:10:00Z) - Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values [4.973456986972679]
We investigate the fairness concerns of training a machine learning model using data with missing values.
We propose an integrated approach based on decision trees that does not require a separate process of imputation and learning.
We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset.
arXiv Detail & Related papers (2021-09-21T20:46:22Z) - Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z) - Discriminative, Generative and Self-Supervised Approaches for Target-Agnostic Learning [8.666667951130892]
Generative and self-supervised learning models are shown to perform well at target-agnostic learning.
Our derived theorem for pseudo-likelihood theory also shows that these approaches are related when inferring a joint distribution model.
arXiv Detail & Related papers (2020-11-12T15:03:40Z) - Towards Robust Classification with Deep Generative Forests [13.096855747795303]
Decision Trees and Random Forests are among the most widely used machine learning models.
Being primarily discriminative models, they lack principled methods to manipulate the uncertainty of their predictions.
We exploit Generative Forests (GeFs) to extend Random Forests to generative models representing the full joint distribution over the feature space.
arXiv Detail & Related papers (2020-07-11T08:57:52Z) - Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrastive examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
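The "Machine Unlearning for Causal Inference" entry above mentions propensity score matching. As background, here is a generic textbook sketch of that technique (not the cited paper's code): a logistic model estimates each unit's probability of receiving treatment, every treated unit is matched to the nearest-scoring control, and the matched outcome gaps are averaged; the function and variable names are hypothetical.

```python
# Generic propensity score matching sketch (illustrative, not the cited
# paper's implementation). Inputs are NumPy arrays: X holds pre-treatment
# covariates, treated is a 0/1 assignment flag, y is the observed outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def att_via_psm(X, treated, y):
    """Average treatment effect on the treated via 1-NN score matching."""
    # Propensity score: estimated probability of treatment given covariates.
    scores = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_scores = scores[treated == 1].reshape(-1, 1)
    c_scores = scores[treated == 0].reshape(-1, 1)

    # Match each treated unit to the control with the closest score.
    _, idx = NearestNeighbors(n_neighbors=1).fit(c_scores).kneighbors(t_scores)
    matched_control_y = y[treated == 0][idx.ravel()]
    return (y[treated == 1] - matched_control_y).mean()
```

On the Lalonde dataset mentioned in that summary, X would hold pre-treatment covariates such as age and prior earnings, treated the job-training participation flag, and y post-program earnings.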
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.