CARLA: A Python Library to Benchmark Algorithmic Recourse and
Counterfactual Explanation Algorithms
- URL: http://arxiv.org/abs/2108.00783v1
- Date: Mon, 2 Aug 2021 11:00:43 GMT
- Title: CARLA: A Python Library to Benchmark Algorithmic Recourse and
Counterfactual Explanation Algorithms
- Authors: Martin Pawelczyk and Sascha Bielawski and Johannes van den Heuvel and
Tobias Richter and Gjergji Kasneci
- Abstract summary: CARLA (Counterfactual And Recourse LibrAry) is a Python library for benchmarking counterfactual explanation methods.
We provide an extensive benchmark of 11 popular counterfactual explanation methods.
We also provide a benchmarking framework for research on future counterfactual explanation methods.
- Score: 6.133522864509327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual explanations provide means for prescriptive model explanations
by suggesting actionable feature changes (e.g., increase income) that allow
individuals to achieve favorable outcomes in the future (e.g., insurance
approval). Choosing an appropriate method is crucial for meaningful
counterfactual explanations. As documented in recent reviews, the literature
on available methods is growing quickly. Yet, in the absence of widely
available open-source implementations, the decision in favor of certain
models is primarily based on what is readily available. Going forward, to
guarantee meaningful comparisons across explanation methods, we present CARLA
(Counterfactual And Recourse LibrAry), a Python library for benchmarking
counterfactual explanation methods across both different data sets and
different machine learning models. In summary, our work provides the following
contributions: (i) an extensive benchmark of 11 popular counterfactual
explanation methods, (ii) a benchmarking framework for research on future
counterfactual explanation methods, and (iii) a standardized set of integrated
evaluation measures and data sets for transparent and extensive comparisons of
these methods. We have open-sourced CARLA and our experimental results on
GitHub, making them available as competitive baselines. We welcome
contributions from other research groups and practitioners.
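To make the integrated evaluation measures concrete, the sketch below computes the three quantities a counterfactual benchmark typically reports (validity, proximity, sparsity) for a toy recourse method on synthetic data. It is an illustration of the measured quantities only; the names and the toy method are hypothetical and do not reproduce CARLA's actual API.

    # Hypothetical sketch (not CARLA's API): the measures a counterfactual
    # benchmark typically reports, computed for a toy recourse method.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    clf = LogisticRegression().fit(X, y)

    def naive_counterfactual(x, model, step=0.1, max_iter=200):
        """Greedily push the strongest feature until the prediction flips to 1."""
        x_cf = x.copy()
        w = model.coef_[0]
        j = int(np.argmax(np.abs(w)))          # feature with the largest coefficient
        for _ in range(max_iter):
            if model.predict(x_cf.reshape(1, -1))[0] == 1:
                break
            x_cf[j] += step * np.sign(w[j])
        return x_cf

    factuals = X[clf.predict(X) == 0][:20]     # instances with the unfavorable outcome
    cfs = np.array([naive_counterfactual(x, clf) for x in factuals])

    validity = np.mean(clf.predict(cfs) == 1)                        # share of successful flips
    proximity = np.mean(np.abs(cfs - factuals).sum(axis=1))          # mean L1 distance to factual
    sparsity = np.mean((np.abs(cfs - factuals) > 1e-8).sum(axis=1))  # features changed per CF
    print(f"validity={validity:.2f}  proximity={proximity:.2f}  sparsity={sparsity:.2f}")

A full benchmark would additionally track quantities such as run time and constraint violations, and would sweep these measures across data sets, models, and explanation methods rather than a single toy setup.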
Related papers
- ir_explain: a Python Library of Explainable IR Methods [2.6746131626710725]
ir_explain is a Python library that implements a variety of techniques for Explainable IR (ExIR) within a common framework.
ir_explain supports the three standard categories of post-hoc explanations, namely pointwise, pairwise, and listwise explanations.
The library is designed to make it easy to reproduce state-of-the-art ExIR baselines on standard test collections.
arXiv Detail & Related papers (2024-04-29T09:37:24Z) - Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL), yet ICL performance varies considerably with the choice of demonstrations.
In this work, we revisit the factors behind this variance from both the data and model perspectives, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, TopK + ConE, based on the assumption that the performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
arXiv Detail & Related papers (2024-01-22T16:25:27Z) - Unlocking the Potential of Large Language Models for Explainable
Recommendations [55.29843710657637]
It remains uncertain what impact replacing the explanation generator with the recently emerging large language models (LLMs) would have.
In this study, we propose LLMXRec, a simple yet effective two-stage explainable recommendation framework.
By adopting several key fine-tuning techniques, controllable and fluent explanations can be well generated.
arXiv Detail & Related papers (2023-12-25T09:09:54Z) - Evaluating and Explaining Large Language Models for Code Using Syntactic
Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z) - counterfactuals: An R Package for Counterfactual Explanation Methods [9.505961054570523]
We introduce the counterfactuals R package, which provides a modular and unified interface for counterfactual explanation methods.
We implement three existing counterfactual explanation methods and propose some optional methodological extensions.
We show how to integrate additional counterfactual explanation methods into the package.
arXiv Detail & Related papers (2023-04-13T14:29:15Z) - OpenXAI: Towards a Transparent Evaluation of Model Explanations [41.37573216839717]
We introduce OpenXAI, a comprehensive and open-source framework for evaluating and benchmarking post hoc explanation methods.
OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, and (ii) open-source implementations of eleven quantitative metrics for evaluating faithfulness, stability (robustness), and fairness of explanation methods.
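As an illustration of what a stability (robustness) metric of this kind measures, the hedged sketch below perturbs an input slightly and records how much a simple gradient-times-input attribution shifts; it is a generic stand-in, not OpenXAI's implementation.

    # Illustrative stability (robustness) check for a feature-attribution method:
    # perturb the input slightly and measure how much the attribution vector moves.
    # Generic sketch only, not OpenXAI's implementation of its metrics.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 6))
    y = (X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0]) > 0).astype(int)
    clf = LogisticRegression().fit(X, y)

    def attribution(x, model):
        """Gradient-times-input for a linear model: a simple feature attribution."""
        return model.coef_[0] * x

    def stability(x, model, eps=0.05, trials=20, seed=0):
        rng = np.random.default_rng(seed)
        base = attribution(x, model)
        shifts = [np.linalg.norm(attribution(x + eps * rng.normal(size=x.shape), model) - base)
                  for _ in range(trials)]
        return float(np.mean(shifts))          # smaller = more stable explanation

    print(f"mean attribution shift under perturbation: {stability(X[0], clf):.4f}")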
arXiv Detail & Related papers (2022-06-22T14:01:34Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual
Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In MACE, a novel RL-based method finds good counterfactual examples, and a gradient-less descent method improves their proximity.
Experiments on public datasets validate its effectiveness, showing better validity, sparsity, and proximity.
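The "gradient-less descent for improving proximity" idea can be pictured with a much simpler stand-in: starting from a valid counterfactual, repeatedly pull single features back toward the factual and keep the move only if the desired prediction survives. The sketch below shows that generic idea, not MACE's actual algorithm.

    # Generic gradient-free proximity refinement (a stand-in for the idea only,
    # not MACE's algorithm): shrink features of a valid counterfactual back toward
    # the factual instance whenever the desired prediction is preserved.
    import numpy as np

    def refine_proximity(x, x_cf, predict_fn, target=1, shrink=0.5, rounds=100, seed=0):
        rng = np.random.default_rng(seed)
        x_cf = x_cf.copy()
        for _ in range(rounds):
            j = int(rng.integers(len(x)))                      # pick a random feature
            candidate = x_cf.copy()
            candidate[j] = x[j] + shrink * (x_cf[j] - x[j])    # move it partway back
            if predict_fn(candidate.reshape(1, -1))[0] == target:
                x_cf = candidate                               # accept: still valid, but closer
        return x_cf

Because it only queries predict_fn, the same routine works for any black-box model; validity is preserved by construction while the distance to the factual can only decrease.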
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - A Framework and Benchmarking Study for Counterfactual Generating Methods
on Tabular Data [0.0]
Counterfactual explanations are viewed as an effective way to explain machine learning predictions.
There are already dozens of algorithms aiming to generate such explanations.
A benchmarking study and framework can help practitioners determine which technique and building blocks most suit their context.
arXiv Detail & Related papers (2021-07-09T21:06:03Z) - On Sample Based Explanation Methods for NLP:Efficiency, Faithfulness,
and Semantic Evaluation [23.72825603188359]
We can improve the interpretability of explanations by allowing arbitrary text sequences as the explanation unit.
We propose a semantic-based evaluation metric that can better align with humans' judgment of explanations.
arXiv Detail & Related papers (2021-06-09T00:49:56Z) - Search Methods for Sufficient, Socially-Aligned Feature Importance
Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
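The removal-based recipe described above is easy to state in code: a feature's importance is the drop in the model's confidence in its original prediction when that feature is "removed". The sketch below uses crude mean imputation as the removal; the paper's point is precisely that such off-distribution removals can be replaced with in-distribution counterfactuals. All names here are illustrative.

    # Removal-based feature importance: importance = drop in the model's
    # confidence in its original prediction when a feature is "removed"
    # (here approximated, crudely and off-distribution, by mean imputation).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(2)
    X = rng.normal(size=(400, 4))
    y = (X[:, 2] > 0).astype(int)
    clf = RandomForestClassifier(random_state=0).fit(X, y)

    def removal_importance(x, model, X_train):
        pred = int(model.predict(x.reshape(1, -1))[0])
        base = model.predict_proba(x.reshape(1, -1))[0, pred]   # confidence on the intact input
        scores = []
        for j in range(len(x)):
            x_masked = x.copy()
            x_masked[j] = X_train[:, j].mean()                  # "remove" feature j
            conf = model.predict_proba(x_masked.reshape(1, -1))[0, pred]
            scores.append(base - conf)                          # confidence drop = importance
        return np.array(scores)

    print(removal_importance(X[0], clf, X))                     # feature 2 drives the label here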
arXiv Detail & Related papers (2021-06-01T20:36:48Z) - Evaluating Explainable AI: Which Algorithmic Explanations Help Users
Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.