A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data
- URL: http://arxiv.org/abs/2107.04680v1
- Date: Fri, 9 Jul 2021 21:06:03 GMT
- Title: A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data
- Authors: Raphael Mazzine and David Martens
- Abstract summary: Counterfactual explanations are viewed as an effective way to explain machine learning predictions, and there are already dozens of algorithms aiming to generate such explanations. A benchmarking study and framework can help practitioners determine which technique and building blocks best suit their context.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual explanations are viewed as an effective way to explain machine learning predictions. This interest is reflected by a relatively young literature with already dozens of algorithms aiming to generate such explanations. These algorithms focus on finding how features can be modified to change the output classification. However, this rather general objective can be achieved in different ways, which brings about the need for a methodology to test and benchmark these algorithms. The contributions of this work are manifold: first, a large benchmarking study of 10 algorithmic approaches on 22 tabular datasets, using 9 relevant evaluation metrics; second, the introduction of a novel, first-of-its-kind framework to test counterfactual generation algorithms; third, a set of objective metrics to evaluate and compare counterfactual results; and finally, insights from the benchmarking results that indicate which approaches obtain the best performance on which types of datasets. This benchmarking study and framework can help practitioners determine which technique and building blocks best suit their context, and can help researchers in the design and evaluation of current and future counterfactual generation algorithms. Our findings show that, overall, there is no single best algorithm to generate counterfactual explanations, as performance depends strongly on specificities of the dataset, model, score and factual point.
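To make this kind of evaluation concrete, below is a minimal Python sketch of three metrics commonly used to score counterfactuals: validity, sparsity, and L1 proximity. The function names and the assumption of numeric NumPy feature vectors are ours for illustration; the paper's nine metrics are not reproduced here.

import numpy as np

def is_valid(predict_fn, counterfactual, original_class):
    # A counterfactual only counts if it actually flips the model's prediction.
    return predict_fn(counterfactual.reshape(1, -1))[0] != original_class

def sparsity(factual, counterfactual, tol=1e-8):
    # Number of features that were modified (fewer changes are easier to act on).
    return int(np.sum(np.abs(factual - counterfactual) > tol))

def l1_proximity(factual, counterfactual):
    # L1 distance to the factual point (smaller changes are more plausible).
    return float(np.sum(np.abs(factual - counterfactual)))

# Hypothetical usage with a scikit-learn-style classifier clf:
# x  = np.array([30.0, 50_000.0, 2.0])   # factual point
# cf = np.array([30.0, 62_000.0, 2.0])   # candidate counterfactual
# is_valid(clf.predict, cf, original_class=0), sparsity(x, cf), l1_proximity(x, cf)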
Related papers
- A Weighted K-Center Algorithm for Data Subset Selection (arXiv, 2023-12-17): Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data. The authors develop a novel factor-3 approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty-sampling objective functions (a hedged greedy k-center sketch follows this list).
- Relation-aware Ensemble Learning for Knowledge Graph Embedding (arXiv, 2023-10-13): The authors propose to learn an ensemble by leveraging existing methods in a relation-aware manner. Because exploring these semantics with a relation-aware ensemble leads to a much larger search space than general ensemble methods, they propose a divide-search-combine algorithm, RelEns-DSC, that searches the relation-wise ensemble weights independently.
- Performance Evaluation and Comparison of a New Regression Algorithm (arXiv, 2023-06-15): The authors compare the performance of a newly proposed regression algorithm against four conventional machine learning algorithms. The results can be replicated, as the source code is provided in a GitHub repository.
- A Gold Standard Dataset for the Reviewer Assignment Problem (arXiv, 2023-03-23): A "similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper. The dataset consists of 477 self-reported expertise scores provided by 58 researchers. For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
- Using Knowledge Graphs for Performance Prediction of Modular Optimization Algorithms (arXiv, 2023-01-24): The authors build a performance prediction model using a knowledge graph embedding-based methodology and show that a triple-classification approach can correctly predict whether a given algorithm instance will achieve a certain target precision.
- Detection and Evaluation of Clusters within Sequential Data (arXiv, 2022-10-04): Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees. On sequential data derived from human DNA, written text, animal movement data and financial markets, the Block Markov Chain model assumption is found to produce meaningful insights in exploratory data analyses.
- Early Time-Series Classification Algorithms: An Empirical Comparison (arXiv, 2022-03-03): Early Time-Series Classification (ETSC) is the task of predicting the class of an incoming time series by observing as few measurements as possible. Six existing ETSC algorithms are evaluated on publicly available data, as well as on two newly introduced datasets.
- Generative and reproducible benchmarks for comprehensive evaluation of machine learning classifiers (arXiv, 2021-07-14): DIverse and GENerative ML Benchmark (DIGEN) is a collection of synthetic datasets for benchmarking machine learning algorithms. The resource, with extensive documentation and analyses, is open source and available on GitHub.
- Estimating leverage scores via rank revealing methods and randomization (arXiv, 2021-05-23): The authors study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Their approach combines rank-revealing methods with compositions of dense and sparse randomized dimensionality-reduction transforms (a minimal exact-computation sketch follows this list).
- Benchmarking Simulation-Based Inference (arXiv, 2021-01-12): Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms that do not require numerical evaluation of likelihoods. The authors provide a benchmark with inference tasks and suitable performance metrics, covering an initial selection of algorithms. They find that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
- Flow-based Algorithms for Improving Clusters: A Unifying Framework, Software, and Performance (arXiv, 2020-04-20): Clustering points in a vector space or nodes in a graph is a ubiquitous primitive in statistical data analysis. The authors focus on principled algorithms for this cluster-improvement problem and develop efficient implementations in their LocalGraphClustering Python package.
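As referenced in the first related entry: the weighted factor-3 algorithm from the subset-selection paper is not spelled out in its summary, so the sketch below shows only the classic greedy farthest-point heuristic for plain k-center (a well-known 2-approximation). All names are ours, and the uncertainty-sampling term of the paper's objective is omitted.

import numpy as np

def greedy_k_center(X, k, seed=0):
    # Farthest-point greedy heuristic: repeatedly add the point that is
    # currently farthest from the selected centers (2-approximation for k-center).
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]             # arbitrary first center
    dist = np.linalg.norm(X - X[centers[0]], axis=1)  # distance to nearest chosen center
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))                    # farthest remaining point
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return centers

# X = np.random.default_rng(1).normal(size=(1000, 8))
# subset_indices = greedy_k_center(X, k=10)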
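Likewise for the leverage-scores entry: the randomized rank-revealing estimators themselves are not described in the summary above, but the quantity being estimated is easy to state. For a tall matrix A with full column rank, the exact leverage scores are the squared row norms of Q in a thin QR factorization A = QR. A minimal reference sketch follows; it is ours, and it assumes full column rank, unlike the paper's arbitrary-rank setting.

import numpy as np

def exact_leverage_scores(A):
    # Leverage score i is the i-th diagonal entry of the hat matrix
    # A (A^T A)^{-1} A^T, which equals ||Q[i, :]||^2 for a thin QR A = QR.
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q**2, axis=1)

# Scores lie in [0, 1] and sum to the number of columns (the rank here):
# A = np.random.default_rng(0).normal(size=(100, 5))
# s = exact_leverage_scores(A)
# assert np.isclose(s.sum(), 5.0)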