Synthetic Benchmarks for Scientific Research in Explainable Machine
Learning
- URL: http://arxiv.org/abs/2106.12543v1
- Date: Wed, 23 Jun 2021 17:10:21 GMT
- Title: Synthetic Benchmarks for Scientific Research in Explainable Machine
Learning
- Authors: Yang Liu, Sujay Khandagale, Colin White, Willie Neiswanger
- Abstract summary: We release XAI-Bench: a suite of synthetic datasets and a library for benchmarking feature attribution algorithms.
Unlike real-world datasets, synthetic datasets allow the efficient computation of conditional expected values.
We demonstrate the power of our library by benchmarking popular explainability techniques across several evaluation metrics and identifying failure modes for popular explainers.
- Score: 14.172740234933215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As machine learning models grow more complex and their applications become
more high-stakes, tools for explaining model predictions have become
increasingly important. Despite the widespread use of explainability
techniques, evaluating and comparing different feature attribution methods
remains challenging: evaluations ideally require human studies, and empirical
evaluation metrics are often computationally prohibitive on real-world
datasets. In this work, we address this issue by releasing XAI-Bench: a suite
of synthetic datasets along with a library for benchmarking feature attribution
algorithms. Unlike real-world datasets, synthetic datasets allow the efficient
computation of conditional expected values that are needed to evaluate
ground-truth Shapley values and other metrics. The synthetic datasets we
release offer a wide variety of parameters that can be configured to simulate
real-world data. We demonstrate the power of our library by benchmarking
popular explainability techniques across several evaluation metrics and
identifying failure modes for popular explainers. The efficiency of our library
will help bring new explainability methods from development to deployment.
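The point about conditional expected values can be made concrete with a small sketch. This is not XAI-Bench's API. It assumes features drawn from a multivariate Gaussian with known mean and covariance and a linear model, both chosen here purely for illustration, and the function names (`conditional_mean`, `value`, `shapley_values`) are hypothetical. Under those assumptions E[f(X) | X_S = x_S] has a closed form, so exact Shapley values can be obtained by enumerating feature subsets, with no sampling.

```python
from itertools import combinations
from math import factorial

import numpy as np


def conditional_mean(mu, cov, known_idx, x):
    """Return E[X | X_S = x_S] for X ~ N(mu, cov), where S = known_idx."""
    d = len(mu)
    known = list(known_idx)
    unknown = [j for j in range(d) if j not in known]
    full = np.array(mu, dtype=float)  # copy; starts at the marginal mean
    if not known:
        return full
    full[known] = x[known]
    if unknown:
        cov_uk = cov[np.ix_(unknown, known)]
        cov_kk = cov[np.ix_(known, known)]
        # Standard Gaussian conditioning: mu_U + S_UK S_KK^{-1} (x_K - mu_K)
        shift = np.linalg.solve(cov_kk, x[known] - mu[known])
        full[unknown] = mu[unknown] + cov_uk @ shift
    return full


def value(w, b, mu, cov, subset, x):
    """v(S) = E[f(X) | X_S = x_S] for the linear model f(x) = w.x + b.

    Because f is linear, the expectation of f equals f of the expectation.
    """
    return float(w @ conditional_mean(mu, cov, subset, x) + b)


def shapley_values(w, b, mu, cov, x):
    """Exact Shapley values by subset enumeration (exponential in d)."""
    d = len(w)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                with_i = value(w, b, mu, cov, subset + (i,), x)
                without_i = value(w, b, mu, cov, subset, x)
                phi[i] += weight * (with_i - without_i)
    return phi


# Illustrative (made-up) parameters for a 3-feature synthetic dataset.
mu = np.array([0.0, 1.0, -1.0])
cov = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 1.0]])
w = np.array([2.0, -1.0, 0.5])
b = 0.3
x = np.array([1.0, 0.0, 2.0])

phi = shapley_values(w, b, mu, cov, x)
print(phi)
```

The attributions sum to f(x) minus the expected model output, and with a diagonal covariance each one reduces to w_i (x_i - mu_i). On real-world data the conditional distribution is unknown, so the same quantities would have to be estimated, which is the cost the abstract refers to.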
Related papers
- EBES: Easy Benchmarking for Event Sequences [17.277513178760348]
Event sequences are common data structures in various real-world domains such as healthcare, finance, and user interaction logs.
Despite advances in temporal data modeling techniques, there are no standardized benchmarks for evaluating their performance on event sequences.
We introduce EBES, a comprehensive benchmarking tool with standardized evaluation scenarios and protocols.
arXiv Detail & Related papers (2024-10-04T13:03:43Z)
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
Our work delves into specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data [3.360001542033098]
SynthEval is a novel open-source evaluation framework for synthetic data.
It treats categorical and numerical attributes with equal care, without assuming any special kind of preprocessing steps.
Our tool leverages statistical and machine learning techniques to comprehensively evaluate synthetic data fidelity and privacy-preserving integrity.
arXiv Detail & Related papers (2024-04-24T11:49:09Z)
- Massively Annotated Datasets for Assessment of Synthetic and Real Data in Face Recognition [0.2775636978045794]
We study the drift between the performance of models trained on real and synthetic datasets.
We conduct studies on the differences between real and synthetic datasets on the attribute set.
Interestingly, we have verified that while real samples suffice to explain the synthetic distribution, the reverse is far from true.
arXiv Detail & Related papers (2024-04-23T17:10:49Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z)
- Is Synthetic Dataset Reliable for Benchmarking Generalizable Person Re-Identification? [1.1041211464412568]
We show that a recent large-scale synthetic dataset ClonedPerson can be reliably used to benchmark GPReID, statistically the same as real-world datasets.
This study justifies using synthetic datasets for both the source training set and the target testing set, with no privacy concerns from real-world surveillance data.
arXiv Detail & Related papers (2022-09-12T06:54:54Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Foundations of Bayesian Learning from Synthetic Data [1.6249267147413522]
We use a Bayesian paradigm to characterise the updating of model parameters when learning on synthetic data.
Recent results from general Bayesian updating support a novel and robust approach to synthetic-learning founded on decision theory.
arXiv Detail & Related papers (2020-11-16T21:49:17Z)