Improving LIME Robustness with Smarter Locality Sampling
- URL: http://arxiv.org/abs/2006.12302v3
- Date: Sun, 21 Mar 2021 11:13:50 GMT
- Title: Improving LIME Robustness with Smarter Locality Sampling
- Authors: Sean Saito, Eugene Chua, Nicholas Capel, Rocco Hu
- Abstract summary: We propose to make LIME more robust by training a generative adversarial network to sample more realistic synthetic data.
Our experiments demonstrate an increase in accuracy in detecting biased, adversarial behavior across three real-world datasets.
This is achieved while maintaining comparable explanation quality, with up to 99.94% top-1 accuracy in some cases.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explainability algorithms such as LIME have enabled machine learning systems
to adopt transparency and fairness, which are important qualities in commercial
use cases. However, recent work has shown that LIME's naive sampling strategy
can be exploited by an adversary to conceal biased, harmful behavior. We
propose to make LIME more robust by training a generative adversarial network
to sample more realistic synthetic data which the explainer uses to generate
explanations. Our experiments demonstrate that our proposed method achieves an
increase in accuracy in detecting biased, adversarial behavior across three
real-world datasets compared to vanilla LIME. This is achieved while
maintaining comparable explanation quality, with up to 99.94% top-1 accuracy in
some cases.
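A minimal sketch of the idea, not the authors' implementation: it assumes a pre-trained generator (`generator`) and a black-box model under audit (`black_box_predict_proba`), both hypothetical names, and shows how neighbours drawn from the generator could replace vanilla LIME's random perturbations before fitting the usual locally weighted linear surrogate.

```python
# Sketch only: generator-based neighbourhood sampling for a LIME-style
# explanation. `generator` and `black_box_predict_proba` are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge


def explain_with_generative_sampling(
    x,                        # instance to explain, shape (n_features,)
    black_box_predict_proba,  # f(X) -> probabilities, shape (n, n_classes)
    generator,                # g(z) -> synthetic rows, shape (n, n_features)
    n_samples=5000,
    latent_dim=32,
    kernel_width=0.75,
    target_class=1,
):
    # 1. Draw realistic synthetic neighbours from the generator instead of
    #    perturbing x with independent random noise (vanilla LIME).
    z = np.random.normal(size=(n_samples, latent_dim))
    neighbours = generator(z)

    # 2. Weight each neighbour by its proximity to x with an exponential
    #    proximity kernel, in the spirit of LIME.
    dist = np.linalg.norm(neighbours - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))

    # 3. Query the (possibly adversarial) black box on the synthetic data.
    y = black_box_predict_proba(neighbours)[:, target_class]

    # 4. Fit a weighted linear surrogate; its coefficients serve as the
    #    feature-attribution explanation.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(neighbours, y, sample_weight=weights)
    return surrogate.coef_
```

The only change relative to vanilla LIME is step 1: because the neighbours come from a model of the data distribution rather than from naive perturbations, an adversarial classifier has a harder time recognising them as off-manifold probes and switching to benign behaviour.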
Related papers
- LUNAR: LLM Unlearning via Neural Activation Redirection [20.60687563657169]
Large Language Models (LLMs) benefit from training on ever larger amounts of textual data, but they increasingly incur the risk of leaking private information.
We propose LUNAR, a novel unlearning methodology grounded in the Linear Representation Hypothesis.
We show that LUNAR achieves state-of-the-art unlearning performance while significantly enhancing the controllability of the unlearned model during inference.
arXiv Detail & Related papers (2025-02-11T03:23:22Z)
- Curriculum-style Data Augmentation for LLM-based Metaphor Detection [7.4594050203808395]
We propose a method for metaphor detection by fine-tuning open-source LLMs.
Our method achieves state-of-the-art performance across all baselines.
arXiv Detail & Related papers (2024-12-04T02:05:21Z)
- Provable Optimization for Adversarial Fair Self-supervised Contrastive Learning [49.417414031031264]
This paper studies learning fair encoders in a self-supervised learning setting.
All data are unlabeled and only a small portion of them are annotated with sensitive attributes.
arXiv Detail & Related papers (2024-06-09T08:11:12Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
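A minimal sketch of the mechanism described in this summary, under the assumption that per-sample uncertainty is measured as the normalised entropy of the model's own predictions (the summary does not specify the uncertainty estimator); all names are illustrative.

```python
# Sketch only: label smoothing whose strength grows with per-sample
# predictive uncertainty, here the normalised entropy of the logits.
import torch
import torch.nn.functional as F


def uncertainty_aware_loss(logits, targets, max_smoothing=0.2):
    """logits: (batch, n_classes); targets: (batch,) integer labels."""
    n_classes = logits.size(-1)
    probs = F.softmax(logits, dim=-1)

    # Per-sample uncertainty: entropy normalised to [0, 1].
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    uncertainty = entropy / torch.log(torch.tensor(float(n_classes)))

    # Adaptive smoothing: more uncertain samples receive softer targets.
    eps = max_smoothing * uncertainty  # shape (batch,)
    one_hot = F.one_hot(targets, n_classes).float()
    soft_targets = (1.0 - eps).unsqueeze(-1) * one_hot + (eps / n_classes).unsqueeze(-1)

    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```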
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Boosting Disfluency Detection with Large Language Model as Disfluency Generator [8.836888435915077]
We propose a lightweight data augmentation approach for disfluency detection.
We leverage a large language model (LLM) to generate disfluent sentences as augmentation data.
We apply an uncertainty-aware data filtering approach to improve the quality of the generated sentences.
arXiv Detail & Related papers (2024-03-13T04:14:33Z)
- Fairness Without Harm: An Influence-Guided Active Sampling Approach [32.173195437797766]
We aim to train models that mitigate group fairness disparity without causing harm to model accuracy.
The current data acquisition methods, such as fair active learning approaches, typically require annotating sensitive attributes.
We propose a tractable active data sampling algorithm that does not rely on training group annotations.
arXiv Detail & Related papers (2024-02-20T07:57:38Z)
- Fair Supervised Learning with A Simple Random Sampler of Sensitive Attributes [13.988497790151651]
This work proposes fairness penalties learned by neural networks with a simple random sampler of sensitive attributes for non-discriminatory supervised learning.
We build a computationally efficient group-level in-processing fairness-aware training framework.
Empirical evidence shows that our framework enjoys better utility and fairness measures on popular benchmark data sets than competing methods.
arXiv Detail & Related papers (2023-11-10T04:38:13Z)
- MMD-B-Fair: Learning Fair Representations with Statistical Testing [4.669892068997491]
We introduce a method, MMD-B-Fair, to learn fair representations of data via kernel two-sample testing.
We find neural features of our data where a maximum mean discrepancy (MMD) test cannot distinguish between representations of different sensitive groups, while preserving information about the target attributes.
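As a rough illustration of the quantity involved, the sketch below computes a plain RBF-kernel MMD^2 between the representations of two sensitive groups. This is an assumption-laden stand-in, not the authors' code; the paper optimises the behaviour of the MMD test itself rather than simply penalising the raw statistic.

```python
# Sketch only: RBF-kernel MMD^2 between two groups' learned representations.
import torch


def rbf_kernel(a, b, bandwidth=1.0):
    # a: (n, d), b: (m, d) -> (n, m) Gaussian kernel matrix
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2.0 * bandwidth ** 2))


def mmd2(features_group0, features_group1, bandwidth=1.0):
    """Biased MMD^2 estimate between two sets of feature vectors."""
    k00 = rbf_kernel(features_group0, features_group0, bandwidth).mean()
    k11 = rbf_kernel(features_group1, features_group1, bandwidth).mean()
    k01 = rbf_kernel(features_group0, features_group1, bandwidth).mean()
    return k00 + k11 - 2.0 * k01


# Hypothetical usage inside a training loop, with encoder features z and
# sensitive attribute s:
#   loss = task_loss + fairness_weight * mmd2(z[s == 0], z[s == 1])
```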
arXiv Detail & Related papers (2022-11-15T05:25:38Z)
- Improving the Adversarial Robustness of NLP Models by Information Bottleneck [112.44039792098579]
Non-robust features can be easily manipulated by adversaries to fool NLP models.
In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones by using the information bottleneck theory.
We show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy.
arXiv Detail & Related papers (2022-06-11T12:12:20Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) strategies are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Learn what you can't learn: Regularized Ensembles for Transductive Out-of-distribution Detection [76.39067237772286]
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios.
This paper studies how such "hard" OOD scenarios can benefit from adjusting the detection method after observing a batch of the test data.
We propose a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch.
arXiv Detail & Related papers (2020-12-10T16:55:13Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)