CAFE: Conflict-Aware Feature-wise Explanations
- URL: http://arxiv.org/abs/2310.20363v1
- Date: Tue, 31 Oct 2023 11:14:26 GMT
- Title: CAFE: Conflict-Aware Feature-wise Explanations
- Authors: Adam Dejl, Hamed Ayoobi, Matthew Williams, Francesca Toni
- Abstract summary: Feature attribution methods are widely used to explain neural models by determining the influence of individual input features on the models' outputs.
We propose a novel feature attribution method, CAFE (Conflict-Aware Feature-wise Explanations), that addresses three limitations of the existing methods.
- Score: 12.428277452418621
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Feature attribution methods are widely used to explain neural models by
determining the influence of individual input features on the models' outputs.
We propose a novel feature attribution method, CAFE (Conflict-Aware
Feature-wise Explanations), that addresses three limitations of the existing
methods: their disregard for the impact of conflicting features, their lack of
consideration for the influence of bias terms, and an overly high sensitivity
to local variations in the underpinning activation functions. Unlike other
methods, CAFE provides safeguards against overestimating the effects of neuron
inputs and separately traces positive and negative influences of input features
and biases, resulting in enhanced robustness and increased ability to surface
feature conflicts. We show experimentally that CAFE is better able to identify
conflicting features on synthetic tabular data and exhibits the best overall
fidelity on several real-world tabular datasets, while being highly
computationally efficient.
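To make the idea of separately tracing positive and negative influences (including the bias term) concrete, the following is a minimal sketch for a single linear unit. It is only an illustration of conflict-aware bookkeeping under simplified assumptions, not the CAFE propagation rules from the paper; the function name split_contributions and the toy numbers are hypothetical.

```python
import numpy as np

# Illustrative sketch only: keep positive and negative contributions
# (including the bias) separate for a single linear unit y = w @ x + b.
# This is NOT the CAFE algorithm; it only shows the kind of bookkeeping
# that makes conflicting features visible instead of cancelling out.

def split_contributions(weights, bias, x):
    """Return per-feature positive/negative contributions and the split bias."""
    contrib = weights * x                      # signed per-feature contribution
    pos = np.clip(contrib, 0.0, None)          # positive evidence per feature
    neg = np.clip(contrib, None, 0.0)          # negative (conflicting) evidence
    bias_pos, bias_neg = max(bias, 0.0), min(bias, 0.0)
    return pos, neg, bias_pos, bias_neg

# Example: two features push the output in opposite directions (a "conflict").
w = np.array([2.0, -3.0])
b = 0.5
x = np.array([1.0, 1.0])

pos, neg, b_pos, b_neg = split_contributions(w, b, x)
print("positive contributions:", pos)   # [2. 0.]
print("negative contributions:", neg)   # [ 0. -3.]
print("bias split:", b_pos, b_neg)      # 0.5 0.0
print("output:", pos.sum() + neg.sum() + b_pos + b_neg)  # -0.5, equals w @ x + b
```

A purely signed attribution would report the two features as nearly cancelling; keeping the positive and negative streams separate is what surfaces the conflict.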
Related papers
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
- Disentangle Estimation of Causal Effects from Cross-Silo Data [14.684584362172666]
We introduce an innovative disentangle architecture designed to facilitate the seamless cross-silo transmission of model parameters.
We introduce global constraints into the equation to effectively mitigate bias within the various missing domains.
Our method outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-01-04T09:05:37Z)
- Understanding Robust Overfitting from the Feature Generalization Perspective [61.770805867606796]
Adversarial training (AT) constructs robust neural networks by incorporating adversarial perturbations into natural data.
It is plagued by the issue of robust overfitting (RO), which severely damages the model's robustness.
In this paper, we investigate RO from a novel feature generalization perspective.
arXiv Detail & Related papers (2023-10-01T07:57:03Z)
- Decomposing Global Feature Effects Based on Feature Interactions [10.874932625841257]
Generalized additive decomposition of global effects (GADGET) is a new framework for finding interpretable regions in the feature space.
We provide a mathematical foundation of the framework and show that it is applicable to the most popular methods to visualize marginal feature effects.
We empirically evaluate the theoretical characteristics of the proposed methods based on various feature effect methods in different experimental settings.
arXiv Detail & Related papers (2023-06-01T10:51:12Z)
- Learning Infomax and Domain-Independent Representations for Causal Effect Inference with Real-World Data [9.601837205635686]
We learn Infomax and Domain-Independent Representations to address these challenges.
We show that our method achieves state-of-the-art performance on causal effect inference.
arXiv Detail & Related papers (2022-02-22T13:35:15Z)
- Bringing a Ruler Into the Black Box: Uncovering Feature Impact from Individual Conditional Expectation Plots [0.0]
We introduce a model-agnostic, performance-agnostic feature impact metric derived from ICE plots.
We also introduce an in-distribution variant of ICE feature impact to mitigate the influence of out-of-distribution points.
We demonstrate ICE feature impact's utility in several tasks using real-world data.
arXiv Detail & Related papers (2021-09-06T20:26:29Z)
- Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Interventional Emotion Recognition Network (IERN) to alleviate the negative effects brought by dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z)
- Demarcating Endogenous and Exogenous Opinion Dynamics: An Experimental Design Approach [27.975266406080152]
In this paper, we design a suite of unsupervised classification methods based on experimental design approaches.
We aim to select the subsets of events which minimize different measures of mean estimation error.
Our experiments range from validating prediction performance on unsanitized and sanitized events to checking the effect of selecting optimal subsets of various sizes.
arXiv Detail & Related papers (2021-02-11T11:38:15Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates (see the sketch after this entry).
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
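As a minimal, hypothetical illustration of the role of Hessian regularization (not the authors' code), the sketch below estimates the influence of one training point on a test prediction for a tiny logistic-regression model; the damp parameter stands in for the regularization the entry above highlights.

```python
import numpy as np

# Minimal sketch: influence of a training point on a test prediction,
# I(z_i, z_test) = -grad(z_test)^T (H + damp*I)^{-1} grad(z_i),
# for a small logistic-regression model. All names here are illustrative.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(theta, x, y):
    # Gradient of the logistic loss at a single example (label y in {0, 1}).
    return (sigmoid(x @ theta) - y) * x

def hessian(theta, X):
    # Hessian of the mean logistic loss over the training set X.
    p = sigmoid(X @ theta)
    return (X.T * (p * (1 - p))) @ X / len(X)

def influence(theta, X_train, x_i, y_i, x_test, y_test, damp=0.01):
    # Damped Hessian keeps the linear solve well-conditioned.
    H = hessian(theta, X_train) + damp * np.eye(len(theta))
    return -grad_loss(theta, x_test, y_test) @ np.linalg.solve(H, grad_loss(theta, x_i, y_i))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(float)

theta = np.zeros(3)
for _ in range(500):  # crude gradient-descent fit so theta is near an optimum
    theta -= 0.5 * X.T @ (sigmoid(X @ theta) - y) / len(X)

# Larger damping yields smoother, more regularized influence estimates.
for damp in (1e-4, 1e-2, 1.0):
    print(damp, influence(theta, X, X[0], y[0], X[1], y[1], damp=damp))
```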