Benchmarking and Enhancing Disentanglement in Concept-Residual Models
- URL: http://arxiv.org/abs/2312.00192v1
- Date: Thu, 30 Nov 2023 21:07:26 GMT
- Title: Benchmarking and Enhancing Disentanglement in Concept-Residual Models
- Authors: Renos Zabounidis, Ini Oguntola, Konghao Zhao, Joseph Campbell, Simon Stepputtis, Katia Sycara
- Abstract summary: Concept bottleneck models (CBMs) are interpretable models that first predict a set of semantically meaningful features.
CBMs' performance depends on the engineered features and can severely suffer from incomplete sets of concepts.
This work proposes three novel approaches to mitigate information leakage by disentangling concepts and residuals.
- Score: 4.177318966048984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept bottleneck models (CBMs) are interpretable models that first predict
a set of semantically meaningful features, i.e., concepts, from observations
that are subsequently used to condition a downstream task. However, the model's
performance strongly depends on the engineered features and can severely suffer
from incomplete sets of concepts. Prior works have proposed a side channel -- a
residual -- that allows for unconstrained information flow to the downstream
task, thus improving model performance but simultaneously introducing
information leakage, which is undesirable for interpretability. This work
proposes three novel approaches to mitigate information leakage by
disentangling concepts and residuals, investigating the critical balance
between model performance and interpretability. Through extensive empirical
analysis on the CUB, OAI, and CIFAR 100 datasets, we assess the performance of
each disentanglement method and provide insights into when they work best.
Further, we show how each method impacts the ability to intervene over the
concepts and their subsequent impact on task performance.
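The architecture the abstract describes (a concept predictor plus an unconstrained residual side channel, both feeding a downstream task head, with the option to intervene on concepts) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ConceptResidualModel`, the layer sizes, the untrained random linear layers, and the sigmoid concept activation are all assumptions made for illustration. It does not implement any of the three disentanglement approaches, which the abstract does not detail.

```python
import math
import random


def linear(weights, bias, x):
    """Return W @ x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_out, n_in):
    """Hypothetical helper: random weights and zero bias (untrained)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class ConceptResidualModel:
    """Illustrative concept bottleneck with a residual side channel."""

    def __init__(self, n_inputs, n_concepts, n_residual, n_classes, seed=0):
        rng = random.Random(seed)
        self.concept_layer = init_layer(rng, n_concepts, n_inputs)
        self.residual_layer = init_layer(rng, n_residual, n_inputs)
        # The task head sees the concepts and the residual concatenated.
        self.task_layer = init_layer(rng, n_classes, n_concepts + n_residual)

    def forward(self, x, interventions=None):
        # Interpretable path: concept probabilities predicted from the input.
        concepts = [sigmoid(z) for z in linear(*self.concept_layer, x)]
        # Intervention: overwrite chosen concepts with externally given values.
        for index, value in (interventions or {}).items():
            concepts[index] = value
        # Side channel: unconstrained features, the source of possible leakage.
        residual = linear(*self.residual_layer, x)
        logits = linear(*self.task_layer, concepts + residual)
        return concepts, residual, logits


model = ConceptResidualModel(n_inputs=4, n_concepts=3, n_residual=2, n_classes=5)
x = [0.5, -1.2, 0.3, 0.8]
concepts, residual, logits = model.forward(x)
# Set concept 0 to 1.0 and recompute the task output.
fixed_concepts, fixed_residual, fixed_logits = model.forward(x, interventions={0: 1.0})
```

In this sketch an intervention changes only the concept vector; the residual is recomputed from the input unchanged. If the residual already encodes concept information, correcting a concept may have less effect on the task output, which is the leakage concern the abstract raises.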
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
(2024-08-24T18:28:19Z)
- Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort [31.992947353231564]
Concept Bottleneck Models (CBMs) can provide a principled way of disclosing and guiding model behaviors through human-understandable concepts.
We propose a novel framework designed to exploit pre-trained models while being immune to these biases, thereby reducing vulnerability to spurious correlations.
We evaluate the proposed method on multiple datasets, and the results demonstrate its effectiveness in reducing model reliance on spurious correlations while preserving its interpretability.
(2024-07-12T03:07:28Z)
- Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales [3.242050660144211]
Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models.
We present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models.
(2024-04-03T22:39:33Z)
- Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
(2023-11-03T03:18:40Z)
- Consistent Explanations in the Face of Model Indeterminacy via Ensembling [12.661530681518899]
This work addresses the challenge of providing consistent explanations for predictive models in the presence of model indeterminacy.
We introduce ensemble methods to enhance the consistency of the explanations provided in these scenarios.
Our findings highlight the importance of considering model indeterminacy when interpreting explanations.
(2023-06-09T18:45:43Z)
- Sparse Relational Reasoning with Object-Centric Representations [78.83747601814669]
We investigate the composability of soft-rules learned by relational neural architectures when operating over object-centric representations.
We find that increasing sparsity, especially on features, improves the performance of some models and leads to simpler relations.
(2022-07-15T14:57:33Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
We define explainability through the interpretability of the explanations and the faithfulness of the explainability model in the field of process outcome prediction.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
(2022-03-30T05:59:50Z)
- Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
(2021-11-29T18:59:09Z)
- Inducing Semantic Grouping of Latent Concepts for Explanations: An Ante-Hoc Approach [18.170504027784183]
We show that exploiting latent concepts and properly modifying different parts of the model can yield better explanations as well as superior predictive performance.
We also propose using two different self-supervision techniques to extract meaningful concepts related to the type of self-supervision considered.
(2021-08-25T07:09:57Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
(2021-01-18T08:37:13Z)