Promises and Pitfalls of Black-Box Concept Learning Models
- URL: http://arxiv.org/abs/2106.13314v1
- Date: Thu, 24 Jun 2021 21:00:28 GMT
- Title: Promises and Pitfalls of Black-Box Concept Learning Models
- Authors: Anita Mahinpei, Justin Clark, Isaac Lage, Finale Doshi-Velez, Weiwei Pan
- Abstract summary: We show that machine learning models that incorporate concept learning encode information beyond the pre-defined concepts.
Natural mitigation strategies do not fully work, rendering the interpretation of the downstream prediction misleading.
- Score: 26.787383014558802
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models that incorporate concept learning as an intermediate
step in their decision making process can match the performance of black-box
predictive models while retaining the ability to explain outcomes in
human-understandable terms. However, we demonstrate that the concept representations
learned by these models encode information beyond the pre-defined concepts, and
that natural mitigation strategies do not fully work, rendering the
interpretation of the downstream prediction misleading. We describe the
mechanism underlying the information leakage and suggest recourse for
mitigating its effects.
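To make the leakage mechanism concrete, the following is a minimal PyTorch sketch of a concept bottleneck model, the architecture this line of work studies: the input is first mapped to activations for pre-defined concepts, and only those activations feed the downstream predictor. The layer sizes, names, and sigmoid bottleneck are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ConceptBottleneckModel(nn.Module):
        def __init__(self, n_features, n_concepts, n_classes):
            super().__init__()
            # g: raw features -> logits for the pre-defined concepts
            self.concept_net = nn.Sequential(
                nn.Linear(n_features, 64),
                nn.ReLU(),
                nn.Linear(64, n_concepts),
            )
            # f: concept activations -> downstream label
            self.label_net = nn.Linear(n_concepts, n_classes)

        def forward(self, x):
            # Soft (continuous) concept activations are where leakage can
            # occur: their exact values can encode input information that
            # has nothing to do with the concepts themselves.
            soft_concepts = torch.sigmoid(self.concept_net(x))
            return self.label_net(soft_concepts), soft_concepts

    model = ConceptBottleneckModel(n_features=10, n_concepts=4, n_classes=2)
    label_logits, concepts = model(torch.randn(8, 10))

A natural mitigation is to binarize the activations before the label predictor, e.g. (concepts > 0.5).float(); the abstract's point is that such strategies do not fully eliminate the leakage.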
Related papers
- MulCPred: Learning Multi-modal Concepts for Explainable Pedestrian Action Prediction [57.483718822429346]
We propose MulCPred, which explains its predictions based on multi-modal concepts represented by training samples.
MulCPred is evaluated on multiple datasets and tasks.
arXiv Detail & Related papers (2024-09-14T14:15:28Z)
- Concept Bottleneck Models Without Predefined Concepts [26.156636891713745]
We introduce an input-dependent concept selection mechanism that ensures only a small subset of concepts is used across all classes.
We show that our approach improves downstream performance and narrows the performance gap to black-box models.
arXiv Detail & Related papers (2024-07-04T13:34:50Z)
- Self-supervised Interpretable Concept-based Models for Text Classification [9.340843984411137]
This paper proposes self-supervised Interpretable Concept Embedding Models (ICEMs).
We leverage the generalization abilities of Large Language Models to predict the concept labels in a self-supervised way.
Trained this way, ICEMs achieve performance similar to fully supervised concept-based models and end-to-end black-box ones.
arXiv Detail & Related papers (2024-06-20T14:04:53Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- SurroCBM: Concept Bottleneck Surrogate Models for Generative Post-hoc Explanation [11.820167569334444]
This paper introduces the Concept Bottleneck Surrogate Models (SurroCBM) to explain black-box models.
SurroCBM identifies shared and unique concepts across various black-box models and employs an explainable surrogate model for post-hoc explanations.
An effective training strategy using self-generated data is proposed to continuously enhance explanation quality.
arXiv Detail & Related papers (2023-10-11T17:46:59Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- ConceptDistil: Model-Agnostic Distillation of Concept Explanations [4.462334751640166]
Concept-based explanations aim to fill the model interpretability gap for non-technical humans-in-the-loop.
We propose ConceptDistil, a method to bring concept explanations to any black-box classifier using knowledge distillation.
We validate ConceptDistil in a real-world use case, showing that it is able to optimize both tasks.
arXiv Detail & Related papers (2022-05-07T08:58:54Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines named X-MOP that allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
- Debiasing Concept-based Explanations with Causal Analysis [4.911435444514558]
We study the problem of the concepts being correlated with confounding information in the features.
We propose a new causal prior graph for modeling the impacts of unobserved variables.
We show that our debiasing method works when the concepts are not complete.
arXiv Detail & Related papers (2020-07-22T15:42:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.