Concept Bottleneck Models
- URL: http://arxiv.org/abs/2007.04612v3
- Date: Tue, 29 Dec 2020 00:32:21 GMT
- Title: Concept Bottleneck Models
- Authors: Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma
Pierson, Been Kim, Percy Liang
- Abstract summary: State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs".
We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label.
On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models.
- Score: 79.91795150047804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We seek to learn models that we can interact with using high-level concepts:
if the model did not think there was a bone spur in the x-ray, would it still
predict severe arthritis? State-of-the-art models today do not typically
support the manipulation of concepts like "the existence of bone spurs", as
they are trained end-to-end to go directly from raw input (e.g., pixels) to
output (e.g., arthritis severity). We revisit the classic idea of first
predicting concepts that are provided at training time, and then using these
concepts to predict the label. By construction, we can intervene on these
concept bottleneck models by editing their predicted concept values and
propagating these changes to the final prediction. On x-ray grading and bird
identification, concept bottleneck models achieve competitive accuracy with
standard end-to-end models, while enabling interpretation in terms of
high-level clinical concepts ("bone spurs") or bird attributes ("wing color").
These models also allow for richer human-model interaction: accuracy improves
significantly if we can correct model mistakes on concepts at test time.
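To make the pipeline concrete, below is a minimal PyTorch sketch of a concept bottleneck model with a joint training loss and a test-time concept intervention. It is not the authors' released implementation: the layer sizes, the `concept_override` interface, and the `joint_loss` helper are illustrative assumptions, and the dimensions only loosely mirror the bird-identification setup.

```python
# Minimal sketch of a concept bottleneck model (illustrative, not the paper's
# released code): g maps inputs to concepts, f maps concepts to the label, and
# predicted concepts can be overwritten with human-supplied values at test time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptBottleneckModel(nn.Module):
    def __init__(self, in_dim=512, n_concepts=112, n_classes=200):
        super().__init__()
        self.concept_net = nn.Linear(in_dim, n_concepts)   # g: features -> concept logits
        self.label_net = nn.Sequential(                     # f: concepts -> label logits
            nn.Linear(n_concepts, 128), nn.ReLU(), nn.Linear(128, n_classes)
        )

    def forward(self, x, concept_override=None):
        # x stands in for backbone features (raw pixels would pass through a CNN first).
        c_logits = self.concept_net(x)
        c_hat = torch.sigmoid(c_logits)                     # predicted concept values in [0, 1]
        if concept_override is not None:
            # Intervention: replace the concepts a human corrects (mask == 1)
            # with their true values and keep the remaining predictions.
            mask, c_true = concept_override
            c_hat = mask * c_true + (1 - mask) * c_hat
        y_logits = self.label_net(c_hat)
        return c_logits, y_logits

def joint_loss(c_logits, y_logits, c_true, y_true, lam=1.0):
    # Joint-bottleneck style objective: label loss plus a weighted concept loss.
    return F.cross_entropy(y_logits, y_true) + lam * F.binary_cross_entropy_with_logits(c_logits, c_true)

# Toy usage: predict normally, then re-predict after correcting one concept.
model = ConceptBottleneckModel()
x = torch.randn(4, 512)
_, y_plain = model(x)
mask = torch.zeros(4, 112); mask[:, 0] = 1.0                # correct concept 0 only
c_true = torch.zeros(4, 112)                                # human says concept 0 is absent
_, y_intervened = model(x, concept_override=(mask, c_true))
```

Editing entries of the predicted concept vector before the label head is what intervention means here: a human corrects a concept (e.g., "no bone spur") and the change propagates to the final prediction.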
Related papers
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
- Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models [0.0]
We show that vision-language models can be used to alleviate the dependence on a large number of concept-annotated samples.
In particular, we propose an embedding learning strategy to adapt CLIP to the downstream task of skin lesion classification using concept-based descriptions as textual embeddings.
arXiv Detail & Related papers (2023-11-24T08:31:34Z)
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
- A Lightweight Generative Model for Interpretable Subject-level Prediction [0.07989135005592125]
We propose a technique for single-subject prediction that is inherently interpretable.
Experiments demonstrate that the resulting model can be efficiently inverted to make accurate subject-level predictions.
arXiv Detail & Related papers (2023-06-19T18:20:29Z)
- Prototype Learning for Explainable Brain Age Prediction [1.104960878651584]
We present ExPeRT, an explainable prototype-based model specifically designed for regression tasks.
Our proposed model makes a sample prediction from the distances to a set of learned prototypes in latent space, using a weighted mean of prototype labels.
Our approach achieved state-of-the-art prediction performance while providing insight into the model's reasoning process.
arXiv Detail & Related papers (2023-06-16T14:13:21Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models fine-tuned on few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate this prediction bias, our analysis shows that models gain performance improvements by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Thought Flow Nets: From Single Predictions to Trains of Model Thought [39.619001911390804]
When humans solve complex problems, they rarely come up with a decision right away.
Instead, they start with an intuitive decision, reflect upon it, spot mistakes, resolve contradictions, and jump between different hypotheses.
arXiv Detail & Related papers (2021-07-26T13:56:37Z)
- Interpretable Companions for Black-Box Models [13.39487972552112]
We present an interpretable companion model for any pre-trained black-box classifiers.
For any input, a user can decide to either receive a prediction from the black-box model, with high accuracy but no explanations, or employ a companion rule to obtain an interpretable prediction with slightly lower accuracy.
The companion model is trained from data and the predictions of the black-box model, with an objective that combines the area under the transparency-accuracy curve and model complexity.
arXiv Detail & Related papers (2020-02-10T01:39:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.