Related papers: Towards Better Generalization and Interpretability in Unsupervised Concept-Based Models

URL: http://arxiv.org/abs/2506.02092v1
Date: Mon, 02 Jun 2025 16:26:41 GMT
Title: Towards Better Generalization and Interpretability in Unsupervised Concept-Based Models
Authors: Francesco De Santis, Philippe Bich, Gabriele Ciravegna, Pietro Barbiero, Danilo Giordano, Tania Cerquitelli
Abstract summary: This paper introduces a novel unsupervised concept-based model for image classification, named Learnable Concept-Based Model (LCBM). We demonstrate that LCBM surpasses existing unsupervised concept-based models in generalization capability and nearly matches the performance of black-box models. Despite the use of concept embeddings, we maintain model interpretability by means of a local linear combination of concepts.
Score: 9.340843984411137
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: To increase the trustworthiness of deep neural networks, it is critical to improve the understanding of how they make decisions. This paper introduces a novel unsupervised concept-based model for image classification, named Learnable Concept-Based Model (LCBM), which models concepts as random variables within a Bernoulli latent space. Unlike traditional methods that either require extensive human supervision or suffer from limited scalability, our approach employs a reduced number of concepts without sacrificing performance. We demonstrate that LCBM surpasses existing unsupervised concept-based models in generalization capability and nearly matches the performance of black-box models. The proposed concept representation enhances information retention and aligns more closely with human understanding. A user study demonstrates that the discovered concepts are also more intuitive for humans to interpret. Finally, despite the use of concept embeddings, we maintain model interpretability by means of a local linear combination of concepts.

Related papers

Interpretable Reward Modeling with Active Concept Bottlenecks [54.00085739303773]
We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning. Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts. We formalize an active learning strategy that dynamically acquires the most informative concept labels.
arXiv Detail & Related papers (2025-07-07T06:26:04Z)
How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? [91.49559116493414]
We propose a novel Concept-Incremental text-to-image Diffusion Model (CIDM). It can resolve catastrophic forgetting and concept neglect to learn new customization tasks in a concept-incremental manner. Experiments validate that our CIDM surpasses existing custom diffusion models.
arXiv Detail & Related papers (2024-10-23T06:47:29Z)
Concept Bottleneck Models Without Predefined Concepts [26.156636891713745]
We introduce an input-dependent concept selection mechanism that ensures only a small subset of concepts is used across all classes. We show that our approach improves downstream performance and narrows the performance gap to black-box models.
arXiv Detail & Related papers (2024-07-04T13:34:50Z)
Restyling Unsupervised Concept Based Interpretable Networks with Generative Models [14.604305230535026]
We propose a novel method that relies on mapping the concept features to the latent space of a pretrained generative model. We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts.
arXiv Detail & Related papers (2024-07-01T14:39:41Z)
Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Existing approaches often require numerous human interventions per image to achieve strong performance. We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z)
A survey on Concept-based Approaches For Model Improvement [2.1516043775965565]
Concepts are known to be the thinking ground of humans. We provide a systematic review and taxonomy of various concept representations and their discovery algorithms in Deep Neural Networks (DNNs). We also provide details on concept-based model improvement literature, marking the first comprehensive survey of these methods.
arXiv Detail & Related papers (2024-03-21T17:09:20Z)
Separable Multi-Concept Erasure from Diffusion Models [52.51972530398691]
We propose a Separable Multi-concept Eraser (SepME) to eliminate unsafe concepts from large-scale diffusion models. The latter separates optimizable model weights, making each weight increment correspond to a specific concept erasure. Extensive experiments indicate the efficacy of our approach in eliminating concepts, preserving model performance, and offering flexibility in the erasure or recovery of various concepts.
arXiv Detail & Related papers (2024-02-03T11:10:57Z)
Auxiliary Losses for Learning Generalizable Concept-based Models [5.4066453042367435]
Concept Bottleneck Models (CBMs) have gained popularity since their introduction. CBMs essentially limit the latent space of a model to human-understandable high-level concepts. We propose cooperative-Concept Bottleneck Model (coop-CBM) to overcome the performance trade-off.
arXiv Detail & Related papers (2023-11-18T15:50:07Z)
Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation. We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
Sparse Linear Concept Discovery Models [11.138948381367133]
Concept Bottleneck Models (CBMs) constitute a popular approach where hidden layers are tied to human-understandable concepts. We propose a simple yet highly intuitive interpretable framework based on Contrastive Language Image models and a single sparse linear layer. We experimentally show that our framework not only outperforms recent CBM approaches accuracy-wise, but also yields high per-example concept sparsity.
arXiv Detail & Related papers (2023-08-21T15:16:19Z)
Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts. We proposed Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions. We demonstrated CG outperforms CAV in both toy examples and real world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
Concept Bottleneck Model with Additional Unsupervised Concepts [0.5939410304994348]
We propose a novel interpretable model based on the concept bottleneck model (CBM). CBM uses concept labels to train an intermediate layer as the additional visible layer. By seamlessly training these two types of concepts while reducing the amount of computation, we can obtain both supervised and unsupervised concepts simultaneously.
arXiv Detail & Related papers (2022-02-03T08:30:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.