Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
- URL: http://arxiv.org/abs/2504.18026v3
- Date: Thu, 05 Jun 2025 03:06:29 GMT
- Title: Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
- Authors: Emiliano Penaloza, Tianyue H. Zhan, Laurent Charlin, Mateo Espinosa Zarlenga
- Abstract summary: Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions on a set of human-understandable concepts. CBMs typically assume that datasets contain accurate concept labels, an assumption often violated in practice that can significantly degrade performance. We introduce the Concept Preference Optimization (CPO) objective, which effectively mitigates the negative impact of concept mislabeling on CBM performance.
- Score: 5.822390655999343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions on a set of human-understandable concepts. However, CBMs typically assume that datasets contain accurate concept labels, an assumption often violated in practice, which we show can significantly degrade performance (by 25% in some cases). To address this, we introduce the Concept Preference Optimization (CPO) objective, a new loss function based on Direct Preference Optimization, which effectively mitigates the negative impact of concept mislabeling on CBM performance. We provide an analysis of key properties of the CPO objective, showing it directly optimizes for the concept's posterior distribution, and contrast it against Binary Cross Entropy (BCE), demonstrating that CPO is inherently less sensitive to concept noise. We empirically confirm our analysis by finding that CPO consistently outperforms BCE on three real-world datasets, both with and without added label noise. We make our code available on GitHub.
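The abstract does not give the objective's exact form, but since CPO is described as a DPO-based loss over concept labels, a minimal sketch might look as follows. This is an assumption-laden illustration, not the paper's implementation: it treats the annotated concept vector as "preferred" and its bitwise flip as "dispreferred", uses independent Bernoulli concept heads, and the names `dpo_style_concept_loss`, `ref_logits`, and `beta` are hypothetical.

```python
import torch
import torch.nn.functional as F

def dpo_style_concept_loss(logits, ref_logits, labels, beta=1.0):
    """Hedged sketch of a DPO-style preference loss over binary concepts.

    The annotated concept vector is treated as the 'preferred' labeling
    and its bitwise complement as 'dispreferred'; both the policy and a
    frozen reference model score the labelings with Bernoulli heads.
    """
    # Per-concept log-probabilities of the observed and flipped labels.
    logp_w = -F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    logp_l = -F.binary_cross_entropy_with_logits(logits, 1.0 - labels, reduction="none")
    ref_logp_w = -F.binary_cross_entropy_with_logits(ref_logits, labels, reduction="none")
    ref_logp_l = -F.binary_cross_entropy_with_logits(ref_logits, 1.0 - labels, reduction="none")

    # DPO margin: how strongly the policy prefers the observed labeling
    # over its flip, relative to the reference model.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()

# Example: 4 samples, 6 concepts.
logits = torch.randn(4, 6, requires_grad=True)
ref_logits = torch.randn(4, 6)
labels = torch.randint(0, 2, (4, 6)).float()
dpo_style_concept_loss(logits, ref_logits, labels, beta=0.5).backward()
```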
Related papers
- Interpretable Reward Modeling with Active Concept Bottlenecks [54.00085739303773]
We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning. Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts. We formalize an active learning strategy that dynamically acquires the most informative concept labels.
arXiv Detail & Related papers (2025-07-07T06:26:04Z) - Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable. We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z) - Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models [15.97013792698305]
Concept Bottleneck Models (CBMs) decompose image classification into a process governed by interpretable, human-readable concepts. We introduce a dynamic, agent-based approach that adjusts the concept bank in response to environmental feedback. We also propose Conditional Concept Bottleneck Models (CoCoBMs) to overcome the limitations in traditional CBMs' concept scoring mechanisms.
arXiv Detail & Related papers (2025-06-02T05:25:52Z) - V-CEM: Bridging Performance and Intervenability in Concept-based Models [6.617167508694296]
Concept-based AI (C-XAI) is a rapidly growing research field that enhances AI model interpretability by leveraging intermediate, human-understandable concepts.
CBMs explicitly predict concepts before making final decisions, enabling interventions to correct misclassified concepts.
CBMs remain effective in Out-Of-Distribution (OOD) settings with intervention, but they struggle to match the performance of black-box models.
We propose the Variational Concept Embedding Model (V-CEM), which leverages variational inference to improve intervention responsiveness in Concept Embedding Models (CEMs).
arXiv Detail & Related papers (2025-04-04T22:43:04Z) - Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective way to improve performance.
We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
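As a reference point for the BON baseline mentioned above (not the paper's IAD procedure), generic verifier-guided Best-of-N selection can be sketched as follows; `generate` and `score` are hypothetical placeholders for a candidate sampler and a verifier.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def best_of_n(generate: Callable[[], T], score: Callable[[T], float], n: int = 8) -> T:
    """Draw n candidates independently, keep the one the verifier scores highest."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)
```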
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - Adaptive Test-Time Intervention for Concept Bottleneck Models [6.31833744906105]
Concept bottleneck models (CBMs) aim to improve model interpretability by predicting human-level "concepts". We propose to use Fast Interpretable Greedy Sum-Trees (FIGS) to obtain Binary Distillation (BD). FIGS-BD distills a binary-augmented concept-to-target portion of the CBM into an interpretable tree-based model.
arXiv Detail & Related papers (2025-03-09T19:03:48Z) - Concept-driven Off Policy Evaluation [2.789652596206117]
We develop a family of concept-based OPE estimators, proving that they remain unbiased and reduce variance when concepts are known and predefined. Experiments with synthetic and real-world datasets show that both known and learned concept-based estimators significantly improve OPE performance. Unlike other OPE methods, concept-based estimators are easily interpretable and allow for targeted interventions on specific concepts, further enhancing the quality of these estimators.
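For context on the OPE setting above, the plain importance-sampling estimator (the non-concept baseline such estimators improve on, sketched here with hypothetical array arguments) reweights logged rewards by the ratio of target to behavior action probabilities:

```python
import numpy as np

def importance_sampling_ope(rewards: np.ndarray, pi_e: np.ndarray, pi_b: np.ndarray) -> float:
    """Vanilla IS estimator: the mean of (pi_e / pi_b) * r over logged data
    is an unbiased estimate of the target policy's value."""
    return float(np.mean((pi_e / pi_b) * rewards))
```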
arXiv Detail & Related papers (2024-11-28T22:15:06Z) - EQ-CBM: A Probabilistic Concept Bottleneck with Energy-based Models and Quantized Vectors [4.481898130085069]
Concept bottleneck models (CBMs) have gained attention as an effective approach by leveraging human-understandable concepts to enhance interpretability.
Existing CBMs face challenges due to deterministic concept encoding and reliance on inconsistent concepts, leading to inaccuracies.
We propose EQ-CBM, a novel framework that enhances CBMs through probabilistic concept encoding.
arXiv Detail & Related papers (2024-09-22T23:43:45Z) - Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC. We increase the consistency and informativeness of the pairwise preference signals through targeted modifications. We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z) - Stochastic Concept Bottleneck Models [8.391254800873599]
Concept Bottleneck Models (CBMs) have emerged as a promising interpretable method whose final prediction is based on human-understandable concepts.
We propose Stochastic Concept Bottleneck Models (SCBMs), a novel approach that models concept dependencies.
A single-concept intervention affects all correlated concepts, thereby improving intervention effectiveness.
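The summary does not specify how SCBMs propagate an intervention, but one standard mechanism consistent with "a single-concept intervention affects all correlated concepts" is conditioning a joint Gaussian over concept logits; a sketch under that assumption (function name and parameterization are illustrative):

```python
import torch

def condition_on_one_concept(mu, Sigma, idx, value):
    """Condition a multivariate Gaussian over concept logits on one observed
    coordinate (the intervened concept), shifting the means of all correlated
    concepts via the standard Gaussian conditional formulas."""
    mask = torch.ones(mu.shape[0], dtype=torch.bool)
    mask[idx] = False
    s_oo = Sigma[idx, idx]     # variance of the intervened concept
    s_ro = Sigma[mask, idx]    # its covariances with the remaining concepts
    mu_cond = mu[mask] + s_ro / s_oo * (value - mu[idx])
    Sigma_cond = Sigma[mask][:, mask] - torch.outer(s_ro, s_ro) / s_oo
    return mu_cond, Sigma_cond
```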
arXiv Detail & Related papers (2024-06-27T15:38:37Z) - Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions.
Existing approaches often require numerous human interventions per image to achieve strong performance.
We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z) - Sparse Linear Concept Discovery Models [11.138948381367133]
Concept Bottleneck Models (CBMs) constitute a popular approach where hidden layers are tied to human-understandable concepts.
We propose a simple yet highly intuitive interpretable framework based on Contrastive Language-Image models and a single sparse linear layer.
We experimentally show that our framework not only outperforms recent CBM approaches in terms of accuracy, but also yields high per-example concept sparsity.
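A sketch of the kind of architecture just described: CLIP-style image features scored against concept text embeddings, then a single linear head whose weights are pushed toward sparsity by an L1 penalty added to the task loss. Class and method names here are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLinearConceptHead(nn.Module):
    """Image features -> cosine similarities to concept embeddings ->
    class logits through one (L1-regularized) linear layer."""

    def __init__(self, concept_embeds: torch.Tensor, num_classes: int):
        super().__init__()
        # One row per concept, e.g. text embeddings of concept names.
        self.register_buffer("concepts", F.normalize(concept_embeds, dim=-1))
        self.head = nn.Linear(concept_embeds.shape[0], num_classes)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        sims = F.normalize(image_feats, dim=-1) @ self.concepts.T  # concept scores
        return self.head(sims)

    def l1_penalty(self) -> torch.Tensor:
        # Add lambda * l1_penalty() to the classification loss during training.
        return self.head.weight.abs().mean()
```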
arXiv Detail & Related papers (2023-08-21T15:16:19Z) - Model-based Causal Bayesian Optimization [74.78486244786083]
We introduce the first algorithm for Causal Bayesian Optimization with Multiplicative Weights (CBO-MW).
We derive regret bounds for CBO-MW that naturally depend on graph-related quantities.
Our experiments include a realistic demonstration of how CBO-MW can be used to learn users' demand patterns in a shared mobility system.
arXiv Detail & Related papers (2023-07-31T13:02:36Z) - Explainable fetal ultrasound quality assessment with progressive concept bottleneck models [6.734637459963132]
We propose a holistic and explainable method for fetal ultrasound quality assessment. We introduce human-readable "concepts" into the task and imitate the sequential expert decision-making process. Experiments show that our model outperforms equivalent concept-free models on an in-house dataset.
arXiv Detail & Related papers (2022-11-19T09:31:19Z) - Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
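For background on the amortized-VI machinery the DU-VAE entry above builds on, here are the two generic VAE components it regularizes (standard formulas, not DU-VAE's specific diversity and uncertainty terms):

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Sample z ~ N(mu, diag(exp(logvar))) differentiably (reparameterization trick)."""
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Closed-form KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior, per sample."""
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1)
```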
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.