Related papers: Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts

Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts

URL: http://arxiv.org/abs/2504.17921v1
Date: Thu, 24 Apr 2025 20:24:31 GMT
Title: Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts
Authors: Mateo Espinosa Zarlenga, Gabriele Dominici, Pietro Barbiero, Zohreh Shams, Mateja Jamnik,
Abstract summary: We investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs.<n>We introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution.
Score: 10.806525657355872
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., stripes, black) and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM's mispredicted concepts at test time) on CMs' task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.

Related papers

LLM Pretraining with Continuous Concepts [71.98047075145249]
Next token prediction has been the standard training objective used in large language model pretraining. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts.
arXiv Detail & Related papers (2025-02-12T16:00:11Z)
What If the Input is Expanded in OOD Detection? [77.37433624869857]
Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes. Various scoring functions are proposed to distinguish it from in-distribution (ID) data. We introduce a novel perspective, i.e., employing different common corruptions on the input space.
arXiv Detail & Related papers (2024-10-24T06:47:28Z)
The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection [75.65876949930258]
Out-of-distribution (OOD) detection is essential for model trustworthiness. We show that the superior OOD detection performance of state-of-the-art methods is achieved by secretly sacrificing the OOD generalization ability.
arXiv Detail & Related papers (2024-10-12T07:02:04Z)
Tree-Based Leakage Inspection and Control in Concept Bottleneck Models [3.135289953462274]
Concept Bottleneck Models (CBMs) have gained attention for enhancing interpretability by mapping inputs to intermediate concepts before making final predictions. CBMs often suffer from information leakage, where additional input data, not captured by the concepts, is used to improve task performance. We introduce a novel approach for training both joint and sequential CBMs that allows us to identify and control leakage using decision trees.
arXiv Detail & Related papers (2024-10-08T20:42:19Z)
MulCPred: Learning Multi-modal Concepts for Explainable Pedestrian Action Prediction [57.483718822429346]
MulCPred is proposed that explains its predictions based on multi-modal concepts represented by training samples. MulCPred is evaluated on multiple datasets and tasks.
arXiv Detail & Related papers (2024-09-14T14:15:28Z)
On the Concept Trustworthiness in Concept Bottleneck Models [39.928868605678744]
Concept Bottleneck Models (CBMs) break down the reasoning process into the input-to-concept mapping and the concept-to-label prediction. Despite the transparency of the concept-to-label prediction, the mapping from the input to the intermediate concept remains a black box. A pioneering metric, referred to as concept trustworthiness score, is proposed to gauge whether the concepts are derived from relevant regions. An enhanced CBM is introduced, enabling concept predictions to be made specifically from distinct parts of the feature map.
arXiv Detail & Related papers (2024-03-21T12:24:53Z)
Eliminating Information Leakage in Hard Concept Bottleneck Models with Supervised, Hierarchical Concept Learning [17.982131928413096]
Concept Bottleneck Models (CBMs) aim to deliver interpretable and interventionable predictions by bridging features and labels with human-understandable concepts. CBMs suffer from information leakage, where unintended information beyond the concepts are leaked to the subsequent label prediction. This paper proposes a new paradigm of CBMs, namely SupCBM, which achieves label predication via predicted concepts and a deliberately-designed intervention matrix.
arXiv Detail & Related papers (2024-02-03T03:50:58Z)
Do Concept Bottleneck Models Respect Localities? [14.77558378567965]
Concept-based methods explain model predictions using human-understandable concepts. "Localities" involve using only relevant features when predicting a concept's value. CBMs may not capture localities, even when independent concepts are localised to non-overlapping feature subsets.
arXiv Detail & Related papers (2024-01-02T16:05:23Z)
Causality-aware Concept Extraction based on Knowledge-guided Prompting [17.4086571624748]
Concepts benefit natural language understanding but are far from complete in existing knowledge graphs (KGs) Recently, pre-trained language models (PLMs) have been widely used in text-based concept extraction. We propose equipping the PLM-based extractor with a knowledge-guided prompt as an intervention to alleviate concept bias.
arXiv Detail & Related papers (2023-05-03T03:36:20Z)
Free Lunch for Generating Effective Outlier Supervision [46.37464572099351]
We propose an ultra-effective method to generate near-realistic outlier supervision. Our proposed textttBayesAug significantly reduces the false positive rate over 12.50% compared with the previous schemes.
arXiv Detail & Related papers (2023-01-17T01:46:45Z)
Pseudo-OOD training for robust language models [78.15712542481859]
OOD detection is a key component of a reliable machine-learning model for any industry-scale application. We propose POORE - POsthoc pseudo-Ood REgularization, that generates pseudo-OOD samples using in-distribution (IND) data. We extensively evaluate our framework on three real-world dialogue systems, achieving new state-of-the-art in OOD detection.
arXiv Detail & Related papers (2022-10-17T14:32:02Z)
On the Practicality of Deterministic Epistemic Uncertainty [106.06571981780591]
deterministic uncertainty methods (DUMs) achieve strong performance on detecting out-of-distribution data. It remains unclear whether DUMs are well calibrated and can seamlessly scale to real-world applications.
arXiv Detail & Related papers (2021-07-01T17:59:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.