DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
- URL: http://arxiv.org/abs/2408.00331v1
- Date: Thu, 1 Aug 2024 07:08:11 GMT
- Title: DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
- Authors: Rakshith Subramanyam, Kowshik Thopalli, Vivek Narayanaswamy, Jayaraman J. Thiagarajan,
- Abstract summary: We propose DECIDER, a novel approach that leverages priors from large language models (LLMs) and vision-language models (VLMs) to detect failures in image models.
DECIDER consistently achieves state-of-the-art failure detection performance, significantly outperforming baselines in terms of the overall Matthews correlation coefficient.
- Score: 18.77296551727931
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Reliably detecting when a deployed machine learning model is likely to fail on a given input is crucial for ensuring safe operation. In this work, we propose DECIDER (Debiasing Classifiers to Identify Errors Reliably), a novel approach that leverages priors from large language models (LLMs) and vision-language models (VLMs) to detect failures in image classification models. DECIDER utilizes LLMs to specify task-relevant core attributes and constructs a ``debiased'' version of the classifier by aligning its visual features to these core attributes using a VLM, and detects potential failure by measuring disagreement between the original and debiased models. In addition to proactively identifying samples on which the model would fail, DECIDER also provides human-interpretable explanations for failure through a novel attribute-ablation strategy. Through extensive experiments across diverse benchmarks spanning subpopulation shifts (spurious correlations, class imbalance) and covariate shifts (synthetic corruptions, domain shifts), DECIDER consistently achieves state-of-the-art failure detection performance, significantly outperforming baselines in terms of the overall Matthews correlation coefficient as well as failure and success recall. Our codes can be accessed at~\url{https://github.com/kowshikthopalli/DECIDER/}
Related papers
- DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models [6.369258625916601]
Post-hoc interpretability methods fail to capture the models' decision-making process fully.
Our paper introduces DISCO, a novel method for discovering global, rule-based explanations.
DISCO supports interactive explanations, enabling human inspectors to distinguish spurious causes in the rule-based output.
arXiv Detail & Related papers (2024-11-07T12:12:44Z) - Towards Robust Text Classification: Mitigating Spurious Correlations with Causal Learning [2.7813683000222653]
We propose the Causally Calibrated Robust ( CCR) to reduce models' reliance on spurious correlations.
CCR integrates a causal feature selection method based on counterfactual reasoning, along with an inverse propensity weighting (IPW) loss function.
We show that CCR state-of-the-art performance among methods without group labels, and in some cases, it can compete with the models that utilize group labels.
arXiv Detail & Related papers (2024-11-01T21:29:07Z) - Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z) - Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score.
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z) - D-CALM: A Dynamic Clustering-based Active Learning Approach for
Mitigating Bias [13.008323851750442]
In this paper, we propose a novel adaptive clustering-based active learning algorithm, D-CALM, that dynamically adjusts clustering and annotation efforts.
Experiments on eight datasets for a diverse set of text classification tasks, including emotion, hatespeech, dialog act, and book type detection, demonstrate that our proposed algorithm significantly outperforms baseline AL approaches.
arXiv Detail & Related papers (2023-05-26T15:17:43Z) - Discover, Explanation, Improvement: An Automatic Slice Detection
Framework for Natural Language Processing [72.14557106085284]
slice detection models (SDM) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Estimating the Robustness of Classification Models by the Structure of
the Learned Feature-Space [10.418647759223964]
We argue that fixed testsets are only able to capture a small portion of possible data variations and are thus limited and prone to generate new overfitted solutions.
To overcome these drawbacks, we suggest to estimate the robustness of a model directly from the structure of its learned feature-space.
arXiv Detail & Related papers (2021-06-23T10:52:29Z) - Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words"
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.