Related papers: HiBug2: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging

HiBug2: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging

URL: http://arxiv.org/abs/2501.16751v3
Date: Mon, 03 Mar 2025 09:07:59 GMT
Title: HiBug2: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging
Authors: Muxi Chen, Chenchen Zhao, Qiang Xu,
Abstract summary: HiBug2 is an automated framework for error slice discovery and model repair.<n>It first generates task-specific visual attributes to highlight instances prone to errors.<n>It then employs an efficient slice enumeration algorithm to systematically identify error slices.
Score: 9.209104721371228
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite the significant success of deep learning models in computer vision, they often exhibit systematic failures on specific data subsets, known as error slices. Identifying and mitigating these error slices is crucial to enhancing model robustness and reliability in real-world scenarios. In this paper, we introduce HiBug2, an automated framework for error slice discovery and model repair. HiBug2 first generates task-specific visual attributes to highlight instances prone to errors through an interpretable and structured process. It then employs an efficient slice enumeration algorithm to systematically identify error slices, overcoming the combinatorial challenges that arise during slice exploration. Additionally, HiBug2 extends its capabilities by predicting error slices beyond the validation set, addressing a key limitation of prior approaches. Extensive experiments across multiple domains, including image classification, pose estimation, and object detection - show that HiBug2 not only improves the coherence and precision of identified error slices but also significantly enhances the model repair capabilities.

Related papers

Automatic Discovery and Assessment of Interpretable Systematic Errors in Semantic Segmentation [0.5242869847419834]
This paper presents a novel method for discovering systematic errors in segmentation models. We leverage multimodal foundation models to retrieve errors and use conceptual linkage along with erroneous nature to study the systematic nature of these errors. Our work opens up the avenue to model analysis and intervention that have so far been underexplored in semantic segmentation.
arXiv Detail & Related papers (2024-11-16T17:31:37Z)
Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance. Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection. We design a forgery-style mixture formulation that augments the diversity of forgery source domains. We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation [18.77296551727931]
We propose DECIDER, a novel approach that leverages priors from large language models (LLMs) and vision-language models (VLMs) to detect failures in image models. DECIDER consistently achieves state-of-the-art failure detection performance, significantly outperforming baselines in terms of the overall Matthews correlation coefficient.
arXiv Detail & Related papers (2024-08-01T07:08:11Z)
SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch token they extract. We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation [65.268245109828]
We propose an end-to-end framework to generate text descriptions of failure modes associated with spurious correlations. These descriptions can be used to generate synthetic data using generative models, such as diffusion models. Our experiments have shown remarkable textbfimprovements in accuracy ($sim textbf21%$) on hard sub-populations.
arXiv Detail & Related papers (2023-12-09T04:43:49Z)
Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection [12.529028629599349]
We propose a novel benchmarking methodology to help researchers better evaluate the true capabilities and limits of ML4VD techniques. Using six ML4VD techniques and two datasets, we find (a) that state-of-the-art models severely overfit to unrelated features for predicting the vulnerabilities in the testing data, and (b) that the performance gained by data augmentation does not generalize beyond the specific augmentations applied during training.
arXiv Detail & Related papers (2023-06-28T08:41:39Z)
Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
slice detection models (SDM) automatically identify underperforming groups of datapoints. This paper proposes a benchmark named "Discover, Explain, improve (DEIM)" for classification NLP tasks. Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning [25.88974494276895]
This work demonstrates how off-the-shelf, large-scale, image-to-text and text-to-image models can be leveraged to automatically find failures. In essence, a conditional text-to-image generative model is used to generate large amounts of synthetic, yet realistic, inputs.
arXiv Detail & Related papers (2022-08-18T13:49:10Z)
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z)
DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem. The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network. To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
A Principled Approach to Failure Analysis and Model Repairment: Demonstration in Medical Imaging [12.732665048388041]
Machine learning models commonly exhibit unexpected failures post-deployment. We aim to standardise and bring principles to this process through answering two critical questions. We suggest that the quality of the identified failure types can be validated through measuring the intra- and inter-type generalisation. We argue that a model can be considered repaired if it achieves high accuracy on the failure types while retaining performance on the previously correct data.
arXiv Detail & Related papers (2021-09-25T12:04:19Z)
Reference-based Defect Detection Network [57.89399576743665]
The first issue is the texture shift which means a trained defect detector model will be easily affected by unseen texture. The second issue is partial visual confusion which indicates that a partial defect box is visually similar with a complete box. We propose a Reference-based Defect Detection Network (RDDN) to tackle these two problems.
arXiv Detail & Related papers (2021-08-10T05:44:23Z)
Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition. We then introduce a generalized re-weight method in the two-stage learning to balance the class prior. Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.