LADDER: Language Driven Slice Discovery and Error Rectification
- URL: http://arxiv.org/abs/2408.07832v9
- Date: Tue, 18 Feb 2025 16:14:02 GMT
- Title: LADDER: Language Driven Slice Discovery and Error Rectification
- Authors: Shantanu Ghosh, Rayan Syed, Chenyu Wang, Clare B. Poynton, Shyam Visweswaran, Kayhan Batmanghelich,
- Abstract summary: Current clustering or discrete attribute-based slice discovery methods face key limitations. We propose LADDER to address the limitations by: (1) leveraging the flexibility of natural language to address incompleteness; and (2) employing LLMs' latent domain knowledge and advanced reasoning to analyze sentences and derive hypotheses directly. Rigorous evaluations show that LADDER consistently outperforms existing baselines in discovering and mitigating biases.
- Score: 16.146099639239615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Error slice discovery is crucial to diagnose and mitigate model errors. Current clustering or discrete attribute-based slice discovery methods face key limitations: 1) clustering results in incoherent slices, while assigning discrete attributes to slices leads to incomplete coverage of error patterns due to missing or insufficient attributes; 2) these methods lack complex reasoning, preventing them from fully explaining model biases; 3) they fail to integrate domain knowledge, limiting their usage in specialized fields, e.g. radiology. We propose LADDER (Language-Driven Discovery and Error Rectification) to address these limitations by: (1) leveraging the flexibility of natural language to address incompleteness, and (2) employing LLMs' latent domain knowledge and advanced reasoning to analyze sentences, derive testable hypotheses directly, identify biased attributes, and form coherent error slices without clustering. Existing mitigation methods typically address only the worst-performing group, often amplifying errors in other subgroups. In contrast, LADDER generates pseudo attributes from the discovered hypotheses to mitigate errors across all biases without explicit attribute annotations or prior knowledge of bias. Rigorous evaluations on 6 datasets spanning natural and medical images -- comparing 200+ classifiers with diverse architectures, pretraining strategies, and LLMs -- show that LADDER consistently outperforms existing baselines in discovering and mitigating biases.
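The abstract's core loop -- derive a language hypothesis about a biased attribute, then test it as an error slice -- can be sketched in a few lines. This is an illustrative toy, not LADDER's actual pipeline: the captions, the keyword-matching step, and the `slice_error_rates` helper are all assumptions standing in for the LLM-derived hypotheses described above.

```python
# Hedged sketch of language-driven slice discovery (toy, not LADDER's code):
# a hypothesis such as "the model fails on images containing water" defines a
# slice; we compare the error rate inside vs. outside that slice.
def slice_error_rates(samples, attribute):
    """samples: list of (caption, is_error); attribute: a keyword hypothesis."""
    inside = [err for cap, err in samples if attribute in cap]
    outside = [err for cap, err in samples if attribute not in cap]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return rate(inside), rate(outside)

# Toy data: captions of images with a flag for whether the model erred.
samples = [("a bird over water", True), ("a bird on water", True),
           ("a bird on a branch", False), ("a bird in the sky", False),
           ("a bird near water", True), ("a bird on grass", False)]

in_rate, out_rate = slice_error_rates(samples, "water")
# a large gap between the two rates flags "water" as a biased attribute
```

A real system would generate many candidate attributes from an LLM and keep those with the largest inside/outside error gap.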
Related papers
- MisSpans: Fine-Grained False Span Identification in Cross-Domain Fake News [30.038748916578978]
MisSpans is a benchmark for span-level misinformation detection and analysis. It consists of paired real and fake news stories. It enables fine-grained localisation, nuanced characterisation beyond true/false, and actionable explanations.
arXiv Detail & Related papers (2026-01-08T11:46:30Z) - SliceLens: Fine-Grained and Grounded Error Slice Discovery for Multi-Instance Vision Tasks [16.05135819343667]
We propose SliceLens, a hypothesis-driven framework to generate and verify diverse failure hypotheses. We also introduce FeSD, the first benchmark specifically designed for evaluating fine-grained error slice discovery.
arXiv Detail & Related papers (2025-12-31T03:28:41Z) - ResAD++: Towards Class Agnostic Anomaly Detection via Residual Feature Learning [52.11294707895649]
This paper explores the problem of class-agnostic anomaly detection (AD). The objective is to train one class-agnostic AD model that can generalize to detect anomalies in diverse new classes from different domains without any retraining or fine-tuning on the target data. Comprehensive experiments on eight real-world AD datasets demonstrate that our ResAD++ can achieve remarkable AD results when directly used in new classes.
arXiv Detail & Related papers (2025-09-28T08:41:05Z) - Zero-Shot Anomaly Detection with Dual-Branch Prompt Learning [17.263625932911534]
Zero-shot anomaly detection (ZSAD) enables identifying and localizing defects in unseen categories. Existing ZSAD methods, whether using fixed or learned prompts, struggle under domain shifts because their training data are derived from limited training domains. We introduce PILOT, a framework designed to overcome these challenges through two key innovations.
arXiv Detail & Related papers (2025-08-01T17:00:12Z) - Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating the constraint on every token can be prohibitively expensive. Locally constrained decoding (LCD) can distort the global distribution over strings by sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
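The rejection-sampling idea behind this line of work can be sketched simply: draw a token from the unmodified distribution, check the constraint only on the drawn token, and on rejection remove that token's mass and redraw. This is a minimal illustration of the general technique, not the paper's adaptive weighted algorithm; the `constrained_sample` helper and its signature are assumptions.

```python
import random

def constrained_sample(probs, allowed, rng):
    """Sample a token index satisfying `allowed`, evaluating the constraint
    only on tokens actually drawn, instead of filtering the whole
    vocabulary up front."""
    probs = list(probs)
    while True:
        total = sum(probs)
        if total == 0:
            raise ValueError("no allowed token has nonzero probability")
        r = rng.random() * total        # implicit renormalization
        acc = 0.0
        for i, p in enumerate(probs):
            acc += p
            if r <= acc:
                tok = i
                break
        if allowed(tok):
            return tok
        probs[tok] = 0.0                # reject: zero out and redraw

# Usage: only even-indexed tokens satisfy the constraint.
rng = random.Random(0)
tok = constrained_sample([0.1, 0.4, 0.2, 0.3], lambda t: t % 2 == 0, rng)
```

The expensive constraint check runs only on sampled tokens, which is the cost saving the abstract alludes to; weighting the accepted samples (as in the paper) corrects the residual distributional distortion.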
arXiv Detail & Related papers (2025-04-07T18:30:18Z) - Addressing pitfalls in implicit unobserved confounding synthesis using explicit block hierarchical ancestral sampling [1.7037247867649157]
We show that state-of-the-art protocols have two distinct issues that hinder unbiased sampling from the complete space of causal models.
We propose an improved explicit modeling approach for unobserved confounding, leveraging block-hierarchical ancestral generation of ground truth causal graphs.
arXiv Detail & Related papers (2025-03-12T09:38:40Z) - Error Slice Discovery via Manifold Compactness [47.57891946791078]
There is no proper metric of slice coherence without relying on extra information like predefined slice labels.
We propose manifold compactness, a coherence metric without reliance on extra information by incorporating the data geometry property into its design.
Then we develop Manifold Compactness based error Slice Discovery (MCSD), a novel algorithm that directly treats risk and coherence as the optimization objective.
arXiv Detail & Related papers (2025-01-31T11:02:07Z) - Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
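A crude version of "token patterns that distinguish correct and erroneous predictions" is a frequency contrast: rank tokens by how much more often they appear in misclassified inputs than in correctly classified ones. This is a hypothetical baseline for illustration, not the Premise algorithm (which mines pattern sets, not single tokens); the `contrast_tokens` helper and its smoothing are assumptions.

```python
from collections import Counter

def contrast_tokens(correct_texts, error_texts, min_count=2):
    """Rank tokens by smoothed lift of their frequency in erroneous
    inputs over correct ones (toy contrast, not Premise itself)."""
    c_correct = Counter(t for s in correct_texts for t in s.split())
    c_error = Counter(t for s in error_texts for t in s.split())
    scores = {}
    for tok, n_err in c_error.items():
        if n_err >= min_count:                     # ignore rare tokens
            n_ok = c_correct.get(tok, 0)
            scores[tok] = (n_err + 1) / (n_ok + 1)  # add-one smoothing
    return sorted(scores, key=scores.get, reverse=True)

correct = ["bird on a branch", "a cat indoors"]
errors = ["blurry bird photo", "blurry cat photo", "blurry dog"]
ranked = contrast_tokens(correct, errors)
# "blurry" surfaces as the token most associated with errors
```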
arXiv Detail & Related papers (2023-11-18T00:24:26Z) - Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - Label-Descriptive Patterns and their Application to Characterizing
Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms not only gives insight into whether a model is prone to making systematic errors, but also suggests ways to act on and improve the model. In this paper we propose a method that allows us to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data, partitioned according to prediction correctness.
arXiv Detail & Related papers (2021-10-18T19:42:21Z) - High-dimensional separability for one- and few-shot learning [58.8599521537]
This work is driven by a practical question: the correction of Artificial Intelligence (AI) errors. Special external devices, called correctors, are developed; they provide a quick, non-iterative system fix without modifying the legacy AI system.
New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
arXiv Detail & Related papers (2021-06-28T14:58:14Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
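The three statements above amount to a partition identity: group samples by the ReLU activation pattern they induce, and the full network's empirical error equals the pattern-frequency-weighted average of per-pattern errors. The tiny network and data below are assumed for illustration only; the identity itself is what the summary claims.

```python
# Sketch: a piecewise-linear (ReLU) network partitions inputs by
# activation pattern; overall empirical error is the frequency-weighted
# mean of per-pattern (subfunction) errors.
from collections import defaultdict

W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]   # hidden weights (3 units, 2 inputs)
b = [0.0, 0.0, -0.5]                          # hidden biases
v = [1.0, 1.0, -2.0]                          # output weights

def forward(x):
    h = [max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + bi)
         for w, bi in zip(W, b)]
    pattern = tuple(hi > 0 for hi in h)       # which linear region x is in
    score = sum(vi * hi for vi, hi in zip(v, h))
    return pattern, 1 if score > 0 else 0

data = [([0.2, 0.9], 1), ([0.9, 0.1], 1), ([0.5, 0.5], 0),
        ([-0.3, 0.8], 1), ([0.7, 0.7], 0), ([0.1, -0.4], 0)]

groups = defaultdict(list)                    # pattern -> list of error flags
for x, y in data:
    pattern, yhat = forward(x)
    groups[pattern].append(yhat != y)

# E[error] = sum over patterns of P(pattern) * error_rate(pattern)
overall = sum(e for errs in groups.values() for e in errs) / len(data)
weighted = sum(len(errs) / len(data) * (sum(errs) / len(errs))
               for errs in groups.values())
```

Because the patterns partition the data, `overall` and `weighted` agree exactly; the paper's contribution is using the per-subfunction errors to predict where the full network is unreliable.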
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value ε*, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Classifier-independent Lower-Bounds for Adversarial Robustness [13.247278149124757]
We theoretically analyse the limits of robustness to test-time adversarial and noisy examples in classification.
We use optimal transport theory to derive variational formulae for the Bayes-optimal error a classifier can make on a given classification problem.
We derive explicit lower-bounds on the Bayes-optimal error in the case of the popular distance-based attacks.
arXiv Detail & Related papers (2020-06-17T16:46:39Z) - Algebraic Ground Truth Inference: Non-Parametric Estimation of Sample Errors by AI Algorithms [0.0]
Non-parametric estimators of performance are an attractive solution in autonomous settings.
We show that, in experiments where ground truth is available, the accuracy estimates are correct to better than one part in a hundred.
The practical utility of the method is illustrated on a real-world dataset from an online advertising campaign and a sample of common classification benchmarks.
arXiv Detail & Related papers (2020-06-15T12:04:47Z) - Explaining Predictions by Approximating the Local Decision Boundary [3.60160227126201]
We present a new procedure for local decision boundary approximation (DBA).
We train a variational autoencoder to learn a Euclidean latent space of encoded data representations.
We exploit attribute annotations to map the latent space to attributes that are meaningful to the user.
arXiv Detail & Related papers (2020-06-14T19:12:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.