Related papers: Active Slice Discovery in Large Language Models

URL: http://arxiv.org/abs/2511.20713v1
Date: Mon, 24 Nov 2025 23:43:20 GMT
Title: Active Slice Discovery in Large Language Models
Authors: Minhui Zhang, Prahar Ijner, Yoav Wald, Elliot Creager
Abstract summary: Large Language Models (LLMs) often exhibit systematic errors on specific subsets of data, known as error slices. We formalize this approach as Active Slice Discovery and explore it empirically on a problem of discovering human-defined slices in toxicity classification. We find that uncertainty-based active learning algorithms are most effective, achieving competitive accuracy using 2-10% of the available slice membership information.
Score: 7.451724049125496
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) often exhibit systematic errors on specific subsets of data, known as error slices. For instance, a slice can correspond to a certain demographic, where a model does poorly in identifying toxic comments regarding that demographic. Identifying error slices is crucial to understanding and improving models, but it is also challenging. An appealing approach to reduce the amount of manual annotation required is to actively group errors that are likely to belong to the same slice, while using limited access to an annotator to verify whether the chosen samples share the same pattern of model mistake. In this paper, we formalize this approach as Active Slice Discovery and explore it empirically on a problem of discovering human-defined slices in toxicity classification. We examine the efficacy of active slice discovery under different choices of feature representations and active learning algorithms. On several slices, we find that uncertainty-based active learning algorithms are most effective, achieving competitive accuracy using 2-10% of the available slice membership information, while significantly outperforming baselines.
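The loop described in the abstract (pick samples, ask an annotator whether they belong to the slice, refit, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: the hand-written logistic regression, the margin-based uncertainty score, the synthetic 2-D features, and every name below (`fit_logreg`, `predict_proba`, `active_slice_discovery`, `oracle`) are assumptions chosen for illustration only.

```python
import numpy as np


def fit_logreg(X, y, steps=300, lr=0.5, l2=1e-2):
    """Fit an L2-regularized logistic regression by plain gradient descent."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * (Xb.T @ (p - y) / len(y) + l2 * w)
    return w


def predict_proba(w, X):
    """Estimated probability that each row of X belongs to the slice."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def active_slice_discovery(X, oracle, seed_idx, budget):
    """Uncertainty sampling: query the annotator on the sample whose
    predicted slice membership is closest to 0.5, then refit."""
    labeled = list(seed_idx)
    labels = [oracle(i) for i in labeled]
    for _ in range(budget):
        w = fit_logreg(X[labeled], np.array(labels, dtype=float))
        p = predict_proba(w, X)
        score = -np.abs(p - 0.5)   # larger score = more uncertain
        score[labeled] = -np.inf   # never re-query a labeled sample
        i = int(np.argmax(score))
        labeled.append(i)
        labels.append(oracle(i))
    w = fit_logreg(X[labeled], np.array(labels, dtype=float))
    return predict_proba(w, X) >= 0.5, labeled


# Hypothetical data: 40 in-slice errors and 160 other errors in a 2-D feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (40, 2)), rng.normal(-2.0, 1.0, (160, 2))])
truth = np.array([1] * 40 + [0] * 160)

# The oracle stands in for the human annotator who verifies slice membership.
pred, queried = active_slice_discovery(
    X, oracle=lambda i: int(truth[i]), seed_idx=[0, 40], budget=10
)
print(f"queried {len(queried)} of {len(X)} samples; "
      f"membership accuracy = {(pred == truth).mean():.2f}")
```

Here the annotator is consulted on 12 of 200 samples (two seeds plus a budget of ten). The synthetic clusters are far easier than real toxicity data, so the printed accuracy says nothing about the paper's results; the sketch only shows the shape of the query loop.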

Related papers

Error Slice Discovery via Manifold Compactness [47.57891946791078]
There is no proper metric of slice coherence without relying on extra information like predefined slice labels. We propose manifold compactness, a coherence metric without reliance on extra information by incorporating the data geometry property into its design. Then we develop Manifold Compactness based error Slice Discovery (MCSD), a novel algorithm that directly treats risk and coherence as the optimization objective.
arXiv Detail & Related papers (2025-01-31T11:02:07Z)
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing [44.370871446919594]
We propose SemSlicer, a framework that supports semantic data slicing. We show that SemSlicer generates accurate slices with low cost, reliably identifies under-performing data slices, and helps practitioners identify useful data slices that reflect systematic problems.
arXiv Detail & Related papers (2024-09-14T02:15:50Z)
Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks. It is quite beneficial and challenging to detect poisoned samples from a mixed dataset. We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data. One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is. This paper proposes the least disagree metric (LDM) as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
Error Discovery by Clustering Influence Embeddings [7.27282591214364]
We present a method for identifying groups of test examples -- slices -- on which a model under-performs. We formalize coherence as a key property that any slice discovery method should satisfy. We derive a new slice discovery method, InfEmbed, which satisfies coherence by returning slices whose examples are influenced similarly by the training data.
arXiv Detail & Related papers (2023-12-07T21:42:55Z)
Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms [24.127380328812855]
New slice discovery algorithms aim to group together coherent and high-error subsets of data. We show 40 slices output by two state-of-the-art slice discovery algorithms to users, and ask them to form hypotheses about an object detection model. Our results provide positive evidence that these tools provide some benefit over a naive baseline, and also shed light on challenges faced by users during the hypothesis formation step.
arXiv Detail & Related papers (2023-06-13T22:44:53Z)
Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss. Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z)
Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation. By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes. Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
Unsupervised Noisy Tracklet Person Re-identification [100.85530419892333]
We present a novel selective tracklet learning (STL) approach that can train discriminative person re-id models from unlabelled tracklet data. This avoids the tedious and costly process of exhaustively labelling person image/tracklet true matching pairs across camera views. Our method is particularly robust against arbitrary noisy data in raw tracklets and is therefore scalable to learning discriminative models from unconstrained tracking data.
arXiv Detail & Related papers (2021-01-16T07:31:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.