The Achilles Heel of AI: Fundamentals of Risk-Aware Training Data for High-Consequence Models
- URL: http://arxiv.org/abs/2505.14964v1
- Date: Tue, 20 May 2025 22:57:35 GMT
- Title: The Achilles Heel of AI: Fundamentals of Risk-Aware Training Data for High-Consequence Models
- Authors: Dave Cook, Tim Klawa
- Abstract summary: AI systems in high-consequence domains must detect rare, high-impact events while operating under tight resource constraints. Traditional annotation strategies that prioritize label volume over informational value introduce redundancy and noise. This paper introduces smart-sizing, a training data strategy that emphasizes label diversity, model-guided selection, and marginal utility-based stopping.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI systems in high-consequence domains such as defense, intelligence, and disaster response must detect rare, high-impact events while operating under tight resource constraints. Traditional annotation strategies that prioritize label volume over informational value introduce redundancy and noise, limiting model generalization. This paper introduces smart-sizing, a training data strategy that emphasizes label diversity, model-guided selection, and marginal utility-based stopping. We implement this through Adaptive Label Optimization (ALO), combining pre-labeling triage, annotator disagreement analysis, and iterative feedback to prioritize labels that meaningfully improve model performance. Experiments show that models trained on 20 to 40 percent of curated data can match or exceed full-data baselines, particularly in rare-class recall and edge-case generalization. We also demonstrate how latent labeling errors embedded in training and validation sets can distort evaluation, underscoring the need for embedded audit tools and performance-aware governance. Smart-sizing reframes annotation as a feedback-driven process aligned with mission outcomes, enabling more robust models with fewer labels and supporting efficient AI development pipelines for frontier models and operational systems.
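The paper describes Adaptive Label Optimization procedurally rather than as code. The sketch below is one minimal reading of the marginal utility-based stopping rule, where select_batch and train_and_eval are hypothetical stand-ins for a model-guided sampler and a retrain-and-validate step; none of these names come from the paper.

```python
# Hypothetical sketch of marginal utility-based stopping (not the authors' code).
# select_batch() is assumed to pick the next labels to annotate (e.g. by model
# uncertainty); train_and_eval() retrains and returns a validation metric.

def smart_size(seed, unlabeled, select_batch, train_and_eval,
               batch_size=500, min_gain_per_label=1e-5, patience=2):
    """Grow the labeled set until the per-label metric gain stalls."""
    labeled = list(seed)
    last_score = train_and_eval(labeled)      # baseline on the seed labels
    stalls = 0
    while unlabeled and stalls < patience:
        batch = select_batch(unlabeled, batch_size)   # model-guided triage
        unlabeled = [x for x in unlabeled if x not in batch]
        labeled += batch
        score = train_and_eval(labeled)               # retrain, re-measure
        gain = (score - last_score) / len(batch)      # marginal utility per label
        stalls = stalls + 1 if gain < min_gain_per_label else 0
        last_score = score
    return labeled, last_score
```

Under this reading, annotation stops once several consecutive batches each add less than min_gain_per_label of validation metric per new label, which is one way to operationalize the 20 to 40 percent curated-data regime reported above.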
Related papers
- Feedback-Driven Pseudo-Label Reliability Assessment: Redefining Thresholding for Semi-Supervised Semantic Segmentation [5.7977777220041204]
A common practice in pseudo-supervision is filtering pseudo-labels based on pre-defined confidence thresholds or entropy. We propose Ensemble-of-Confidence Reinforcement (ENCORE), a dynamic feedback-driven thresholding strategy for pseudo-label selection. Our method seamlessly integrates into existing pseudo-supervision frameworks and significantly improves segmentation performance.
arXiv Detail & Related papers (2025-05-12T15:58:08Z)
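ENCORE's exact update rule is not given in this summary. As a generic illustration of feedback-driven thresholding, the sketch below loosens or tightens a confidence cutoff depending on whether the most recent batch of accepted pseudo-labels helped a held-out metric; val_gain is a hypothetical callback, not part of the paper.

```python
import numpy as np

# Generic feedback-driven confidence threshold for pseudo-labels; a sketch of
# the general idea, not ENCORE's published rule.

def update_threshold(tau, confidences, val_gain, step=0.02, lo=0.5, hi=0.99):
    accepted = confidences >= tau          # keep only confident pseudo-labels
    gain = val_gain(accepted)              # feedback from a held-out set
    # If the accepted labels helped, admit more (lower tau); if they hurt, tighten.
    tau = tau - step if gain > 0 else tau + step
    return float(np.clip(tau, lo, hi))
```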
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting a teacher-student framework to provide pseudo-confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning [0.0]
GISTEmbed is a novel strategy that enhances in-batch negative selection during contrastive training through a guide model.
Benchmarked against the Massive Text Embedding Benchmark (MTEB), GISTEmbed showcases consistent performance improvements across various model sizes.
arXiv Detail & Related papers (2024-02-26T18:55:15Z)
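A rough sketch of the guided in-batch negative idea: candidates that a guide model rates as similar to the query as, or more than, its true positive are masked out of the negative pool. This simplifies the paper's method; guide_sim is an assumed precomputed similarity matrix.

```python
import numpy as np

# Sketch of guide-model-filtered in-batch negatives (GISTEmbed-style idea,
# details simplified). guide_sim is a [batch, batch] matrix of guide-model
# similarities; entry (i, j) compares query i with passage j, and the diagonal
# holds the true positive pairs.

def negative_mask(guide_sim: np.ndarray) -> np.ndarray:
    """Mask out in-batch 'negatives' the guide model rates above the positive."""
    pos = np.diag(guide_sim)[:, None]      # guide score of each true pair
    mask = guide_sim < pos                 # keep only weaker candidates
    np.fill_diagonal(mask, False)          # positives are never negatives
    return mask                            # True = usable as a negative
```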
- A Conditioned Unsupervised Regression Framework Attuned to the Dynamic Nature of Data Streams [0.0]
This paper presents an optimal strategy for streaming contexts with limited labeled data, introducing an adaptive technique for unsupervised regression.
The proposed method leverages a sparse set of initial labels and introduces an innovative drift detection mechanism.
To enhance adaptability, we integrate the ADWIN (ADaptive WINdowing) algorithm with error generalization based on Root Mean Square Error (RMSE).
arXiv Detail & Related papers (2023-12-12T19:23:54Z)
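A minimal sketch of pairing ADWIN with an error stream, using the river library (assuming a recent version that exposes drift_detected as a property). Feeding per-sample squared errors approximates monitoring RMSE for drift; the model and prediction plumbing are assumed, not shown.

```python
from river import drift   # pip install river

# Sketch: feed per-sample squared errors to ADWIN so a sustained shift in the
# error distribution (a proxy for RMSE) signals concept drift.

adwin = drift.ADWIN()

def check_drift(y_true: float, y_pred: float) -> bool:
    adwin.update((y_true - y_pred) ** 2)   # squared error proxy for RMSE
    return adwin.drift_detected            # True when the error window shifts
```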
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility for improving performance via data sculpting/filtering in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z)
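TRIAGE's conformal predictive distributions are not reproduced here. As a simplified stand-in, the sketch below ranks each training residual within a calibration residual distribution, so scores near 0 or 1 flag points the model systematically over- or under-estimates; this is an illustration of the scoring idea, not the TRIAGE score itself.

```python
import numpy as np

# Simplified conformal-style data score (not the actual TRIAGE score): rank
# each signed training residual (y - y_hat) among calibration residuals.

def residual_rank_score(train_resid: np.ndarray,
                        calib_resid: np.ndarray) -> np.ndarray:
    calib = np.sort(calib_resid)
    ranks = np.searchsorted(calib, train_resid)   # position among calib errors
    return ranks / (len(calib) + 1)               # ~percentile score in (0, 1)
```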
- Group Robust Classification Without Any Group Information [5.053622900542495]
This study contends that current bias-unsupervised approaches to group robustness continue to rely on group information to achieve optimal performance.
Bias labels are still crucial for effective model selection, restricting the practicality of these methods in real-world scenarios.
We propose a revised methodology for training and validating debiased models in an entirely bias-unsupervised manner.
arXiv Detail & Related papers (2023-10-28T01:29:18Z)
- Robust Feature Learning Against Noisy Labels [0.2082426271304908]
Mislabeled samples can significantly degrade the generalization of models.
Progressive self-bootstrapping is introduced to minimize the negative impact of supervision from noisy labels.
Experimental results show that our proposed method can efficiently and effectively enhance model robustness under severely noisy labels.
arXiv Detail & Related papers (2023-07-10T02:55:35Z)
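The paper's progressive self-bootstrapping schedule is not detailed in this summary. The sketch below shows the classic soft bootstrapping loss (Reed et al., 2015) that the idea builds on, with beta as the knob a progressive schedule would anneal; treating this as the paper's exact loss would be an assumption.

```python
import torch
import torch.nn.functional as F

# Classic soft bootstrapping loss (Reed et al., 2015), a stand-in for the
# self-bootstrapping idea: blend the possibly-noisy label with the model's
# own prediction before computing cross-entropy.

def soft_bootstrap_loss(logits: torch.Tensor, noisy_labels: torch.Tensor,
                        beta: float = 0.95) -> torch.Tensor:
    log_probs = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(noisy_labels, logits.size(1)).float()
    target = beta * one_hot + (1.0 - beta) * log_probs.exp()
    return -(target * log_probs).sum(dim=1).mean()
```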
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method using language models trained on linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
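A sketch of the linearization step such a method relies on: tags are inlined before their tokens so an ordinary language model can be trained on, and then sample, labeled sentences. The paper's exact linearization scheme may differ in details.

```python
# Sketch of sentence linearization in the spirit of DAGA: non-trivial tags are
# placed immediately before their words, turning a (tokens, tags) pair into a
# single token sequence for language-model training.

def linearize(tokens, tags):
    out = []
    for tok, tag in zip(tokens, tags):
        if tag != "O":            # only surface non-trivial tags
            out.append(tag)
        out.append(tok)
    return " ".join(out)

# Example:
# linearize(["John", "lives", "in", "Paris"], ["B-PER", "O", "O", "B-LOC"])
# -> "B-PER John lives in B-LOC Paris"
```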
- Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels, among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific ways, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel framework that offers flexibility in the choice of model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)
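A PRODEN-style sketch of the progressive identification idea under the stated flexibility: any classifier trained with a weighted cross-entropy whose candidate-label weights are re-estimated each step from the model's own probabilities. Shapes and details here are simplified assumptions, not the paper's reference code.

```python
import torch
import torch.nn.functional as F

# Sketch of progressive true-label identification for PLL: weights over the
# candidate labels start uniform and shift toward whichever candidate the
# model currently finds most probable.

def pll_step(logits, candidate_mask, weights):
    """candidate_mask: [B, C] bool; weights: [B, C] rows summing to 1."""
    log_probs = F.log_softmax(logits, dim=1)
    loss = -(weights * log_probs).sum(dim=1).mean()   # weighted cross-entropy
    with torch.no_grad():                             # progressive re-weighting
        probs = log_probs.exp() * candidate_mask.float()
        weights = probs / probs.sum(dim=1, keepdim=True)
    return loss, weights
```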