MicroProbe: Efficient Reliability Assessment for Foundation Models with Minimal Data
- URL: http://arxiv.org/abs/2512.20630v1
- Date: Sun, 30 Nov 2025 13:01:57 GMT
- Title: MicroProbe: Efficient Reliability Assessment for Foundation Models with Minimal Data
- Authors: Aayam Bansal, Ishaan Gangwani
- Abstract summary: MicroProbe achieves comprehensive reliability assessment using only 100 strategically selected probe examples. We demonstrate that MicroProbe achieves 23.5% higher composite reliability scores than random sampling baselines. MicroProbe completes reliability assessment with 99.9% statistical power while representing a 90% reduction in assessment cost and maintaining 95% of traditional method coverage.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation model reliability assessment typically requires thousands of evaluation examples, making it computationally expensive and time-consuming for real-world deployment. We introduce MicroProbe, a novel approach that achieves comprehensive reliability assessment using only 100 strategically selected probe examples. Our method combines strategic prompt diversity across five key reliability dimensions with advanced uncertainty quantification and adaptive weighting to efficiently detect potential failure modes. Through extensive empirical evaluation on multiple language models (GPT-2 variants, GPT-2 Medium, GPT-2 Large) and cross-domain validation (healthcare, finance, legal), we demonstrate that MicroProbe achieves 23.5% higher composite reliability scores than random sampling baselines, with exceptional statistical significance (p < 0.001, Cohen's d = 1.21). Expert validation by three AI safety researchers confirms the effectiveness of our strategic selection, rating our approach 4.14/5.0 versus 3.14/5.0 for random selection. MicroProbe completes reliability assessment with 99.9% statistical power while representing a 90% reduction in assessment cost and maintaining 95% of traditional method coverage. Our approach addresses a critical gap in efficient model evaluation for responsible AI deployment.
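As a rough illustration of the pieces named in the abstract (probes spread over five reliability dimensions, adaptive weighting, and an effect-size comparison), here is a minimal sketch. The dimension names, the variance-based weighting rule, and the random probe outcomes are all assumptions for illustration, not the paper's actual implementation; only Cohen's d is the standard pooled-variance definition.

```python
import numpy as np

def composite_reliability(scores_by_dim, weights=None):
    """Weighted mean of per-dimension probe pass rates."""
    dims = list(scores_by_dim)
    if weights is None:
        # Adaptive-weighting stand-in (assumption): weight each dimension by the
        # variance of its probe outcomes, so uncertain dimensions count more.
        raw = {d: scores_by_dim[d].var() + 1e-6 for d in dims}
        total = sum(raw.values())
        weights = {d: raw[d] / total for d in dims}
    return float(sum(weights[d] * scores_by_dim[d].mean() for d in dims))

def cohens_d(a, b):
    """Effect size between two sets of composite scores (pooled std)."""
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

# 100 probes split evenly across five hypothetical reliability dimensions.
rng = np.random.default_rng(0)
dimensions = ["factuality", "safety", "robustness", "calibration", "consistency"]
probe_scores = {d: rng.integers(0, 2, size=20).astype(float) for d in dimensions}
print(f"composite reliability: {composite_reliability(probe_scores):.3f}")
```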
Related papers
- The GT-Score: A Robust Objective Function for Reducing Overfitting in Data-Driven Trading Strategies [51.56484100374058]
GT-Score is a composite objective function that integrates performance, statistical significance, consistency, and downside risk. In walk-forward validation, GT-Score improves the generalization ratio by 98% relative to baseline objective functions. These results suggest that embedding an anti-overfitting structure into the objective can improve the reliability of backtests in quantitative research.
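The summary does not give the GT-Score formula, so the sketch below is only a plausible composite built from the four named ingredients; the specific terms and weights are assumptions.

```python
import numpy as np

def gt_style_score(daily_returns):
    """Illustrative composite of performance, significance, consistency, and downside risk."""
    r = np.asarray(daily_returns, dtype=float)
    sharpe_like = r.mean() / (r.std(ddof=1) + 1e-12)                  # performance
    t_stat = r.mean() / (r.std(ddof=1) / np.sqrt(len(r)) + 1e-12)     # statistical significance
    consistency = (r > 0).mean()                                      # share of winning periods
    equity = np.cumprod(1.0 + r)
    max_drawdown = (1.0 - equity / np.maximum.accumulate(equity)).max()  # downside risk
    return sharpe_like + 0.1 * t_stat + consistency - max_drawdown

rng = np.random.default_rng(1)
print(f"{gt_style_score(rng.normal(5e-4, 0.01, 252)):.3f}")  # one year of toy daily returns
```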
arXiv Detail & Related papers (2026-01-22T05:16:47Z) - Catching UX Flaws in Code: Leveraging LLMs to Identify Usability Flaws at the Development Stage [0.0]
This paper investigates whether large language models (LLMs) can provide reliable and consistent assessments at the development stage. We generated over 850 evaluations, with three independent evaluation runs per site, using a pipeline built on OpenAI's GPT-4o. For issue detection, the model demonstrated moderate consistency, with an average pairwise Cohen's Kappa of 0.50 and an exact agreement of 84%.
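The agreement statistics mentioned above can be computed with standard metrics; the toy labels below are invented, and the actual GPT-4o evaluation pipeline is not reproduced here.

```python
from sklearn.metrics import cohen_kappa_score

run_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # issue detected? evaluation run 1
run_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]   # issue detected? evaluation run 2

exact_agreement = sum(a == b for a, b in zip(run_a, run_b)) / len(run_a)
kappa = cohen_kappa_score(run_a, run_b)   # pairwise Cohen's Kappa
print(f"exact agreement: {exact_agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```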
arXiv Detail & Related papers (2025-12-03T21:02:54Z) - Formal Models and Convergence Analysis for Context-Aware Security Verification [0.0]
We present a formal framework for context-aware security verification that establishes provable guarantees for ML-enhanced adaptive systems. We introduce context-completeness, a new security property, and prove: (1) sample complexity bounds showing when adaptive verification succeeds, (2) information-theoretic limits relating context richness to detection capability, and (3) convergence guarantees for ML-based payload generators.
arXiv Detail & Related papers (2025-10-14T12:21:36Z) - Reliable Decision Support with LLMs: A Framework for Evaluating Consistency in Binary Text Classification Applications [0.7124971549479361]
This study introduces a framework for evaluating consistency in large language model (LLM) binary text classification. We determine sample size requirements, develop metrics for invalid responses, and evaluate intra- and inter-rater reliability.
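A toy illustration of two of the named metrics, an invalid-response rate and intra-rater (repeated-run) agreement; the "yes"/"no" response format and the three-run setup are assumptions.

```python
# Three repeated classification runs over the same five inputs (invented data).
runs = [
    ["yes", "no", "yes", "maybe", "yes"],
    ["yes", "no", "no",  "yes",   "yes"],
    ["yes", "no", "yes", "yes",   "yes"],
]
valid = {"yes", "no"}
flat = [resp for run in runs for resp in run]
invalid_rate = sum(resp not in valid for resp in flat) / len(flat)

# Intra-rater consistency: fraction of items on which all runs agree.
per_item = list(zip(*runs))
unanimous = sum(len(set(item)) == 1 for item in per_item) / len(per_item)
print(f"invalid responses: {invalid_rate:.2f}, unanimous items: {unanimous:.2f}")
```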
arXiv Detail & Related papers (2025-05-20T21:12:58Z) - Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks [26.422616504640786]
We propose a novel individual attack method, Probability Margin Attack (PMA), which defines the adversarial margin in the probability space rather than the logits space. We create a million-scale dataset, CC1M, and use it to conduct the first million-scale adversarial robustness evaluation of adversarially-trained ImageNet models.
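A minimal sketch of a probability-space margin, assuming the usual definition of margin as true-class probability minus the largest competing class probability; the attack loop itself is omitted, and this is not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def probability_margin(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Margin in probability space: p(true class) - max p(other class)."""
    probs = F.softmax(logits, dim=1)
    true_p = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    others = probs.clone()
    others.scatter_(1, labels.unsqueeze(1), float("-inf"))  # mask the true class
    return true_p - others.max(dim=1).values                # negative => misclassified

logits = torch.randn(4, 10)
labels = torch.tensor([3, 1, 7, 0])
print(probability_margin(logits, labels))  # an attack would minimize this margin
```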
arXiv Detail & Related papers (2024-11-20T10:41:23Z) - Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability [0.0]
Large Language Models (LLMs) have shown significant advances in text generation but often lack the reliability needed for autonomous deployment.
We introduce a novel framework that repurposes ensemble methods for content validation through model consensus.
In tests across 78 complex cases requiring factual accuracy and causal consistency, our framework improved precision from 73.1% to 93.9%.
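A minimal consensus-validation sketch in that spirit: a claim is accepted only when a supermajority of independent model judgments agree. The judge functions below are placeholders, not the paper's models or prompts.

```python
from collections import Counter

def judge_a(claim): return "supported"      # placeholder model verdicts
def judge_b(claim): return "supported"
def judge_c(claim): return "unsupported"

def consensus(claim, judges, threshold=2/3):
    """Return the majority verdict if it clears the threshold, else abstain."""
    votes = Counter(judge(claim) for judge in judges)
    verdict, count = votes.most_common(1)[0]
    return verdict if count / len(judges) >= threshold else "abstain"

print(consensus("Paris is the capital of France", [judge_a, judge_b, judge_c]))
```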
arXiv Detail & Related papers (2024-11-10T17:32:16Z) - Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation [70.27452774899189]
Large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user.
As of November 2023, state-of-the-art LLMs do not provide access to the internal probabilities needed for such confidence estimates.
Our best method composing linguistic confidences and surrogate model probabilities gives state-of-the-art confidence estimates on all 12 datasets.
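A hedged sketch of the composition idea: blend a verbalized ("linguistic") confidence from the target model with a probability taken from a surrogate model that does expose probabilities. The convex-combination form and the mixing weight below are assumptions.

```python
def combined_confidence(linguistic_conf: float,
                        surrogate_prob: float,
                        alpha: float = 0.5) -> float:
    """Convex combination of the two confidence signals (weight is illustrative)."""
    return alpha * linguistic_conf + (1 - alpha) * surrogate_prob

# e.g. the target model says "I am 80% confident"; the surrogate assigns p = 0.62
print(combined_confidence(0.80, 0.62))
```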
arXiv Detail & Related papers (2023-11-15T11:27:44Z) - ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios, and absolute error rates of up to 19% in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
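An illustrative training objective in that spirit: standard cross-entropy on labeled data plus a term that pushes predictions on an uncertainty dataset toward the uniform distribution. The penalty weight and uniform-target formulation are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def dcm_style_loss(logits_train, labels_train, logits_uncertain, lam=1.0):
    """Cross-entropy plus a confidence-minimization penalty on the uncertainty set."""
    ce = F.cross_entropy(logits_train, labels_train)
    log_probs = F.log_softmax(logits_uncertain, dim=1)
    uniform = torch.full_like(log_probs, 1.0 / log_probs.size(1))
    conf_penalty = F.kl_div(log_probs, uniform, reduction="batchmean")
    return ce + lam * conf_penalty

loss = dcm_style_loss(torch.randn(8, 5), torch.randint(0, 5, (8,)), torch.randn(8, 5))
print(loss)
```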
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
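A toy illustration of the two uncertainty types, assuming a head that predicts a value plus a log-variance (data/aleatoric uncertainty) and Monte Carlo dropout as a stand-in estimate of model/epistemic uncertainty; this is not the UAL retrieval architecture.

```python
import torch
import torch.nn as nn

# Two-output head: value and log-variance (assumption for illustration).
net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.3), nn.Linear(32, 2))
x = torch.randn(1, 16)

net.train()                                       # keep dropout active at inference
outs = torch.stack([net(x) for _ in range(20)])   # 20 stochastic forward passes
value, log_var = outs.mean(dim=0).unbind(dim=-1)
data_uncertainty = log_var.exp()                  # predicted per-sample noise level
model_uncertainty = outs[..., 0].var(dim=0)       # spread of the value across passes
print(f"prediction: {value.item():.3f}, data: {data_uncertainty.item():.3f}, "
      f"model: {model_uncertainty.item():.3f}")
```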
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.