The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models
- URL: http://arxiv.org/abs/2512.23850v1
- Date: Mon, 29 Dec 2025 20:29:09 GMT
- Title: The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models
- Authors: Rahul Baxi
- Abstract summary: Current language model evaluations measure what models know under ideal conditions but not how robustly they know it under realistic stress. We introduce the Drill-Down and Fabricate Test (DDFT), a protocol that measures epistemic robustness. We find that flagship models exhibit brittleness despite their scale, while smaller models can achieve robust performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current language model evaluations measure what models know under ideal conditions but not how robustly they know it under realistic stress. Static benchmarks like MMLU and TruthfulQA cannot distinguish a model that lacks knowledge from one whose verification mechanisms collapse when information degrades or adversaries probe for weaknesses. We introduce the Drill-Down and Fabricate Test (DDFT), a protocol that measures epistemic robustness: a model's ability to maintain factual accuracy under progressive semantic compression and adversarial fabrication. We propose a two-system cognitive model comprising a Semantic System that generates fluent text and an Epistemic Verifier that validates factual accuracy. Our findings, based on evaluating 9 frontier models across 8 knowledge domains at 5 compression levels (1,800 turn-level evaluations), reveal that epistemic robustness is orthogonal to conventional design paradigms. Neither parameter count (r=0.083, p=0.832) nor architectural type (r=0.153, p=0.695) significantly predicts robustness, suggesting it emerges from training methodology and verification mechanisms distinct from current approaches. Error detection capability strongly predicts overall robustness (rho=-0.817, p=0.007), indicating this is the critical bottleneck. We find that flagship models exhibit brittleness despite their scale, while smaller models can achieve robust performance, challenging assumptions about the relationship between model size and reliability. The DDFT framework provides both theoretical foundation and practical tools for assessing epistemic robustness before deployment in critical applications.
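The abstract gives enough structure to sketch the shape of the protocol: 8 knowledge domains crossed with 5 compression levels, fabrication probes at each turn, and a Spearman correlation between error-detection rate and error rate across the 9 models. The Python sketch below is an illustration under those assumptions only; `compress`, `fabricate`, and `toy_model` are invented stand-ins, not the authors' implementation.

```python
# A minimal, self-contained sketch of a DDFT-style loop, written only from
# the abstract. Every component (compress, fabricate, toy_model) is a
# hypothetical stand-in, not the authors' code.
import random
from scipy.stats import spearmanr

DOMAINS = [f"domain_{i}" for i in range(8)]  # 8 knowledge domains
LEVELS = range(5)                            # 5 progressive compression levels

def compress(text: str, level: int) -> str:
    """Stand-in for progressive semantic compression: keep fewer words."""
    words = text.split()
    return " ".join(words[: max(1, len(words) // (level + 1))])

def fabricate(context: str) -> str:
    """Stand-in for adversarial fabrication: append a false claim."""
    return context + " Note: the moon is made of basalt cheese."

def toy_model(prompt: str) -> tuple[bool, bool]:
    """Stand-in model: (answered correctly, flagged the fabrication)."""
    return random.random() < 0.8, random.random() < 0.6

def run_ddft(model) -> tuple[float, float]:
    """Return (error rate, fabrication-detection rate) over all turns."""
    correct, flagged = [], []
    for domain in DOMAINS:
        for level in LEVELS:
            probe = fabricate(compress(f"reference facts about {domain} " * 8, level))
            c, f = model(probe)
            correct.append(c)
            flagged.append(f)
    n = len(correct)
    return 1 - sum(correct) / n, sum(flagged) / n

# The paper's key analysis: Spearman correlation between detection rate and
# error rate across models (it reports rho = -0.817, p = 0.007 for 9 models).
results = [run_ddft(toy_model) for _ in range(9)]
rho, p = spearmanr([r[1] for r in results], [r[0] for r in results])
print(f"rho={rho:.3f}, p={p:.3f}")
```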
Related papers
- Evaluating and Enhancing the Vulnerability Reasoning Capabilities of Large Language Models [15.849480549367684]
We propose DAGVul, a novel framework that models vulnerability reasoning as a Directed Acyclic Graph (DAG) generation task. By further introducing Reinforcement Learning with Verifiable Rewards (RLVR), we align the model's reasoning traces with program-intrinsic logic. Our framework improves the reasoning F1-score by an average of 18.9% over all baselines.
arXiv Detail & Related papers (2026-02-06T13:19:45Z)
- Catch Me If You Can: How Smaller Reasoning Models Pretend to Reason with Mathematical Fidelity [15.774418410083515]
We introduce a diagnostic framework that distinguishes genuine mathematical reasoning from superficial pattern matching. We reveal a striking disconnect between surface performance and reasoning fidelity. Our diagnostics expose reasoning failures invisible to traditional accuracy metrics.
arXiv Detail & Related papers (2025-11-29T16:47:01Z)
- Benchmarking Corruption Robustness of LVLMs: A Discriminative Benchmark and Robustness Alignment Metric [49.393713730706445]
We introduce Bench-C, a benchmark emphasizing discriminative samples for assessing corruption robustness. We propose the Robustness Alignment Score (RAS), a unified metric that measures degradation in logit-level prediction structure.
arXiv Detail & Related papers (2025-11-24T12:07:56Z)
- RoHOI: Robustness Benchmark for Human-Object Interaction Detection [84.78366452133514]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support. We introduce the first robustness benchmark for HOI detection, evaluating model resilience under diverse challenges. Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z)
- Statistical Runtime Verification for LLMs via Robustness Estimation [0.0]
Adversarial robustness verification is essential for ensuring the safe deployment of Large Language Models (LLMs) in runtime-critical applications. This paper presents a case study adapting and extending the RoMA statistical verification framework to assess its feasibility as an online runtime robustness monitor for LLMs in black-box deployment settings.
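The abstract frames RoMA as a sampling-based, black-box robustness monitor. A generic Monte Carlo sketch of that idea follows (not RoMA's actual estimator, which the abstract does not specify); `perturb` and the model callable are illustrative stand-ins.

```python
# Generic black-box robustness estimate by Monte Carlo sampling.
import math, random

def perturb(prompt: str) -> str:
    """Stand-in perturbation: swap one random pair of adjacent characters."""
    if len(prompt) < 2:
        return prompt
    i = random.randrange(len(prompt) - 1)
    chars = list(prompt)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def estimate_robustness(model, prompt: str, n: int = 200, z: float = 1.96):
    """Fraction of perturbed prompts whose answer matches the clean answer,
    with a normal-approximation 95% confidence interval."""
    reference = model(prompt)
    stable = sum(model(perturb(prompt)) == reference for _ in range(n))
    p_hat = stable / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (max(0.0, p_hat - half), min(1.0, p_hat + half))

# Usage with a trivial stand-in "model":
print(estimate_robustness(lambda s: len(s) % 2, "runtime monitoring example"))
```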
arXiv Detail & Related papers (2025-04-24T16:36:19Z)
- TrustLoRA: Low-Rank Adaptation for Failure Detection under Out-of-distribution Data [62.22804234013273]
We propose a simple failure detection framework to unify and facilitate classification with rejection under both covariate and semantic shifts. Our key insight is that by separating and consolidating failure-specific reliability knowledge with low-rank adapters, we can enhance the failure detection ability effectively and flexibly.
arXiv Detail & Related papers (2025-04-20T09:20:55Z)
- Is it the model or the metric -- On robustness measures of deep learning models [2.8169948004297565]
We revisit robustness, investigating the sufficiency of robust accuracy (RA) in the context of deepfake detection. We present a comparison of RA and RR and demonstrate that, despite similar RA between models, the models show varying RR under different tolerance (perturbation) levels.
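The summary does not define RR, so the sketch below assumes it denotes a relative measure: the fraction of clean-correct predictions that remain correct under perturbation. Under that assumption, it shows how two models with identical RA can still differ in RR.

```python
# Hedged sketch contrasting robust accuracy (RA) with an assumed "robustness
# ratio" RR; the paper's exact definition of RR may differ.
import numpy as np

def ra_and_rr(clean_correct: np.ndarray, robust_correct: np.ndarray):
    """Both arguments are boolean arrays over the same test set."""
    ra = robust_correct.mean()                        # robust accuracy
    base = clean_correct.sum()
    rr = (clean_correct & robust_correct).sum() / base if base else 0.0
    return float(ra), float(rr)

# Identical RA, different RR: RA counts all robustly correct samples, while
# RR counts only those the model also got right on clean inputs.
clean = np.array([1, 1, 1, 0, 0], dtype=bool)
robust_a = np.array([1, 1, 0, 1, 0], dtype=bool)   # RA = 0.6, RR = 2/3
robust_b = np.array([0, 0, 1, 1, 1], dtype=bool)   # RA = 0.6, RR = 1/3
print(ra_and_rr(clean, robust_a), ra_and_rr(clean, robust_b))
```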
arXiv Detail & Related papers (2024-12-13T02:26:58Z)
- Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation [50.920911532133154]
The intrinsic ill-posedness and ordinal-sensitive nature of monocular depth estimation (MDE) models pose major challenges to the estimation of uncertainty degree.
We propose to model the uncertainty of MDE models from the perspective of the inherent probability distributions.
By simply introducing additional training regularization terms, our model, with a surprisingly simple formulation and without requiring extra modules or multiple inferences, provides uncertainty estimates with state-of-the-art reliability.
arXiv Detail & Related papers (2023-07-19T12:11:15Z)
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
Our proposed definition, SCORE (Self-COnsistent Robust Error), facilitates the reconciliation between robustness and accuracy while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
- Unifying Model Explainability and Robustness for Joint Text Classification and Rationale Extraction [11.878012909876713]
We propose a joint classification and rationale extraction model named AT-BMC.
It combines two key mechanisms: mixed Adversarial Training (AT), which applies perturbations in both discrete and embedding space to improve the model's robustness, and a Boundary Match Constraint (BMC), which locates rationales more precisely using boundary information. Results on benchmark datasets demonstrate that the proposed AT-BMC outperforms baselines on both classification and rationale extraction by a large margin.
arXiv Detail & Related papers (2021-12-20T09:48:32Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)