Related papers: Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability

Related papers

Decomposed Prompting Does Not Fix Knowledge Gaps, But Helps Models Say "I Don't Know" [47.930782177987446]
Large language models often struggle to recognize their knowledge limits in closed-book question answering, leading to confident hallucinations. We evaluate three task-equivalent prompting regimes (Direct, Assistive, and Incremental) across different model scales and multi-hop QA benchmarks. Because factual knowledge is stable while hallucinations are not, cross-regime agreement provides a precise signal of internal uncertainty.
arXiv Detail & Related papers (2026-02-04T18:39:58Z)
When Small Models Are Right for Wrong Reasons: Process Verification for Trustworthy Agents [0.0]
We reveal a critical reliability crisis: 50-69% of correct answers from small language models contain fundamentally flawed reasoning. We introduce the Reasoning Integrity Score (RIS), a process-based metric validated with substantial inter-rater agreement. We show RAG succeeds by grounding calculations in external evidence, reducing errors by 7.6%, while meta-cognition amplifies confusion without sufficient model capacity.
arXiv Detail & Related papers (2026-01-01T23:54:15Z)
MicroProbe: Efficient Reliability Assessment for Foundation Models with Minimal Data [0.0]
microprobe achieves comprehensive reliability assessment using only 100 strategically selected probe examples. We demonstrate that microprobe achieves 23.5% higher composite reliability scores compared to random sampling baselines. microprobe completes reliability assessment with 99.9% statistical power while representing a 90% reduction in assessment cost and maintaining 95% of traditional method coverage.
arXiv Detail & Related papers (2025-11-30T13:01:57Z)
Structured Uncertainty guided Clarification for LLM Agents [126.26213027785813]
LLM agents extend large language models with tool-calling capabilities, but ambiguous user instructions often lead to incorrect invocations and task failures. We introduce a principled formulation of structured uncertainty over tool-call parameters, modeling joint tool-argument clarification as a POMDP with Expected Value of Perfect Information (EVPI) objective for optimal question selection and aspect-based cost modeling to prevent redundancy. Our SAGE-Agent leverages this structured uncertainty to achieve superior efficiency: increasing coverage on ambiguous tasks by 7-39% while reducing clarification questions by 1.5-2.7$\times$ compared to strong prompting and uncertainty-based baselines.
arXiv Detail & Related papers (2025-11-11T21:50:44Z)
Colorectal Cancer Histopathological Grading using Multi-Scale Federated Learning [0.0]
We propose a scalable, privacy-preserving federated learning framework for colorectal cancer grading. Our framework achieves an overall accuracy of 83.5%, outperforming a comparable centralized model. The proposed modular pipeline establishes a foundational step toward deployable, privacy-aware clinical AI for digital pathology.
arXiv Detail & Related papers (2025-11-05T18:18:09Z)
CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent [46.41047559759938]
Computer-using agents (CUAs) enable task completion through natural interaction with operating systems and software interfaces. Reward models offer promising alternatives, but their effectiveness on CUA evaluation remains largely underexplored. We present CUARewardBench, comprising four key contributions.
arXiv Detail & Related papers (2025-10-21T12:53:40Z)
SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection [6.806105013817923]
SAVANT is a structured reasoning framework that achieves high accuracy and recall in detecting anomalous driving scenarios. By automatically labeling over 9,640 real-world images with high accuracy, SAVANT addresses the critical data scarcity problem in anomaly detection.
arXiv Detail & Related papers (2025-10-20T19:14:29Z)
Confidence-Diversity Calibration of AI Judgement Enables Reliable Qualitative Coding [0.0]
We analyse 5,680 coding decisions from eight state-of-the-art LLMs across ten thematic categories. Adding model diversity (quantified as the normalised Shannon entropy of the panel's votes) turns this single cue into a dual signal that explains agreement almost completely.
arXiv Detail & Related papers (2025-08-04T03:47:10Z)
Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models [87.66870367661342]
Large language models (LLMs) are used in AI applications in healthcare. A red-teaming framework that continuously stress-tests LLMs can reveal significant weaknesses in four safety-critical domains. A suite of adversarial agents is applied to autonomously mutate test cases, identify/evolve unsafe-triggering strategies, and evaluate responses. Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.
arXiv Detail & Related papers (2025-07-30T08:44:22Z)
Financial Fraud Detection Using Explainable AI and Stacking Ensemble Methods [0.6642919568083927]
We propose a fraud detection framework that combines a stacking ensemble of gradient boosting models: XGBoost, LightGBM, and CatBoost. XAI techniques are used to enhance the transparency and interpretability of the model's decisions.
arXiv Detail & Related papers (2025-05-15T07:53:02Z)
TrustLoRA: Low-Rank Adaptation for Failure Detection under Out-of-distribution Data [62.22804234013273]
We propose a simple failure detection framework to unify and facilitate classification with rejection under both covariate and semantic shifts. Our key insight is that by separating and consolidating failure-specific reliability knowledge with low-rank adapters, we can enhance the failure detection ability effectively and flexibly.
arXiv Detail & Related papers (2025-04-20T09:20:55Z)
Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling [48.15636223774418]
Large language models (LLMs) frequently hallucinate due to misaligned self-awareness. Existing approaches mitigate hallucinations via uncertainty estimation or query rejection. We propose the Explicit Knowledge Boundary Modeling framework to integrate fast and slow reasoning systems.
arXiv Detail & Related papers (2025-03-04T03:16:02Z)
Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models [3.958317527488534]
Large Language and Vision-Language Models (LLMs/VLMs) are increasingly used in safety-critical applications. Uncertainty quantification helps assess prediction confidence and enables abstention when uncertainty is high. We propose learnable abstention, integrating reinforcement learning (RL) with Conformal Prediction (CP) to optimize abstention thresholds.
arXiv Detail & Related papers (2025-02-08T21:30:41Z)
Distilling Calibration via Conformalized Credal Inference [36.01369881486141]
One way to enhance reliability is through uncertainty quantification via Bayesian inference. This paper introduces a low-complexity methodology to address this challenge by distilling calibration information from a more complex model. Experiments on visual and language tasks demonstrate that the proposed approach, termed Conformalized Distillation for Credal Inference (CD-CI), significantly improves calibration performance.
arXiv Detail & Related papers (2025-01-10T15:57:23Z)
UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation [93.38604803625294]
We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG). We use Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks. UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-03T17:39:38Z)
OATH: Efficient and Flexible Zero-Knowledge Proofs of End-to-End ML Fairness [13.986886689256128]
Zero-Knowledge Proofs of Fairness address fairness noncompliance by allowing a service provider to verify that their model serves diverse demographics equitably. We present OATH, a framework that is deployably efficient with client-facing communication and an offline audit phase. OATH provides a 1343x improvement to runtime over previous work for neural network ZKPoF, and scales up to much larger models.
arXiv Detail & Related papers (2024-09-17T16:00:35Z)
Enhanced Anomaly Detection in Automotive Systems Using SAAD: Statistical Aggregated Anomaly Detection [0.0]
This paper presents a novel anomaly detection methodology termed Statistical Aggregated Anomaly Detection (SAAD). The SAAD approach integrates advanced statistical techniques with machine learning, and its efficacy is demonstrated through validation on real sensor data from a Hardware-in-the-Loop (HIL) environment within the automotive domain.
arXiv Detail & Related papers (2024-06-11T12:41:24Z)
Accurate and Reliable Predictions with Mutual-Transport Ensemble [46.368395985214875]
We propose a co-trained auxiliary model and adaptively regularize the cross-entropy loss using Kullback-Leibler (KL) divergence. We show that MTE can simultaneously enhance both accuracy and uncertainty calibration. For example, on the CIFAR-100 dataset, our MTE method on ResNet34/50 achieved significant improvements compared to the previous state-of-the-art method.
arXiv Detail & Related papers (2024-05-30T03:15:59Z)
MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers [41.56951365163419]
"MixedNUTS" is a training-free method where the output logits of a robust classifier are processed by nonlinear transformations with only three parameters. MixedNUTS then converts the transformed logits into probabilities and mixes them as the overall output. On CIFAR-10, CIFAR-100, and ImageNet datasets, experimental results with custom strong adaptive attacks demonstrate MixedNUTS's vastly improved accuracy and near-SOTA robustness.
arXiv Detail & Related papers (2024-02-03T21:12:36Z)
ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection. We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance. We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z)
Reliable Federated Disentangling Network for Non-IID Domain Feature [62.73267904147804]
In this paper, we propose a novel reliable federated disentangling network, termed RFedDis. To the best of our knowledge, our proposed RFedDis is the first work to develop an FL approach based on evidential uncertainty combined with feature disentangling. Our proposed RFedDis provides outstanding performance with a high degree of reliability as compared to other state-of-the-art FL approaches.
arXiv Detail & Related papers (2023-01-30T11:46:34Z)
Adversarial Training with Rectified Rejection [114.83821848791206]
We propose to use true confidence (T-Con) as a certainty oracle, and learn to predict T-Con by rectifying confidence. We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones.
arXiv Detail & Related papers (2021-05-31T08:24:53Z)
Adversarial Feature Stacking for Accurate and Robust Predictions [4.208059346198116]
The Adversarial Feature Stacking (AFS) model can jointly take advantage of features with varied levels of robustness and accuracy. We evaluate the AFS model on CIFAR-10 and CIFAR-100 datasets with strong adaptive attack methods.
arXiv Detail & Related papers (2021-03-24T12:01:24Z)
Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference [119.19779637025444]
Deep networks were recently suggested to face the odds between accuracy (on clean natural images) and robustness (on adversarially perturbed images). This paper studies multi-exit networks associated with input-adaptive inference, showing their strong promise in achieving a "sweet point" in cooptimizing model accuracy, robustness and efficiency.
arXiv Detail & Related papers (2020-02-24T00:40:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.