When to Act: Calibrated Confidence for Reliable Human Intention Prediction in Assistive Robotics
- URL: http://arxiv.org/abs/2601.04982v1
- Date: Thu, 08 Jan 2026 14:35:17 GMT
- Title: When to Act: Calibrated Confidence for Reliable Human Intention Prediction in Assistive Robotics
- Authors: Johannes A. Gaus, Winfried Ilg, Daniel Haeufle,
- Abstract summary: We introduce a safety-critical triggering framework based on calibrated probabilities for multimodal next-action prediction in Activities of Daily Living. Post-hoc calibration aligns predicted confidence with empirical reliability and reduces miscalibration by about an order of magnitude without affecting accuracy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Assistive devices must determine both what a user intends to do and how reliable that prediction is before providing support. We introduce a safety-critical triggering framework based on calibrated probabilities for multimodal next-action prediction in Activities of Daily Living. Raw model confidence often fails to reflect true correctness, posing a safety risk. Post-hoc calibration aligns predicted confidence with empirical reliability and reduces miscalibration by about an order of magnitude without affecting accuracy. The calibrated confidence drives a simple ACT/HOLD rule that acts only when reliability is high and withholds assistance otherwise. This turns the confidence threshold into a quantitative safety parameter for assisted actions and enables verifiable behavior in an assistive control loop.
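To make the triggering idea concrete, here is a minimal sketch, assuming a softmax classifier: temperature scaling is fit post hoc on held-out logits, and the calibrated confidence then drives a thresholded ACT/HOLD gate. The function names, the grid search, and the threshold tau=0.9 are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only (not the paper's code): post-hoc temperature
# scaling followed by a confidence-thresholded ACT/HOLD gate.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    """Grid-search the temperature T that minimizes NLL on held-out data."""
    best_T, best_nll = 1.0, np.inf
    for T in grid:
        p = softmax(val_logits / T)
        nll = -np.log(p[np.arange(len(val_labels)), val_labels] + 1e-12).mean()
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T

def act_or_hold(logits, T, tau=0.9):
    """ACT (return the predicted action) only when calibrated confidence
    clears the safety threshold tau; otherwise HOLD (encoded as -1)."""
    p = softmax(logits / T)
    conf, pred = p.max(axis=-1), p.argmax(axis=-1)
    return np.where(conf >= tau, pred, -1)
```

In this reading, tau is exactly the quantitative safety parameter the abstract refers to: raising it trades assistance coverage for reliability.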
Related papers
- BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents [58.05949210993854]
We investigate whether search agents can communicate their own confidence through verbalized confidence scores after long sequences of actions. We propose Test-Time Scaling (TTS) methods that use confidence scores to judge answer quality, encouraging the model to try again until it reaches a satisfactory confidence level.
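A hedged sketch of the retry pattern this summary describes; `run_agent` is an assumed stand-in for an agent call that returns an answer and a verbalized confidence in [0, 1]:

```python
# Hypothetical sketch of confidence-guided test-time scaling; `run_agent` is
# an assumed stand-in returning (answer, verbalized confidence in [0, 1]).
def answer_with_retries(run_agent, query, min_conf=0.8, max_tries=3):
    best_answer, best_conf = None, -1.0
    for _ in range(max_tries):
        answer, conf = run_agent(query)
        if conf > best_conf:        # keep the most confident attempt so far
            best_answer, best_conf = answer, conf
        if conf >= min_conf:        # confident enough: stop scaling
            break
    return best_answer, best_conf
```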
arXiv Detail & Related papers (2025-10-27T15:58:51Z) - Uncertainty-Driven Reliability: Selective Prediction and Trustworthy Deployment in Modern Machine Learning [1.2183405753834562]
This thesis investigates how uncertainty estimation can enhance the safety and trustworthiness of machine learning (ML) systems. We first show that a model's training trajectory contains rich uncertainty signals that can be exploited without altering its architecture or loss. We propose a lightweight, post-hoc abstention method that works across tasks, avoids the cost of deep ensembles, and achieves state-of-the-art selective prediction performance.
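Selective prediction in this sense reduces to sweeping an abstention threshold and reporting risk only on the covered inputs; a minimal sketch (the threshold and names are illustrative, not the thesis's method):

```python
# Illustrative selective-prediction sketch: abstain below a confidence
# threshold tau and measure risk only on the covered (answered) subset.
import numpy as np

def coverage_and_risk(conf, correct, tau):
    """conf: per-example confidences; correct: boolean per-example outcomes."""
    answered = conf >= tau
    coverage = answered.mean()
    risk = float("nan") if coverage == 0 else (~correct[answered]).mean()
    return coverage, risk
```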
arXiv Detail & Related papers (2025-08-11T02:33:53Z) - Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision [12.287123198288079]
Uncertainty calibration is essential for the safe deployment of large language models (LLMs). We find that supervised fine-tuning with scalar confidence labels alone suffices to elicit self-verification behavior in language models. We propose a simple rethinking method that boosts performance via test-time scaling based on calibrated uncertainty.
arXiv Detail & Related papers (2025-06-04T08:56:24Z) - Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention [65.47632669243657]
A dishonest institution can exploit abstention mechanisms to discriminate or unjustly deny services under the guise of uncertainty. We demonstrate the practicality of this threat by introducing an uncertainty-inducing attack called Mirage. We propose Confidential Guardian, a framework that analyzes calibration metrics on a reference dataset to detect artificially suppressed confidence.
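Detecting suppressed confidence plausibly starts from a standard calibration metric on a reference set; a sketch of expected calibration error (ECE), with the bin count as an assumption (this is not the Confidential Guardian implementation):

```python
# Sketch of a calibration check on a reference set (not the Confidential
# Guardian implementation): expected calibration error with 15 assumed bins.
import numpy as np

def expected_calibration_error(conf, correct, n_bins=15):
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # weight each bin's |accuracy - confidence| gap by its mass
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece
```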
arXiv Detail & Related papers (2025-05-29T19:47:50Z) - Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift? [51.12297424766236]
AURORA is a framework to evaluate malware classifiers based on their confidence quality and operational resilience. AURORA is complemented by a set of metrics designed to go beyond point-in-time performance. The fragility of SOTA frameworks across datasets of varying drift suggests the need for a return to the whiteboard.
arXiv Detail & Related papers (2025-05-28T20:22:43Z) - Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning [53.42244686183879]
Conformal prediction provides model-agnostic and distribution-free uncertainty quantification. Yet, conformal prediction is not reliable under poisoning attacks, where adversaries manipulate both training and calibration data. We propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning.
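For context, the standard split-conformal recipe that RPS hardens looks roughly like this; the quantile rule is the usual one for 1 - alpha marginal coverage under exchangeability, and this sketch is not RPS itself:

```python
# Standard split-conformal sketch (the baseline RPS hardens, not RPS itself):
# calibrate a score quantile, then emit sets with 1 - alpha marginal coverage.
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Nonconformity score: 1 - probability assigned to the true label."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
    return np.sort(scores)[min(k, len(scores)) - 1]

def prediction_sets(test_probs, qhat):
    """Include every label whose score falls below the calibrated quantile."""
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]
```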
arXiv Detail & Related papers (2024-10-13T15:37:11Z) - ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning [26.369237406972577]
Vision-centric semantic occupancy prediction plays a crucial role in autonomous driving.
There has been little research effort to explore the reliability of predicting semantic occupancy from cameras.
We propose ReliOcc, a method designed to enhance the reliability of camera-based occupancy networks.
arXiv Detail & Related papers (2024-09-26T16:33:16Z) - Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We identify a general, widespread, yet largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z) - Two Sides of Miscalibration: Identifying Over and Under-Confidence Prediction for Network Calibration [1.192436948211501]
Proper confidence calibration of deep neural networks is essential for reliable predictions in safety-critical tasks.
Miscalibration can lead to model over-confidence and/or under-confidence.
We introduce a novel metric, a miscalibration score, to identify the overall and class-wise calibration status.
We use the class-wise miscalibration score as a proxy to design a calibration technique that can tackle both over- and under-confidence.
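A hypothetical per-class probe in the spirit of the paper's miscalibration score (not its exact metric): the signed gap between mean confidence and accuracy in each class, where positive values indicate over-confidence and negative values under-confidence:

```python
# Hypothetical per-class calibration probe (not the paper's exact metric):
# signed gap between mean confidence and accuracy within each predicted class.
import numpy as np

def classwise_confidence_gap(conf, pred, labels, n_classes):
    gaps = np.zeros(n_classes)
    for c in range(n_classes):
        m = pred == c
        if m.any():
            # positive = over-confident on class c; negative = under-confident
            gaps[c] = conf[m].mean() - (labels[m] == c).mean()
    return gaps
```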
arXiv Detail & Related papers (2023-08-06T17:59:14Z) - Did You Mean...? Confidence-based Trade-offs in Semantic Parsing [52.28988386710333]
We show how a calibrated model can help balance common trade-offs in task-oriented parsing.
We then examine how confidence scores can help optimize the trade-off between usability and safety.
arXiv Detail & Related papers (2023-03-29T17:07:26Z) - Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness [29.320691367586004]
We introduce a new self-supervised probing approach, which enables us to check and mitigate the overconfidence issue for a trained model.
We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner.
arXiv Detail & Related papers (2023-02-06T08:57:20Z)