Related papers: RAPT: Model-Predictive Out-of-Distribution Detection and Failure Diagnosis for Sim-to-Real Humanoid Robots

URL: http://arxiv.org/abs/2602.01515v1
Date: Mon, 02 Feb 2026 01:04:55 GMT
Title: RAPT: Model-Predictive Out-of-Distribution Detection and Failure Diagnosis for Sim-to-Real Humanoid Robots
Authors: Humphrey Munn, Brendan Tidd, Peter Bohm, Marcus Gallagher, David Howard
Abstract summary: We present RAPT, a lightweight, self-supervised deployment-time monitor for 50Hz humanoid control. RAPT learns a probabilistic spatio-temporal manifold of nominal execution from simulation and evaluates execution-time predictive deviation. We evaluate RAPT on a Unitree G1 humanoid across four complex tasks in simulation and on physical hardware.
Score: 1.5765892172285598
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deploying learned control policies on humanoid robots is challenging: policies that appear robust in simulation can execute confidently in out-of-distribution (OOD) states after Sim-to-Real transfer, leading to silent failures that risk hardware damage. Although anomaly detection can mitigate these failures, prior methods are often incompatible with high-rate control, poorly calibrated at the extremely low false-positive rates required for practical deployment, or operate as black boxes that provide a binary stop signal without explaining why the robot drifted from nominal behavior. We present RAPT, a lightweight, self-supervised deployment-time monitor for 50Hz humanoid control. RAPT learns a probabilistic spatio-temporal manifold of nominal execution from simulation and evaluates execution-time predictive deviation as a calibrated, per-dimension signal. This yields (i) reliable online OOD detection under strict false-positive constraints and (ii) a continuous, interpretable measure of Sim-to-Real mismatch that can be tracked over time to quantify how far deployment has drifted from training. Beyond detection, we introduce an automated post-hoc root-cause analysis pipeline that combines gradient-based temporal saliency derived from RAPT's reconstruction objective with LLM-based reasoning conditioned on saliency and joint kinematics to produce semantic failure diagnoses in a zero-shot setting. We evaluate RAPT on a Unitree G1 humanoid across four complex tasks in simulation and on physical hardware. In large-scale simulation, RAPT improves True Positive Rate (TPR) by 37% over the strongest baseline at a fixed episode-level false positive rate of 0.5%. On real-world deployments, RAPT achieves a 12.5% TPR improvement and provides actionable interpretability, reaching 75% root-cause classification accuracy across 16 real-world failures using only proprioceptive data.

Related papers

ARTIS: Agentic Risk-Aware Test-Time Scaling via Iterative Simulation [72.78362530982109]
Current test-time scaling (TTS) techniques enhance large language model (LLM) performance by allocating additional computation at inference time. We propose Agentic Risk-Aware Test-Time Scaling. This framework decouples exploration from commitment by enabling test-time exploration through simulated interactions prior to real-world execution.
arXiv Detail & Related papers (2026-02-02T06:33:22Z)
Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning [32.32593439144886]
Behavior-calibrated reinforcement learning allows smaller models to surpass frontier models in uncertainty quantification. Our model's log-scale Accuracy-to-Hallucination Ratio gain (0.806) exceeds GPT-5's (0.207) in a challenging in-domain evaluation.
arXiv Detail & Related papers (2025-12-22T22:51:48Z)
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents [58.00130492861884]
TraitBasis is a lightweight, model-agnostic method for systematically stress testing AI agents. TraitBasis learns directions in activation space corresponding to steerable user traits. We observe on average a 2%-30% performance degradation on $\tau$-Trait across frontier models.
arXiv Detail & Related papers (2025-10-06T05:03:57Z)
Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z)
Machine Generalize Learning in Agent-Based Models: Going Beyond Surrogate Models for Calibration in ABMs [0.0]
Calibrating agent-based epidemic models is computationally demanding. We present a supervised machine learning calibrator that learns the inverse mapping from epidemic time series to SIR parameters.
arXiv Detail & Related papers (2025-09-06T18:28:00Z)
Organ-Agents: Virtual Human Physiology Simulator via LLMs [66.40796430669158]
Organ-Agents is a multi-agent framework that simulates human physiology via LLM-driven agents. We curated data from 7,134 sepsis patients and 7,895 controls, generating high-resolution trajectories across 9 systems and 125 variables. Organ-Agents achieved high simulation accuracy on 4,509 held-out patients, with per-system MSEs 0.16 and robustness across SOFA-based severity strata.
arXiv Detail & Related papers (2025-08-20T01:58:45Z)
Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator [50.191655141020505]
Reinforcement Learning (RL) has demonstrated impressive capabilities in robotic control but remains challenging due to high sample complexity, safety concerns, and the sim-to-real gap. We introduce Offline Robotic World Model (RWM-O), a model-based approach that explicitly estimates uncertainty to improve policy learning without reliance on a physics simulator.
arXiv Detail & Related papers (2025-04-23T12:58:15Z)
The Power of Resets in Online Reinforcement Learning [73.64852266145387]
We explore the power of simulators through online reinforcement learning with local simulator access (or, local planning). We show that MDPs with low coverability can be learned in a sample-efficient fashion with only $Q^\star$-realizability. We show that the notorious Exogenous Block MDP problem is tractable under local simulator access.
arXiv Detail & Related papers (2024-04-23T18:09:53Z)
Explicit Lipschitz Value Estimation Enhances Policy Robustness Against Perturbation [2.2120851074630177]
In robotic control tasks, policies trained by reinforcement learning (RL) in simulation often experience a performance drop when deployed on physical hardware. We propose that Lipschitz regularization can help condition the approximated value function gradients, leading to improved robustness after training.
arXiv Detail & Related papers (2024-04-22T05:01:29Z)
Instance-based Learning with Prototype Reduction for Real-Time Proportional Myocontrol: A Randomized User Study Demonstrating Accuracy-preserving Data Reduction for Prosthetic Embedded Systems [0.0]
This work presents the design, implementation and validation of learning techniques based on the kNN scheme for gesture detection in prosthetic control. The influence of parameterization and varying proportionality schemes is analyzed, utilizing an eight-channel-sEMG armband.
arXiv Detail & Related papers (2023-08-21T20:15:35Z)
Physics Informed Neural Networks for Phase Locked Loop Transient Stability Assessment [0.0]
Using power-electronic controllers, such as Phase Locked Loops (PLLs), to keep grid-tied renewable resources in synchronism with the grid can cause fast transient behavior during grid faults leading to instability. This paper proposes a Neural Network algorithm that accurately predicts the transient dynamics of a controller under fault with less labeled training data. The algorithm's performance is compared against a ROM and an EMT simulation in PSCAD for the CIGRE benchmark model C4.49, demonstrating its ability to accurately approximate trajectories and ROAs of a controller under varying grid impedance.
arXiv Detail & Related papers (2023-03-21T18:09:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.