Tail-Safe Hedging: Explainable Risk-Sensitive Reinforcement Learning with a White-Box CBF--QP Safety Layer in Arbitrage-Free Markets
- URL: http://arxiv.org/abs/2510.04555v1
- Date: Mon, 06 Oct 2025 07:39:45 GMT
- Title: Tail-Safe Hedging: Explainable Risk-Sensitive Reinforcement Learning with a White-Box CBF--QP Safety Layer in Arbitrage-Free Markets
- Authors: Jian'an Zhang,
- Abstract summary: Tail-Safe is a deployability-oriented framework for derivatives hedging.<n>The learning component combines an IQN-based distributional critic with a CVaR objective.<n>The safety component enforces discrete-time CBF inequalities together with domain-specific constraints.
- Score: 4.235667373386689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Tail-Safe, a deployability-oriented framework for derivatives hedging that unifies distributional, risk-sensitive reinforcement learning with a white-box control-barrier-function (CBF) quadratic-program (QP) safety layer tailored to financial constraints. The learning component combines an IQN-based distributional critic with a CVaR objective (IQN--CVaR--PPO) and a Tail-Coverage Controller that regulates quantile sampling through temperature tilting and tail boosting to stabilize small-$\alpha$ estimation. The safety component enforces discrete-time CBF inequalities together with domain-specific constraints -- ellipsoidal no-trade bands, box and rate limits, and a sign-consistency gate -- solved as a convex QP whose telemetry (active sets, tightness, rate utilization, gate scores, slack, and solver status) forms an auditable trail for governance. We provide guarantees of robust forward invariance of the safe set under bounded model mismatch, a minimal-deviation projection interpretation of the QP, a KL-to-DRO upper bound linking per-state KL regularization to worst-case CVaR, concentration and sample-complexity results for the temperature-tilted CVaR estimator, and a CVaR trust-region improvement inequality under KL limits, together with feasibility persistence under expiry-aware tightening. Empirically, in arbitrage-free, microstructure-aware synthetic markets (SSVI $\to$ Dupire $\to$ VIX with ABIDES/MockLOB execution), Tail-Safe improves left-tail risk without degrading central performance and yields zero hard-constraint violations whenever the QP is feasible with zero slack. Telemetry is mapped to governance dashboards and incident workflows to support explainability and auditability. Limitations include reliance on synthetic data and simplified execution to isolate methodological contributions.
Related papers
- LATA: Laplacian-Assisted Transductive Adaptation for Conformal Uncertainty in Medical VLMs [61.06744611795341]
Medical vision-language models (VLMs) are strong zero-shot recognizers for medical imaging.<n>We propose texttttextbfLATA (Laplacian-Assisted Transductive Adaptation), a textittraining- and label-free refinement.<n>texttttextbfLATA sharpens zero-shot predictions without compromising exchangeability.
arXiv Detail & Related papers (2026-02-19T16:45:38Z) - Conformal Thinking: Risk Control for Reasoning on a Compute Budget [60.65072883773352]
Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases.<n>We re-frame the budget setting problem as risk control, limiting the error rate while minimizing compute.<n>Our framework introduces an upper threshold that stops reasoning when the model is confident and a novel lower threshold that preemptively stops unsolvable instances.
arXiv Detail & Related papers (2026-02-03T18:17:22Z) - Bulk-Calibrated Credal Ambiguity Sets: Fast, Tractable Decision Making under Out-of-Sample Contamination [8.826173150779145]
Distributionally robust optimisation (DRO) minimises the worst-case expected loss over an ambiguity set.<n>We show how IP credal sets translate into DRO objectives with interpretable tolerance levels.
arXiv Detail & Related papers (2026-01-29T06:37:36Z) - Safe and Compliant Cross-Market Trade Execution via Constrained RL and Zero-Knowledge Audits [0.5586191108738564]
We present a cross-market algorithmic trading system that balances execution quality with rigorous compliance enforcement.<n>The architecture comprises a high-level planner, a reinforcement learning execution agent, and an independent compliance agent.<n>We report effects at the 95% confidence level using paired t-tests and examine tail risk via CVaR.
arXiv Detail & Related papers (2025-10-06T15:52:12Z) - Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty [49.19257648205146]
We propose an unsupervised conformal inference framework for generation.<n>Our gates achieve close-to-nominal coverage and provide tighter, more stable thresholds than split UCP.<n>The result is a label-free, API-compatible gate for test-time filtering.
arXiv Detail & Related papers (2025-09-26T23:40:47Z) - Distributionally Robust Safety Verification of Neural Networks via Worst-Case CVaR [3.0458514384586404]
This paper builds on Fazlyab's quadratic-constraint (QC) and semidefinite-programming (SDP) framework for neural network verification.<n>The integration broadens input-uncertainty geometry-covering ellipsoids, polytopes, and hyperplanes-and extends applicability to safety-critical domains.
arXiv Detail & Related papers (2025-09-22T07:04:53Z) - Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal [31.458406135473805]
We present UniCR, a unified framework that turns heterogeneous uncertainty evidence into a calibrated probability of correctness.<n>UniCR learns a lightweight calibration head with temperature scaling and proper scoring.<n>Experiments on short-form QA, code generation with execution tests, and retrieval-augmented long-form QA show consistent improvements in calibration metrics.
arXiv Detail & Related papers (2025-09-01T13:14:58Z) - COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question.<n>COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate.<n>We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z) - ConstrainedZero: Chance-Constrained POMDP Planning using Learned Probabilistic Failure Surrogates and Adaptive Safety Constraints [34.9739641898452]
This work introduces the ConstrainedZero policy algorithm that solves CC-POMDPs in belief space by learning neural network approximations of the optimal value and policy.
Results show that by separating safety constraints from the objective we can achieve a target level of safety without optimizing the balance between rewards and costs.
arXiv Detail & Related papers (2024-05-01T17:17:22Z) - Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL)
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z) - Pointwise Feasibility of Gaussian Process-based Safety-Critical Control
under Model Uncertainty [77.18483084440182]
Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively.
We present a Gaussian Process (GP)-based approach to tackle the problem of model uncertainty in safety-critical controllers that use CBFs and CLFs.
arXiv Detail & Related papers (2021-06-13T23:08:49Z) - Reinforcement Learning for Safety-Critical Control under Model
Uncertainty, using Control Lyapunov Functions and Control Barrier Functions [96.63967125746747]
Reinforcement learning framework learns the model uncertainty present in the CBF and CLF constraints.
RL-CBF-CLF-QP addresses the problem of model uncertainty in the safety constraints.
arXiv Detail & Related papers (2020-04-16T10:51:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.