MultiRisk: Multiple Risk Control via Iterative Score Thresholding
- URL: http://arxiv.org/abs/2512.24587v1
- Date: Wed, 31 Dec 2025 03:25:30 GMT
- Title: MultiRisk: Multiple Risk Control via Iterative Score Thresholding
- Authors: Sunay Joshi, Yan Sun, Hamed Hassani, Edgar Dobriban
- Abstract summary: We formalize the problem of enforcing multiple risk constraints with user-defined priorities. We introduce two efficient dynamic programming algorithms that leverage this sequential structure. We show that our algorithm can control each individual risk at close to the target level.
- Score: 40.193623095603265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As generative AI systems are increasingly deployed in real-world applications, regulating multiple dimensions of model behavior has become essential. We focus on test-time filtering: a lightweight mechanism for behavior control that compares performance scores to estimated thresholds, and modifies outputs when these bounds are violated. We formalize the problem of enforcing multiple risk constraints with user-defined priorities, and introduce two efficient dynamic programming algorithms that leverage this sequential structure. The first, MULTIRISK-BASE, provides a direct finite-sample procedure for selecting thresholds, while the second, MULTIRISK, leverages data exchangeability to guarantee simultaneous control of the risks. Under mild assumptions, we show that MULTIRISK achieves nearly tight control of all constraint risks. The analysis requires an intricate iterative argument, upper bounding the risks by introducing several forms of intermediate symmetrized risk functions, and carefully lower bounding the risks by recursively counting jumps in symmetrized risk functions between appropriate risk levels. We evaluate our framework on a three-constraint Large Language Model alignment task using the PKU-SafeRLHF dataset, where the goal is to maximize helpfulness subject to multiple safety constraints, and where scores are generated by a Large Language Model judge and a perplexity filter. Our experimental results show that our algorithm can control each individual risk at close to the target level.
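As a rough illustration of the test-time filtering mechanism described in the abstract (this is a hypothetical sketch, not the paper's MULTIRISK algorithm; all names, score functions, and thresholds below are invented for the example), filtering amounts to comparing each risk score to its threshold in priority order and modifying the output on the first violation:

```python
# Hypothetical sketch of test-time filtering: compare performance scores to
# thresholds and modify the output when a bound is violated. Names and score
# functions are illustrative assumptions, not the paper's API.
from typing import Callable, Sequence


def filter_output(
    output: str,
    scores: Sequence[Callable[[str], float]],  # one score function per risk
    thresholds: Sequence[float],               # thresholds in priority order
    fallback: str = "[output withheld]",
) -> str:
    """Return the output unchanged if every score stays within its
    threshold; otherwise modify (here: replace) the output."""
    for score, tau in zip(scores, thresholds):
        if score(output) > tau:  # constraint violated at this priority level
            return fallback
    return output


# Toy usage with stand-in score functions (length ratio and digit count).
result = filter_output(
    "hello world",
    scores=[lambda s: len(s) / 100, lambda s: sum(c.isdigit() for c in s)],
    thresholds=[0.5, 3.0],
)
```

In the paper, the thresholds would be selected from exchangeable calibration data by the dynamic programming procedures (MULTIRISK-BASE or MULTIRISK); here they are fixed constants purely for illustration.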
Related papers
- Conformal Thinking: Risk Control for Reasoning on a Compute Budget [60.65072883773352]
Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases. We re-frame the budget-setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident and a novel lower threshold that preemptively stops unsolvable instances.
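The two-threshold stopping rule summarized above can be sketched as follows (a minimal illustration assuming some per-step confidence signal is available; the function names and threshold values are invented, not taken from the paper):

```python
# Hypothetical sketch of a two-threshold stopping rule for reasoning on a
# compute budget: stop and answer when confident, stop and abstain when the
# instance looks unsolvable, otherwise keep reasoning.
from typing import Optional


def should_stop(confidence: float, upper: float, lower: float) -> Optional[str]:
    """Return 'answer' to emit the answer now, 'abstain' to stop spending
    compute on a likely-unsolvable instance, or None to continue."""
    if confidence >= upper:
        return "answer"   # model is confident: stop reasoning and answer
    if confidence <= lower:
        return "abstain"  # preemptively stop an unsolvable instance
    return None           # continue reasoning within the budget
```

In the paper's framing, the thresholds themselves would be calibrated so that the resulting error rate is controlled at a target risk level.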
arXiv Detail & Related papers (2026-02-03T18:17:22Z) - Constrained Language Model Policy Optimization via Risk-aware Stepwise Alignment [49.2305683068875]
We propose Risk-aware Stepwise Alignment (RSA), a novel alignment method that incorporates risk awareness into the policy optimization process. RSA mitigates risks induced by excessive model shift away from a reference policy, and it explicitly suppresses low-probability yet high-impact harmful behaviors. Experimental results demonstrate that our method achieves high levels of helpfulness while ensuring strong safety.
arXiv Detail & Related papers (2025-12-30T14:38:02Z) - RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration [81.38705556267917]
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations. We introduce a theoretical framework that reconstructs the underlying risk concept space. We propose RADAR, a multi-agent collaborative evaluation framework.
arXiv Detail & Related papers (2025-09-28T09:35:32Z) - Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression [2.592761128203891]
Quantile-based action-value iteration methods reduce this bias by learning a distribution of the expected cost-to-go. Existing methods often require complex neural architectures or manual tradeoffs due to combined cost functions. We propose a risk-regularized quantile-based algorithm integrating Conditional Value-at-Risk to enforce safety without complex architectures.
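The Conditional Value-at-Risk quantity referenced above can be estimated from sampled costs; as a small illustration (not the paper's algorithm), CVaR at level alpha is the mean of the worst (1 - alpha) tail of the cost distribution:

```python
import numpy as np


# Illustrative sketch: empirical Conditional Value-at-Risk at level alpha,
# i.e. the mean cost in the worst (1 - alpha) tail of the samples.
def cvar(costs: np.ndarray, alpha: float) -> float:
    var = np.quantile(costs, alpha)  # Value-at-Risk at level alpha
    tail = costs[costs >= var]       # samples in the worst (1 - alpha) tail
    return float(tail.mean())


costs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
tail_risk = cvar(costs, 0.8)  # mean of the worst 20% of costs
```

Penalizing this tail mean, rather than the expected cost alone, is what lets risk-sensitive methods discourage rare but catastrophic outcomes.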
arXiv Detail & Related papers (2025-06-08T00:22:00Z) - Conditional Conformal Risk Adaptation [9.559062601251464]
We develop a new score function for creating adaptive prediction sets that significantly improve conditional risk control for segmentation tasks. We introduce a specialized probability calibration framework that enhances the reliability of pixel-wise inclusion estimates. Our experiments on polyp segmentation demonstrate that all three methods provide valid marginal risk control and deliver more consistent conditional risk control.
arXiv Detail & Related papers (2025-04-10T10:01:06Z) - Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models [46.56041622514975]
We introduce TRON, a two-step framework for risk control and assessment. TRON achieves desired error rates bounded by two user-specified risk levels. Deduplicated prediction sets maintain adaptiveness while being more efficient and stable for risk assessment under different risk levels.
arXiv Detail & Related papers (2024-10-10T17:50:42Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Sample-Based Bounds for Coherent Risk Measures: Applications to Policy Synthesis and Verification [32.9142708692264]
This paper aims to address a few problems regarding risk-aware verification and policy synthesis.
First, we develop a sample-based method to evaluate a subset of a random variable distribution.
Second, we develop a sample-based method to determine solutions to problems that outperform a large fraction of the decision space.
arXiv Detail & Related papers (2022-04-21T01:06:10Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs [13.922754427601491]
We characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy.
Our main finding is that, compared to the best known bounds for the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints.
arXiv Detail & Related papers (2020-08-01T18:17:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.