Learning to Trust: Bayesian Adaptation to Varying Suggester Reliability in Sequential Decision Making
- URL: http://arxiv.org/abs/2511.12378v1
- Date: Sat, 15 Nov 2025 22:50:20 GMT
- Title: Learning to Trust: Bayesian Adaptation to Varying Suggester Reliability in Sequential Decision Making
- Authors: Dylan M. Asmar, Mykel J. Kochenderfer
- Abstract summary: We introduce a framework that learns and adapts to varying suggester reliability in partially observable environments. First, we integrate suggester quality directly into the agent's belief representation, enabling agents to infer and adjust their reliance on suggestions. Second, we introduce an explicit ``ask'' action allowing agents to strategically request suggestions at critical moments.
- Score: 28.742690356257157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous agents operating in sequential decision-making tasks under uncertainty can benefit from external action suggestions, which provide valuable guidance but inherently vary in reliability. Existing methods for incorporating such advice typically assume static and known suggester quality parameters, limiting practical deployment. We introduce a framework that dynamically learns and adapts to varying suggester reliability in partially observable environments. First, we integrate suggester quality directly into the agent's belief representation, enabling agents to infer and adjust their reliance on suggestions through Bayesian inference over suggester types. Second, we introduce an explicit ``ask'' action allowing agents to strategically request suggestions at critical moments, balancing informational gains against acquisition costs. Experimental evaluation demonstrates robust performance across varying suggester qualities, adaptation to changing reliability, and strategic management of suggestion requests. This work provides a foundation for adaptive human-agent collaboration by addressing suggestion uncertainty in uncertain environments.
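The abstract's first contribution, Bayesian inference over suggester types, can be sketched as a posterior update over a discrete set of reliability levels. The levels and the match-based update rule below are illustrative assumptions, not the paper's exact belief model:

```python
import numpy as np

# Hypothetical reliability levels: P(suggestion is correct) per suggester type.
RELIABILITY_LEVELS = np.array([0.2, 0.5, 0.8, 0.95])

def update_reliability_belief(belief, suggestion_was_correct):
    """One Bayesian update of the belief over suggester types,
    given whether the last suggestion turned out to be correct."""
    likelihood = RELIABILITY_LEVELS if suggestion_was_correct else 1.0 - RELIABILITY_LEVELS
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Start from a uniform prior and observe three correct suggestions.
belief = np.full(len(RELIABILITY_LEVELS), 0.25)
for _ in range(3):
    belief = update_reliability_belief(belief, suggestion_was_correct=True)
```

After a few correct suggestions the posterior mass shifts toward the high-reliability types, which is the mechanism that lets an agent "adjust its reliance on suggestions" online.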
Related papers
- Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence [0.4199844472131922]
We introduce a novel problem setting in bandit optimization that addresses maximizing expected reward and minimizing associated uncertainty. We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes. Our approach outperforms existing methods in terms of both accuracy and sample efficiency.
arXiv Detail & Related papers (2025-06-27T14:21:03Z) - TrustLoRA: Low-Rank Adaptation for Failure Detection under Out-of-distribution Data [62.22804234013273]
We propose a simple failure detection framework to unify and facilitate classification with rejection under both covariate and semantic shifts. Our key insight is that by separating and consolidating failure-specific reliability knowledge with low-rank adapters, we can enhance the failure detection ability effectively and flexibly.
arXiv Detail & Related papers (2025-04-20T09:20:55Z) - Uncertainty in Action: Confidence Elicitation in Embodied Agents [7.180871428121812]
We present the first work investigating embodied confidence elicitation in open-ended multimodal environments. We introduce Elicitation Policies, which structure confidence assessment across inductive, deductive, and abductive reasoning. We show that structured reasoning approaches, such as Chain-of-Thoughts, improve confidence calibration.
arXiv Detail & Related papers (2025-03-13T17:59:41Z) - Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users. We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions. We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality.
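The "true criticality" definition above lends itself to a Monte Carlo estimate: compare the return of the policy against the average return when the first n actions are randomized. The 1-D chain environment and greedy policy below are hypothetical stand-ins for illustration only:

```python
import random

def step(state, action):
    """Move on an integer line; reward penalizes distance from the origin."""
    new_state = state + action
    return new_state, -abs(new_state)

def policy(state):
    """Greedy policy: always step toward the origin."""
    return -1 if state > 0 else +1

def rollout(state, horizon, n_random=0, rng=None):
    """Total reward over `horizon` steps, with the first n_random actions random."""
    total = 0.0
    for t in range(horizon):
        action = rng.choice((-1, +1)) if t < n_random else policy(state)
        state, reward = step(state, action)
        total += reward
    return total

def true_criticality(state, horizon, n, trials=200, seed=0):
    """Expected reward drop from deviating for n consecutive random actions."""
    rng = random.Random(seed)
    on_policy = rollout(state, horizon)
    deviated = sum(rollout(state, horizon, n, rng) for _ in range(trials)) / trials
    return on_policy - deviated

crit = true_criticality(state=5, horizon=10, n=3)
```

In states where random deviations are costly, the estimate grows; with n = 0 it is exactly zero, matching the intuition that no deviation means no criticality.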
arXiv Detail & Related papers (2024-09-26T21:00:45Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z) - Safe Explicable Planning [3.3869539907606603]
We propose Safe Explicable Planning (SEP) to support the specification of a safety bound.
Our approach generalizes the consideration of multiple objectives stemming from multiple models.
We provide formal proofs that validate the desired theoretical properties of these methods.
arXiv Detail & Related papers (2023-04-04T21:49:02Z) - Debiasing Recommendation by Learning Identifiable Latent Confounders [49.16119112336605]
Confounding bias arises due to the presence of unmeasured variables that can affect both a user's exposure and feedback.
Existing methods either (1) make untenable assumptions about these unmeasured variables or (2) directly infer latent confounders from users' exposure.
We propose a novel method, i.e., identifiable deconfounder (iDCF), which leverages a set of proxy variables to resolve the aforementioned non-identification issue.
arXiv Detail & Related papers (2023-02-10T05:10:26Z) - Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework [41.04606578479283]
We introduce a novel initiative advisor-in-the-loop actor-critic framework, termed as Ask-AC.
At the heart of Ask-AC are two complementary components, namely action requester and adaptive state selector.
Experimental results on both stationary and non-stationary environments demonstrate that the proposed framework significantly improves the learning efficiency of the agent.
arXiv Detail & Related papers (2022-07-05T10:58:11Z) - Deceptive Decision-Making Under Uncertainty [25.197098169762356]
We study the design of autonomous agents that are capable of deceiving outside observers about their intentions while carrying out tasks.
By modeling the agent's behavior as a Markov decision process, we consider a setting where the agent aims to reach one of multiple potential goals.
We propose a novel approach to model observer predictions based on the principle of maximum entropy and to efficiently generate deceptive strategies.
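A maximum-entropy observer model of the kind mentioned above is often realized as a softmax over goal costs: the observer predicts each candidate goal with probability proportional to exp(-cost). The costs and temperature below are hypothetical, not taken from the paper:

```python
import math

def maxent_goal_prediction(costs, beta=1.0):
    """Softmax over negative costs: P(goal) proportional to exp(-beta * cost),
    so goals that look cheaper to reach are predicted as more likely."""
    weights = [math.exp(-beta * c) for c in costs]
    z = sum(weights)
    return [w / z for w in weights]

# Three candidate goals; the first two look equally cheap, the third expensive.
probs = maxent_goal_prediction([2.0, 2.0, 5.0])
```

A deceptive agent can then plan against this predictive distribution, choosing trajectories that keep the observer's probability mass away from its true goal.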
arXiv Detail & Related papers (2021-09-14T14:56:23Z) - Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z) - Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z) - A two-level solution to fight against dishonest opinions in recommendation-based trust systems [13.356755375091456]
We consider a scenario in which an agent requests recommendations from multiple parties to build trust toward another agent.
At the collection level, we propose to allow agents to self-assess the accuracy of their recommendations.
At the processing level, we propose a recommendations aggregation technique that is resilient to collusion attacks.
arXiv Detail & Related papers (2020-06-09T00:34:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.