Utility Theory for Sequential Decision Making
- URL: http://arxiv.org/abs/2206.13637v1
- Date: Mon, 27 Jun 2022 21:28:35 GMT
- Title: Utility Theory for Sequential Decision Making
- Authors: Mehran Shakerinava, Siamak Ravanbakhsh
- Abstract summary: We show that memoryless preferences lead to a utility in the form of a per-transition reward and a multiplicative factor on the future return.
We demystify the reward hypothesis that underlies the design of rational agents in reinforcement learning.
- Score: 20.7262938359876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The von Neumann-Morgenstern (VNM) utility theorem shows that under certain
axioms of rationality, decision-making is reduced to maximizing the expectation
of some utility function. We extend these axioms to increasingly structured
sequential decision making settings and identify the structure of the
corresponding utility functions. In particular, we show that memoryless
preferences lead to a utility in the form of a per-transition reward and a
multiplicative factor on the future return. This result motivates a
generalization of Markov Decision Processes (MDPs) with this structure on the
agent's returns, which we call Affine-Reward MDPs. A stronger constraint on
preferences is needed to recover the commonly used cumulative sum of scalar
rewards in MDPs. A yet stronger constraint simplifies the utility function for
goal-seeking agents in the form of a difference in some function of states that
we call potential functions. Our necessary and sufficient conditions demystify
the reward hypothesis that underlies the design of rational agents in
reinforcement learning by adding an axiom to the VNM rationality axioms, and
motivate new directions for AI research involving sequential decision making.
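To make these functional forms concrete, here is a minimal sketch in the abstract's own terms (the symbols $r$, $m$, $\gamma$, and $\phi$ are illustrative notation, not necessarily the paper's): memoryless preferences yield a return satisfying the affine recursion
$$ G_t = r(s_t, s_{t+1}) + m(s_t, s_{t+1})\, G_{t+1}, $$
the stronger constraint recovers the familiar cumulative sum as the special case where $m$ is a constant discount $\gamma$ (or $1$ for undiscounted sums), i.e. $G_t = \sum_{k \ge 0} \gamma^{k}\, r(s_{t+k}, s_{t+k+1})$, and the goal-seeking case further reduces the utility of a transition to a difference of potentials, $u(s \to s') = \phi(s') - \phi(s)$.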
Related papers
- Recursive Reward Aggregation [51.552609126905885]
We propose an alternative approach for flexible behavior alignment that eliminates the need to modify the reward function. By introducing an algebraic perspective on Markov decision processes (MDPs), we show that the Bellman equations naturally emerge from the generation and aggregation of rewards. Our approach applies to both deterministic and stochastic settings and seamlessly integrates with value-based and actor-critic algorithms (a minimal code illustration of reward aggregation via a backward recursion appears after this list).
arXiv Detail & Related papers (2025-07-11T12:37:20Z) - From Axioms to Algorithms: Mechanized Proofs of the vNM Utility Theorem [0.0]
We implement the classical axioms of preference: completeness, transitivity, continuity, and independence. Our formalization captures the mathematical structure of preference relations over lotteries. This formalization provides a rigorous foundation for applications in economic modeling, AI alignment, and management decision systems.
arXiv Detail & Related papers (2025-06-08T10:09:54Z) - Deontically Constrained Policy Improvement in Reinforcement Learning Agents [0.0]
Markov Decision Processes (MDPs) are the most common model for decision making under uncertainty in the Machine Learning community. An MDP captures non-determinism, probabilistic uncertainty, and an explicit model of action. A Reinforcement Learning (RL) agent learns to act in an MDP by maximizing a utility function.
arXiv Detail & Related papers (2025-06-08T01:01:06Z) - Feature Attribution from First Principles [6.836945436656676]
We argue that axiomatic frameworks prescribing properties that any feature attribution method should satisfy are often too restrictive. Rather than imposing axioms, we start by defining attributions for the simplest possible models. We derive closed-form expressions for attributions of deep ReLU networks and take a step toward the optimization of evaluation metrics.
arXiv Detail & Related papers (2025-05-30T15:53:11Z) - Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs [18.48866194756127]
Hausner's foundational work showed that dropping the continuity axiom leads to a generalization of expected utility theory. We provide a full characterization of such reward functions, as well as the general d-dimensional case, in Markov Decision Processes (MDPs) under a memorylessness assumption on preferences. We show that optimal policies in this setting retain many desirable properties of their scalar-reward counterparts, while in the Constrained MDP setting -- another common multi-objective setting -- they do not.
arXiv Detail & Related papers (2025-05-17T15:23:58Z) - QFree: A Universal Value Function Factorization for Multi-Agent
Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z) - Beyond Average Return in Markov Decision Processes [49.157108194438635]
We prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL).
We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations.
arXiv Detail & Related papers (2023-10-31T08:36:41Z) - Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning [83.41487567765871]
Skipper is a model-based reinforcement learning framework.
It automatically decomposes the given task into smaller, more manageable subtasks.
It enables sparse decision-making and focused abstractions on the relevant parts of the environment.
arXiv Detail & Related papers (2023-09-30T02:25:18Z) - A Semantic Approach to Decidability in Epistemic Planning (Extended
Version) [72.77805489645604]
We use a novel semantic approach to achieve decidability.
Specifically, we augment the logic of knowledge S5$_n$ with an interaction axiom called (knowledge) commutativity.
We prove that our framework admits a finitary non-fixpoint characterization of common knowledge, which is of independent interest.
arXiv Detail & Related papers (2023-07-28T11:26:26Z) - Leveraging Task Structures for Improved Identifiability in Neural Network Representations [31.863998589693065]
We extend the theory of identifiability in supervised learning by considering the consequences of having access to a distribution of tasks.
We show that linear identifiability is achievable in the general multi-task regression setting.
arXiv Detail & Related papers (2023-06-26T17:16:50Z) - Decomposing Counterfactual Explanations for Consequential Decision
Making [11.17545155325116]
We develop a novel and practical recourse framework that bridges the gap between the IMF and the strong causal assumptions.
Our framework generates recourses by disentangling the latent representation of co-varying features.
Our experiments on real-world data corroborate our theoretically motivated recourse model and highlight our framework's ability to provide reliable, low-cost recourse.
arXiv Detail & Related papers (2022-11-03T21:26:55Z) - Benefits of Permutation-Equivariance in Auction Mechanisms [90.42990121652956]
Designing an auction mechanism that maximizes the auctioneer's revenue while minimizing bidders' ex-post regret is an important yet intricate problem in economics.
Remarkable progress has been achieved through learning the optimal auction mechanism by neural networks.
arXiv Detail & Related papers (2022-10-11T16:13:25Z) - Reinforcement Learning in Linear MDPs: Constant Regret and
Representation Selection [136.4014229319618]
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant regret in any MDP with linear reward function.
arXiv Detail & Related papers (2021-10-27T22:07:08Z) - Alternative Microfoundations for Strategic Classification [33.67797984699066]
We show that rational agents with perfect information produce discontinuities in the aggregate response to a decision rule.
Optimal decision rules under standard microfoundations maximize a measure of negative externality known as social burden.
Our model retains analytical tractability, leads to more robust insights about stable points, and imposes a lower social burden at optimality.
arXiv Detail & Related papers (2021-06-24T00:30:58Z) - Maximizing Information Gain in Partially Observable Environments via
Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z) - Decisions, Counterfactual Explanations and Strategic Behavior [16.980621769406923]
We find policies and counterfactual explanations that are optimal in terms of utility in a strategic setting.
We show that, given a pre-defined policy, the problem of finding the optimal set of counterfactual explanations is NP-hard.
We demonstrate that, by incorporating a matroid constraint into the problem formulation, we can increase the diversity of the optimal set of counterfactual explanations.
arXiv Detail & Related papers (2020-02-11T12:04:41Z)
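Referencing the aggregation idea noted in the Recursive Reward Aggregation entry above, the following is a minimal, self-contained sketch (the function names and trajectory representation are illustrative, not an API from any listed paper). It shows that the standard discounted return is just one choice of aggregator in a backward recursion over a trajectory's rewards, and that swapping the aggregator yields a different Bellman-style recursion:

```python
from typing import Callable, List


def backward_aggregate(
    rewards: List[float],
    combine: Callable[[float, float], float],
    terminal: float = 0.0,
) -> float:
    """Fold a trajectory's rewards from the end backwards.

    combine(r, future) plays the role of a one-step Bellman-style update:
    it merges the current reward with the already-aggregated future value.
    """
    value = terminal
    for r in reversed(rewards):
        value = combine(r, value)
    return value


gamma = 0.9
discounted_sum = lambda r, future: r + gamma * future  # standard MDP return
max_reward = lambda r, future: max(r, future)          # best single reward on the trajectory

rewards = [1.0, 0.0, 5.0, 2.0]
print(backward_aggregate(rewards, discounted_sum))  # 1 + 0.9*(0 + 0.9*(5 + 0.9*2)) ≈ 6.508
print(backward_aggregate(rewards, max_reward))      # 5.0
```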