Utility Theory for Sequential Decision Making
- URL: http://arxiv.org/abs/2206.13637v1
- Date: Mon, 27 Jun 2022 21:28:35 GMT
- Title: Utility Theory for Sequential Decision Making
- Authors: Mehran Shakerinava, Siamak Ravanbakhsh
- Abstract summary: We show that memoryless preferences lead to a utility in the form of a per-transition reward and a multiplicative factor on the future return.
We demystify the reward hypothesis that underlies the design of rational agents in reinforcement learning.
- Score: 20.7262938359876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The von Neumann-Morgenstern (VNM) utility theorem shows that under certain
axioms of rationality, decision-making is reduced to maximizing the expectation
of some utility function. We extend these axioms to increasingly structured
sequential decision making settings and identify the structure of the
corresponding utility functions. In particular, we show that memoryless
preferences lead to a utility in the form of a per-transition reward and a
multiplicative factor on the future return. This result motivates a
generalization of Markov Decision Processes (MDPs) with this structure on the
agent's returns, which we call Affine-Reward MDPs. A stronger constraint on
preferences is needed to recover the commonly used cumulative sum of scalar
rewards in MDPs. A yet stronger constraint simplifies the utility function for
goal-seeking agents in the form of a difference in some function of states that
we call potential functions. Our necessary and sufficient conditions demystify
the reward hypothesis that underlies the design of rational agents in
reinforcement learning by adding an axiom to the VNM rationality axioms, and
motivate new directions for AI research involving sequential decision making.
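To make these functional forms concrete, here is a minimal sketch in the abstract's own terms (the symbols $r$, $m$, $\gamma$, and $\phi$ are illustrative notation, not necessarily the paper's): memoryless preferences yield a return satisfying the affine recursion
$$ G_t = r(s_t, s_{t+1}) + m(s_t, s_{t+1})\, G_{t+1}, $$
the stronger constraint recovers the familiar cumulative sum as the special case where $m$ is a constant discount $\gamma$ (or $1$ for undiscounted sums), i.e. $G_t = \sum_{k \ge 0} \gamma^{k}\, r(s_{t+k}, s_{t+k+1})$, and the goal-seeking case further reduces the utility of a transition to a difference of potentials, $u(s \to s') = \phi(s') - \phi(s)$.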
Related papers
- Recursive Reward Aggregation [51.552609126905885]
We propose an alternative approach for flexible behavior alignment that eliminates the need to modify the reward function. By introducing an algebraic perspective on Markov decision processes (MDPs), we show that the Bellman equations naturally emerge from the generation and aggregation of rewards. Our approach applies to both deterministic and stochastic settings and seamlessly integrates with value-based and actor-critic algorithms (a minimal code illustration of reward aggregation via a backward recursion appears after this list).
arXiv Detail & Related papers (2025-07-11T12:37:20Z) - From Axioms to Algorithms: Mechanized Proofs of the vNM Utility Theorem [0.0]
We implement the classical axioms of preference: completeness, transitivity, continuity, and independence. Our formalization captures the mathematical structure of preference relations over lotteries. This formalization provides a rigorous foundation for applications in economic modeling, AI alignment, and management decision systems.
arXiv Detail & Related papers (2025-06-08T10:09:54Z) - Deontically Constrained Policy Improvement in Reinforcement Learning Agents [0.0]
Markov Decision Processes (MDPs) are the most common model for decision making under uncertainty in the Machine Learning community. An MDP captures non-determinism, probabilistic uncertainty, and an explicit model of action. A Reinforcement Learning (RL) agent learns to act in an MDP by maximizing a utility function.
arXiv Detail & Related papers (2025-06-08T01:01:06Z) - Feature Attribution from First Principles [6.836945436656676]
We argue that axiomatic frameworks prescribing properties that any feature attribution method should satisfy are often too restrictive. Rather than imposing axioms, we start by defining attributions for the simplest possible models. We derive closed-form expressions for attributions of deep ReLU networks and take a step toward the optimization of evaluation metrics.
arXiv Detail & Related papers (2025-05-30T15:53:11Z) - Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs [18.48866194756127]
Hausner's foundational work showed that dropping the continuity axiom leads to a generalization of expected utility theory. We provide a full characterization of such reward functions, as well as the general d-dimensional case, in Markov Decision Processes (MDPs) under a memorylessness assumption on preferences. We show that optimal policies in this setting retain many desirable properties of their scalar-reward counterparts, while in the Constrained MDP setting -- another common multi-objective setting -- they do not.
arXiv Detail & Related papers (2025-05-17T15:23:58Z) - QFree: A Universal Value Function Factorization for Multi-Agent
Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z) - Beyond Average Return in Markov Decision Processes [49.157108194438635]
We prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL).
We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations.
arXiv Detail & Related papers (2023-10-31T08:36:41Z) - Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning [83.41487567765871]
Skipper is a model-based reinforcement learning framework.
It automatically decomposes the given task into smaller, more manageable subtasks.
It enables sparse decision-making and focused abstractions on the relevant parts of the environment.
arXiv Detail & Related papers (2023-09-30T02:25:18Z) - A Semantic Approach to Decidability in Epistemic Planning (Extended
Version) [72.77805489645604]
We use a novel semantic approach to achieve decidability.
Specifically, we augment the logic of knowledge S5$_n$ with an interaction axiom called (knowledge) commutativity.
We prove that our framework admits a finitary non-fixpoint characterization of common knowledge, which is of independent interest.
arXiv Detail & Related papers (2023-07-28T11:26:26Z) - Leveraging Task Structures for Improved Identifiability in Neural Network Representations [31.863998589693065]
We extend the theory of identifiability in supervised learning by considering the consequences of having access to a distribution of tasks.
We show that linear identifiability is achievable in the general multi-task regression setting.
arXiv Detail & Related papers (2023-06-26T17:16:50Z) - Decomposing Counterfactual Explanations for Consequential Decision
Making [11.17545155325116]
We develop a novel and practical recourse framework that bridges the gap between the IMF and the strong causal assumptions.
Our framework generates recourses by disentangling the latent representation of co-varying features.
Our experiments on real-world data corroborate our theoretically motivated recourse model and highlight our framework's ability to provide reliable, low-cost recourse.
arXiv Detail & Related papers (2022-11-03T21:26:55Z) - Benefits of Permutation-Equivariance in Auction Mechanisms [90.42990121652956]
Designing an auction mechanism that maximizes the auctioneer's revenue while minimizing bidders' ex-post regret is an important yet intricate problem in economics.
Remarkable progress has been achieved through learning the optimal auction mechanism by neural networks.
arXiv Detail & Related papers (2022-10-11T16:13:25Z) - Reinforcement Learning in Linear MDPs: Constant Regret and
Representation Selection [136.4014229319618]
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant regret in any MDP with linear reward function.
arXiv Detail & Related papers (2021-10-27T22:07:08Z) - Alternative Microfoundations for Strategic Classification [33.67797984699066]
We show that rational agents with perfect information produce discontinuities in the aggregate response to a decision rule.
Optimal decision rules under standard microfoundations maximize a measure of negative externality known as social burden.
Our model retains analytical tractability, leads to more robust insights about stable points, and imposes a lower social burden at optimality.
arXiv Detail & Related papers (2021-06-24T00:30:58Z) - Maximizing Information Gain in Partially Observable Environments via
Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z) - Decisions, Counterfactual Explanations and Strategic Behavior [16.980621769406923]
We find policies and counterfactual explanations that are optimal in terms of utility in a strategic setting.
We show that, given a pre-defined policy, the problem of finding the optimal set of counterfactual explanations is NP-hard.
We demonstrate that, by incorporating a matroid constraint into the problem formulation, we can increase the diversity of the optimal set of counterfactual explanations.
arXiv Detail & Related papers (2020-02-11T12:04:41Z)
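Referencing the aggregation idea noted in the Recursive Reward Aggregation entry above, the following is a minimal, self-contained sketch (the function names and trajectory representation are illustrative, not an API from any listed paper). It shows that the standard discounted return is just one choice of aggregator in a backward recursion over a trajectory's rewards, and that swapping the aggregator yields a different Bellman-style recursion:

```python
from typing import Callable, List


def backward_aggregate(
    rewards: List[float],
    combine: Callable[[float, float], float],
    terminal: float = 0.0,
) -> float:
    """Fold a trajectory's rewards from the end backwards.

    combine(r, future) plays the role of a one-step Bellman-style update:
    it merges the current reward with the already-aggregated future value.
    """
    value = terminal
    for r in reversed(rewards):
        value = combine(r, value)
    return value


gamma = 0.9
discounted_sum = lambda r, future: r + gamma * future  # standard MDP return
max_reward = lambda r, future: max(r, future)          # best single reward on the trajectory

rewards = [1.0, 0.0, 5.0, 2.0]
print(backward_aggregate(rewards, discounted_sum))  # 1 + 0.9*(0 + 0.9*(5 + 0.9*2)) ≈ 6.508
print(backward_aggregate(rewards, max_reward))      # 5.0
```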