Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards
- URL: http://arxiv.org/abs/2310.00435v1
- Date: Sat, 30 Sep 2023 17:06:34 GMT
- Title: Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards
- Authors: Silviu Pitis
- Abstract summary: It is shown that Markovian aggregation of reward functions is not possible when the time preference for each objective may vary.
It follows that optimal multi-objective agents must admit rewards that are non-Markovian with respect to the individual objectives.
This work offers new insights into sequential, multi-objective agency and intertemporal choice, and has practical implications for the design of AI systems deployed to serve multiple generations of principals with varying time preference.
- Score: 7.9456318392035845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the capabilities of artificial agents improve, they are being increasingly deployed to service multiple diverse objectives and stakeholders. However, the composition of these objectives is often performed ad hoc, with no clear justification. This paper takes a normative approach to multi-objective agency: from a set of intuitively appealing axioms, it is shown that Markovian aggregation of Markovian reward functions is not possible when the time preference (discount factor) for each objective may vary. It follows that optimal multi-objective agents must admit rewards that are non-Markovian with respect to the individual objectives. To this end, a practical non-Markovian aggregation scheme is proposed, which overcomes the impossibility with only one additional parameter for each objective. This work offers new insights into sequential, multi-objective agency and intertemporal choice, and has practical implications for the design of AI systems deployed to serve multiple generations of principals with varying time preference.
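To see where the non-Markovian structure comes from, here is a minimal sketch based only on the abstract (the paper's axioms and its actual aggregation scheme are not reproduced here): suppose objective $i$ has a Markovian reward $r_i$, an aggregation weight $w_i$, and its own discount factor $\gamma_i$. Aggregating the individually discounted returns gives

$$
J \;=\; \sum_i w_i \sum_{t=0}^{\infty} \gamma_i^{t}\, r_i(s_t, a_t)
\;=\; \sum_{t=0}^{\infty} \underbrace{\sum_i w_i\, \gamma_i^{t}\, r_i(s_t, a_t)}_{\text{effective reward at step } t}.
$$

Read as a single reward stream, the effective reward at step $t$ depends on the elapsed time $t$ through the factors $\gamma_i^{t}$, not only on $(s_t, a_t)$; when the $\gamma_i$ differ, this time-dependence cannot be absorbed into a single fixed discount, which is one way to see why the aggregate reward must be non-Markovian with respect to the individual objectives. Tracking one extra scalar per objective (for instance, its accumulated discount) restores a recursive form, consistent with the abstract's claim that the impossibility is overcome with a single additional parameter per objective; the paper itself should be consulted for the precise construction.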
Related papers
- Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts [38.95012734839997]
Multi-objective alignment aims at balancing and controlling the different alignment objectives of large language models.
We propose MCA (Multi-objective Contrastive Alignment), which constructs an expert prompt and an adversarial prompt for each objective and contrasts them.
arXiv Detail & Related papers (2024-08-09T14:36:42Z) - Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z) - Multi-Target Multiplicity: Flexibility and Fairness in Target Specification under Resource Constraints [76.84999501420938]
We introduce a conceptual and computational framework for assessing how the choice of target affects individuals' outcomes.
We show that the level of multiplicity that stems from target variable choice can be greater than that stemming from nearly-optimal models of a single target.
arXiv Detail & Related papers (2023-06-23T18:57:14Z) - Alleviating Search Bias in Bayesian Evolutionary Optimization with Many Heterogeneous Objectives [9.139734850798124]
We address multi-objective optimization problems with heterogeneous objectives (HE-MOPs).
A new acquisition function that mitigates search bias towards the fast objectives is suggested.
We demonstrate the effectiveness of the proposed algorithm by testing it on widely used multi-/many-objective benchmark problems.
arXiv Detail & Related papers (2022-08-25T17:07:40Z) - Inferring Lexicographically-Ordered Rewards from Preferences [82.42854687952115]
This paper proposes a method for inferring multi-objective reward-based representations of an agent's observed preferences.
We model the agent's priorities over different objectives as entering lexicographically, so that objectives with lower priorities matter only when the agent is indifferent with respect to objectives with higher priorities.
arXiv Detail & Related papers (2022-02-21T12:01:41Z) - Generative multitask learning mitigates target-causing confounding [61.21582323566118]
We propose a simple and scalable approach to causal representation learning for multitask learning.
The improvement comes from mitigating unobserved confounders that cause the targets, but not the input.
Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
arXiv Detail & Related papers (2022-02-08T20:42:14Z) - From STL Rulebooks to Rewards [4.859570041295978]
We propose a principled approach to shaping rewards for reinforcement learning from multiple objectives.
We first equip STL with a novel quantitative semantics that allows individual requirements to be evaluated automatically.
We then develop a method for systematically combining evaluations of multiple requirements into a single reward.
arXiv Detail & Related papers (2021-10-06T14:16:59Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z) - Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning [59.62721526353915]
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities.
Our method aims to leverage these commonalities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?"
arXiv Detail & Related papers (2020-06-07T18:28:41Z) - A Distributional View on Multi-Objective Policy Optimization [24.690800846837273]
We propose an algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way.
We show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
arXiv Detail & Related papers (2020-05-15T13:02:17Z)