Related papers: Scalable Multi-Objective Reinforcement Learning with Fairness Guarantees using Lorenz Dominance

Related papers

Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback [14.81819959351561]
We show that multi-objective RLAIF can produce policies that yield balanced trade-offs reflecting different user priorities without laborious reward engineering.<n>We argue that integrating RLAIF into multi-objective RL offers a scalable path toward user-aligned policy learning.
arXiv Detail & Related papers (2026-02-24T09:47:25Z)
SetPO: Set-Level Policy Optimization for Diversity-Preserving LLM Reasoning [50.93295951454092]
We introduce a set level diversity objective defined over sampled trajectories using kernelized similarity.<n>Our approach derives a leave-one-out marginal contribution for each sampled trajectory and integrates this objective as a plug-in advantage shaping term for policy optimization.<n>Experiments across a range of model scales demonstrate the effectiveness of our proposed algorithm, consistently outperforming strong baselines in both Pass@1 and Pass@K across various benchmarks.
arXiv Detail & Related papers (2026-02-01T07:13:20Z)
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs [126.45104018441698]
Reinforcement learning (RL) has become a central paradigm for post-training large language models (LLMs)<n>We argue that this failure stems from regularizing local token behavior rather than diversity over sets of solutions.<n>We propose Uniqueness-Aware Reinforcement Learning, a rollout-level objective that explicitly rewards correct solutions that exhibit rare high-level strategies.
arXiv Detail & Related papers (2026-01-13T17:48:43Z)
MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs)<n>We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck.<n>We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z)
Variance Reduced Policy Gradient Method for Multi-Objective Reinforcement Learning [0.0]
Multi-Objective Reinforcement Learning (MORL) is a generalization of traditional Reinforcement Learning (RL)<n>We consider the problem of MORL where the objectives are combined using a non-linear scalarization function.<n>Previous attempts to solve this problem rely on overly strict assumptions, losing PGMs' benefits in scalability to large state-action spaces.
arXiv Detail & Related papers (2025-08-14T12:52:57Z)
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria.<n>We provide theoretical insights by deriving sub-optimality bounds of our satisficing based inference alignment approach.
arXiv Detail & Related papers (2025-05-29T17:56:05Z)
Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces [16.400288624027375]
In many real-world settings, it is important to optimize over multiple objectives simultaneously. We consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large.
arXiv Detail & Related papers (2025-02-17T14:25:33Z)
Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning [10.848218400641466]
Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple objectives. We propose an approach for clustering the solution set generated by MORL.
arXiv Detail & Related papers (2024-11-07T15:26:38Z)
Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning [46.28771270378047]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories. In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment. We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z)
Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces. We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
Fairness in Preference-based Reinforcement Learning [2.3388338598125196]
We design a new fairness-induced preference-based reinforcement learning or FPbRL. The main idea of FPbRL is to learn vector reward functions associated with multiple objectives via new welfare-based preferences. Experiment studies show that the proposed FPbRL approach can achieve both efficiency and equity for learning effective and fair policies.
arXiv Detail & Related papers (2023-06-16T17:47:36Z)
Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning [30.605881670761853]
We propose a Reinforcement Learning approach to achieve fairness in finite-horizon episodic MDPs. We show that such an approach achieves sub-linear regret in terms of the number of episodes.
arXiv Detail & Related papers (2023-06-01T03:43:53Z)
Multi-Objective GFlowNets [59.16787189214784]
We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse optimal solutions, based on GFlowNets.
arXiv Detail & Related papers (2022-10-23T16:15:36Z)
Generative multitask learning mitigates target-causing confounding [61.21582323566118]
We propose a simple and scalable approach to causal representation learning for multitask learning. The improvement comes from mitigating unobserved confounders that cause the targets, but not the input. Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
arXiv Detail & Related papers (2022-02-08T20:42:14Z)
MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning [14.06682547001011]
State-of-the art methods typically focus on learning a single reward model. We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms. Our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need for computing multiple policies.
arXiv Detail & Related papers (2021-12-30T19:21:03Z)
Choosing the Best of Both Worlds: Diverse and Novel Recommendations through Multi-Objective Reinforcement Learning [68.45370492516531]
We introduce Scalarized Multi-Objective Reinforcement Learning (SMORL) for the Recommender Systems (RS) setting. SMORL agent augments standard recommendation models with additional RL layers that enforce it to simultaneously satisfy three principal objectives: accuracy, diversity, and novelty of recommendations. Our experimental results on two real-world datasets reveal a substantial increase in aggregate diversity, a moderate increase in accuracy, reduced repetitiveness of recommendations, and demonstrate the importance of reinforcing diversity and novelty as complementary objectives.
arXiv Detail & Related papers (2021-10-28T13:22:45Z)
Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives. Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process. We propose a new algorithm called model-based envelop value (EVI) which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
A Distributional View on Multi-Objective Policy Optimization [24.690800846837273]
We propose an algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
arXiv Detail & Related papers (2020-05-15T13:02:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.