Related papers: Variance Reduced Policy Gradient Method for Multi-Objective Reinforcement Learning

Variance Reduced Policy Gradient Method for Multi-Objective Reinforcement Learning

URL: http://arxiv.org/abs/2508.10608v1
Date: Thu, 14 Aug 2025 12:52:57 GMT
Title: Variance Reduced Policy Gradient Method for Multi-Objective Reinforcement Learning
Authors: Davide Guidobene, Lorenzo Benedetti, Diego Arapovic,
Abstract summary: Multi-Objective Reinforcement Learning (MORL) is a generalization of traditional Reinforcement Learning (RL)<n>We consider the problem of MORL where the objectives are combined using a non-linear scalarization function.<n>Previous attempts to solve this problem rely on overly strict assumptions, losing PGMs' benefits in scalability to large state-action spaces.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-Objective Reinforcement Learning (MORL) is a generalization of traditional Reinforcement Learning (RL) that aims to optimize multiple, often conflicting objectives simultaneously rather than focusing on a single reward. This approach is crucial in complex decision-making scenarios where agents must balance trade-offs between various goals, such as maximizing performance while minimizing costs. We consider the problem of MORL where the objectives are combined using a non-linear scalarization function. Just like in standard RL, policy gradient methods (PGMs) are amongst the most effective for handling large and continuous state-action spaces in MORL. However, existing PGMs for MORL suffer from high sample inefficiency, requiring large amounts of data to be effective. Previous attempts to solve this problem rely on overly strict assumptions, losing PGMs' benefits in scalability to large state-action spaces. In this work, we address the issue of sample efficiency by implementing variance-reduction techniques to reduce the sample complexity of policy gradients while maintaining general assumptions.

Related papers

Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models [67.45032003041399]
We propose a novel Multi-Paradigm Collaborative Attack (MPCAttack) framework to boost the transferability of adversarial examples against MLLMs.<n>MPCO adaptively balances the importance of different paradigm representations and guides the global optimisation.<n>Our solution consistently outperforms state-of-the-art methods in both targeted and untargeted attacks on open-source and closed-source MLLMs.
arXiv Detail & Related papers (2026-03-05T06:01:26Z)
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs [126.45104018441698]
Reinforcement learning (RL) has become a central paradigm for post-training large language models (LLMs)<n>We argue that this failure stems from regularizing local token behavior rather than diversity over sets of solutions.<n>We propose Uniqueness-Aware Reinforcement Learning, a rollout-level objective that explicitly rewards correct solutions that exhibit rare high-level strategies.
arXiv Detail & Related papers (2026-01-13T17:48:43Z)
Pareto Multi-Objective Alignment for Language Models [7.9051473654430655]
Large language models (LLMs) are increasingly deployed in real-world applications that require careful balancing of multiple, often conflicting, objectives.<n>We propose a principled and computationally efficient algorithm designed explicitly for multi-objective alignment (MOA) in LLMs.<n>PAMA transforms multi-objective RLHF into a convex optimization with a closed-form solution, significantly enhancing scalability.
arXiv Detail & Related papers (2025-08-11T08:54:14Z)
UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality [52.49062565901046]
Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone for aligning large language models with human values.<n>Existing approaches struggle to capture the multi-dimensional, distributional nuances of human preferences.<n>We introduce Utility-Conditioned Multi-Objective Alignment (UC-MOA), a novel framework that overcomes these limitations.
arXiv Detail & Related papers (2025-03-10T09:52:42Z)
Pareto Set Learning for Multi-Objective Reinforcement Learning [19.720934024901542]
We propose a decomposition-based framework for Multi-Objective RL (MORL)<n>PSL-MORL harnesses the generation capability of hypernetwork to produce the parameters of the policy network for each decomposition weight.<n>We show that PSL-MORL significantly outperforms state-of-the-art MORL methods in the hypervolume and sparsity indicators.
arXiv Detail & Related papers (2025-01-12T10:43:05Z)
UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [51.00436121587591]
In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours.<n>We focus on the case of linear utility functions parametrised by weight vectors w.<n>We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models [70.45441031021291]
Large Vision-Language Models (LVLMs) can understand the world comprehensively by integrating rich information from different modalities. LVLMs are often problematic due to their massive computational/energy costs and carbon consumption. We propose Efficient Coarse-to-Fine LayerWise Pruning (ECoFLaP), a two-stage coarse-to-fine weight pruning approach for LVLMs.
arXiv Detail & Related papers (2023-10-04T17:34:00Z)
Beyond Losses Reweighting: Empowering Multi-Task Learning via the Generalization Perspective [61.10883077161432]
Multi-task learning (MTL) trains deep neural networks to optimize several objectives simultaneously using a shared backbone.<n>We introduce a novel MTL framework that leverages weight perturbation to regulate gradient norms, thus improving generalization.<n>Our method significantly outperforms existing gradient-based MTL techniques in terms of task performance and overall model robustness.
arXiv Detail & Related papers (2022-11-24T17:19:30Z)
PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm [0.18416014644193063]
We propose a novel MORL algorithm that trains a single universal network to cover the entire preference space scalable to continuous robotic tasks. PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
arXiv Detail & Related papers (2022-08-16T19:23:02Z)
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset. We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy. We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach [2.0305676256390934]
Generalized Thresholded Lexicographic Ordering (gTLO) is a novel method that aims to combine non-linear MORL with the advantages of generalized MORL. We present promising results on a standard benchmark for non-linear MORL and a real-world application from the domain of manufacturing process control.
arXiv Detail & Related papers (2022-04-11T10:06:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.