Related papers: Thresholded Lexicographic Ordered Multiobjective Reinforcement Learning

URL: http://arxiv.org/abs/2408.13493v2
Date: Wed, 4 Sep 2024 01:00:12 GMT
Title: Thresholded Lexicographic Ordered Multiobjective Reinforcement Learning
Authors: Alperen Tercan, Vinayak S. Prabhu
Abstract summary: Lexicographic multi-objective problems, which impose a lexicographic importance order over the objectives, arise in many real-life scenarios. Existing Reinforcement Learning work directly addressing lexicographic tasks has been scarce. We present a policy optimization approach using our Lexicographic Projection Optimization (LPO) algorithm that has the potential to address these theoretical and practical concerns.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Lexicographic multi-objective problems, which impose a lexicographic importance order over the objectives, arise in many real-life scenarios. Existing Reinforcement Learning work directly addressing lexicographic tasks has been scarce. The few proposed approaches were all noted to be heuristics without theoretical guarantees as the Bellman equation is not applicable to them. Additionally, the practical applicability of these prior approaches also suffers from various issues such as not being able to reach the goal state. While some of these issues have been known before, in this work we investigate further shortcomings, and propose fixes for improving practical performance in many cases. We also present a policy optimization approach using our Lexicographic Projection Optimization (LPO) algorithm that has the potential to address these theoretical and practical concerns. Finally, we demonstrate our proposed algorithms on benchmark problems.
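
Illustration (not part of the paper): the abstract describes objectives ranked in a lexicographic importance order, and the title adds thresholds. The sketch below shows one common way such a rule is applied when picking an action from per-objective value estimates: each higher-priority objective is clipped at its threshold, so actions that are "good enough" on it tie and the next objective breaks the tie. It is a minimal sketch of generic thresholded lexicographic selection, not the authors' LPO algorithm; the function name, argument layout, and example numbers are all hypothetical.

```python
from typing import List, Sequence


def thresholded_lexicographic_argmax(
    q_values: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> int:
    """Pick an action index under a thresholded lexicographic order.

    q_values[a][i] is an (assumed) value estimate of action a for objective i,
    with objective 0 the most important. thresholds[i] caps objective i;
    objectives without a threshold are compared unclipped.
    """
    candidates: List[int] = list(range(len(q_values)))
    num_objectives = len(q_values[0])

    for i in range(num_objectives):
        # Clip at the threshold so that all actions meeting it are tied.
        if i < len(thresholds):
            scores = {a: min(q_values[a][i], thresholds[i]) for a in candidates}
        else:
            scores = {a: q_values[a][i] for a in candidates}

        best = max(scores.values())
        candidates = [a for a in candidates if scores[a] == best]

        # A single survivor means lower-priority objectives cannot matter.
        if len(candidates) == 1:
            break

    # Remaining ties are broken by the lowest action index.
    return candidates[0]


# Hypothetical estimates: three actions, two objectives.
example_q = [[1.0, 0.0], [0.9, 5.0], [0.5, 9.0]]
with_threshold = thresholded_lexicographic_argmax(example_q, thresholds=[0.8])
pure_lexicographic = thresholded_lexicographic_argmax(example_q, thresholds=[])
```

With the threshold of 0.8 on the first objective, actions 0 and 1 both count as satisfactory and the second objective decides between them; with no thresholds the rule reduces to plain lexicographic comparison and the first objective alone decides.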

Related papers

An exploration for higher efficiency in multi objective optimisation with reinforcement learning [0.0]
Efficiency in optimisation and search processes remains one of the challenges. Utilising a pool of operators instead of a single operator to handle move operations within a neighbourhood remains promising. One of the promising ideas is to generalise experiences and seek how to utilise them.
arXiv Detail & Related papers (2025-12-11T01:58:04Z)
Latent Chain-of-Thought for Visual Reasoning [53.541579327424046]
Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). We reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference. We empirically demonstrate that the proposed method enhances the state-of-the-art LVLMs on seven reasoning benchmarks.
arXiv Detail & Related papers (2025-10-27T23:10:06Z)
Policy Optimization Algorithms in a Unified Framework [7.942953533690871]
Generalized ergodicity theory sheds light on the steady-state behavior of processes. Perturbation analysis provides insights into the fundamental principles of policy optimization algorithms. We aim to make policy optimization algorithms more accessible and reduce their misuse in practice.
arXiv Detail & Related papers (2025-04-04T10:14:01Z)
Near-optimal Active Reconstruction [3.037563407333583]
We design an algorithm for the Next Best View (NBV) problem in the context of active object reconstruction. We rigorously derive sublinear bounds for the cumulative regret of our algorithm, which guarantees near-optimality. We evaluate the performance of our algorithm empirically within our simulation framework.
arXiv Detail & Related papers (2025-03-24T09:17:53Z)
Aligned Multi Objective Optimization [15.404668020811513]
In machine learning practice, there are many scenarios where such conflict does not take place. Recent findings from multi-task learning, reinforcement learning, and LLMs training show that diverse related tasks can enhance performance across objectives simultaneously. We introduce the Aligned Multi-Objective Optimization framework, propose new algorithms for this setting, and provide theoretical guarantees of their superior performance.
arXiv Detail & Related papers (2025-02-19T20:50:03Z)
Towards Unsupervised Multi-Agent Reinforcement Learning via Task-Agnostic Exploration [44.601019677298005]
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings. We show that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performance.
arXiv Detail & Related papers (2025-02-12T12:51:36Z)
EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)
Lexicographic Multi-Objective Reinforcement Learning [65.90380946224869]
We present a family of both action-value and policy gradient algorithms that can be used to solve such problems. We show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
arXiv Detail & Related papers (2022-12-28T10:22:36Z)
Offline Policy Optimization with Eligible Actions [34.4530766779594]
Offline policy optimization could have a large impact on many real-world decision-making problems. Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation. We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint.
arXiv Detail & Related papers (2022-07-01T19:18:15Z)
Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across a family of non-convex problems. We show that for weakly convex objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z)
Probing as Quantifying the Inductive Bias of Pre-trained Representations [99.93552997506438]
We present a novel framework for probing where the goal is to evaluate the inductive bias of representations for a particular task. We apply our framework to a series of token-, arc-, and sentence-level tasks.
arXiv Detail & Related papers (2021-10-15T22:01:16Z)
A Field Guide to Federated Optimization [161.3779046812383]
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms.
arXiv Detail & Related papers (2021-07-14T18:09:08Z)
What if we Increase the Number of Objectives? Theoretical and Empirical Implications for Many-objective Optimization [0.0]
This paper investigates the influence of the number of objectives on problem characteristics and the practical behavior of commonly used procedures and algorithms for coping with many objectives. We make use of our theoretical and empirical findings to derive practical recommendations to support algorithm design.
arXiv Detail & Related papers (2021-06-06T23:25:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.