Connected Superlevel Set in (Deep) Reinforcement Learning and its
Application to Minimax Theorems
- URL: http://arxiv.org/abs/2303.12981v3
- Date: Sat, 30 Sep 2023 17:58:43 GMT
- Title: Connected Superlevel Set in (Deep) Reinforcement Learning and its
Application to Minimax Theorems
- Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg
- Abstract summary: We show that the superlevel set of the objective function with respect to the policy parameter is always a connected set.
We show that the optimization objective as a function of the policy parameter and reward satisfies a stronger "equiconnectedness" property.
This is the first time such a result has been established in the literature.
- Score: 15.632127097145881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The aim of this paper is to improve the understanding of the optimization
landscape for policy optimization problems in reinforcement learning.
Specifically, we show that the superlevel set of the objective function with
respect to the policy parameter is always a connected set both in the tabular
setting and under policies represented by a class of neural networks. In
addition, we show that the optimization objective as a function of the policy
parameter and reward satisfies a stronger "equiconnectedness" property. To the
best of our knowledge, these are new, previously unestablished results.
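As a concrete reading of this claim (the notation here is ours, not taken from the paper): writing $J(\theta)$ for the expected return under policy parameter $\theta$, the statement is that every superlevel set

$$
\mathcal{U}_\lambda \;=\; \{\, \theta \;:\; J(\theta) \ge \lambda \,\}, \qquad \lambda \in \mathbb{R},
$$

is connected, both for tabular policies and for the stated class of neural network parameterizations; "equiconnectedness" is, roughly, the same property for $J(\theta, r)$ holding uniformly over the reward $r$.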
We present an application of the connectedness of these superlevel sets to
the derivation of minimax theorems for robust reinforcement learning. We show
that any minimax optimization program that is convex on one side and
equiconnected on the other side satisfies the minimax equality (i.e., has a Nash
equilibrium). We find that this exact structure is exhibited by an interesting
robust reinforcement learning problem under an adversarial reward attack, and
the validity of its minimax equality immediately follows. This is the first
time such a result has been established in the literature.
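In the notation above (again ours, and only a sketch of the abstract's claim, not the paper's exact statement): if $J(\theta, r)$ is convex in the adversarial reward $r$ over a convex set $\mathcal{R}$ and equiconnected in the policy parameter $\theta$, then

$$
\min_{r \in \mathcal{R}} \; \max_{\theta} \; J(\theta, r) \;=\; \max_{\theta} \; \min_{r \in \mathcal{R}} \; J(\theta, r),
$$

i.e., the adversarial-reward robust RL problem admits a Nash equilibrium.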
Related papers
- A Novel Variational Lower Bound for Inverse Reinforcement Learning [5.370126167091961]
Inverse reinforcement learning (IRL) seeks to learn the reward function from expert trajectories.
We present a new Variational Lower Bound for IRL (VLB-IRL).
Our method simultaneously learns the reward function and policy under the learned reward function.
arXiv Detail & Related papers (2023-11-07T03:50:43Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addresses these concerns by explicitly parameterizing the distribution of the upper-level alignment objective (reward design) by the lower-level optimal variable.
Our empirical results substantiate that the proposed PARL can address the alignment concerns in RL, showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z)
- Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z)
- Topological Guided Actor-Critic Modular Learning of Continuous Systems with Temporal Objectives [2.398608007786179]
This work investigates the formal policy synthesis of continuous-state dynamic systems given high-level specifications in linear temporal logic.
We use neural networks to approximate the value function and policy function for the hybrid product state space.
arXiv Detail & Related papers (2023-04-20T01:36:05Z)
- Unsupervised Learning for Combinatorial Optimization with Principled Objective Relaxation [19.582494782591386]
This work proposes an unsupervised learning framework for combinatorial optimization (CO) problems.
Our key contribution is the observation that if the relaxed objective satisfies entry-wise concavity, a low optimization loss guarantees the quality of the final integral solutions.
In particular, this observation can guide the design of objective models in applications where the objectives are not given explicitly but must be modeled beforehand.
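Roughly, the mechanism behind this guarantee (stated in our notation, as we read the claim, not as the paper states it): if the relaxation $f_r : [0,1]^n \to \mathbb{R}$ agrees with the discrete objective $f$ on $\{0,1\}^n$ and is concave in each coordinate separately, then, since a concave function of one coordinate attains its minimum at an endpoint, rounding a fractional solution $\bar{x}$ one coordinate at a time never increases the relaxed objective:

$$
\exists\, x \in \{0,1\}^n \ \text{(obtained by entry-wise rounding of } \bar{x}\text{)}: \quad f(x) = f_r(x) \;\le\; f_r(\bar{x}).
$$

Hence a low relaxed optimization loss certifies a comparably good integral solution.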
arXiv Detail & Related papers (2022-07-13T06:44:17Z)
- Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer [7.970144204429356]
We introduce an SF-based extension of the Optimistic Linear Support algorithm to learn a set of policies whose SFs form a convex coverage set.
We prove that policies in this set can be combined via generalized policy improvement to construct optimal behaviors for any new linearly-expressible tasks.
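For background on how such a combination typically works (this is the standard successor-features / generalized policy improvement construction from the literature, not a detail taken from this abstract): given policies $\pi_1, \dots, \pi_n$ with successor features $\psi^{\pi_i}$, a new task whose reward is linearly expressible as $r(s,a) = \phi(s,a)^\top \mathbf{w}$ is handled by acting greedily over the reconstructed value functions,

$$
Q^{\pi_i}_{\mathbf{w}}(s,a) = \psi^{\pi_i}(s,a)^\top \mathbf{w}, \qquad \pi^{\mathrm{GPI}}(s) \in \arg\max_{a} \, \max_{i} \, Q^{\pi_i}_{\mathbf{w}}(s,a).
$$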
arXiv Detail & Related papers (2022-06-22T19:00:08Z)
- Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity.
We devise algorithms learning optimal tilt control policies from existing data.
We show that they can produce an optimal tilt update policy using far fewer data samples than naive or existing rule-based learning algorithms.
arXiv Detail & Related papers (2022-01-06T18:24:30Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency [83.02999769628593]
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning.
We show that the minimax approach enables us to achieve a fast rate of convergence for weights and quality functions.
We present the first finite-sample result with first-order efficiency in non-tabular environments.
arXiv Detail & Related papers (2021-02-05T03:20:39Z)
- The Landscape of the Proximal Point Method for Nonconvex-Nonconcave Minimax Optimization [10.112779201155005]
Minimax PPM has become a central tool in machine learning, with applications in robust optimization, reinforcement learning, GANs, etc.
These applications are often nonconvex-nonconcave, but existing theory is unable to identify and deal with the fundamental difficulties this poses.
arXiv Detail & Related papers (2020-06-15T18:17:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.