Topological Guided Actor-Critic Modular Learning of Continuous Systems
with Temporal Objectives
- URL: http://arxiv.org/abs/2304.10041v1
- Date: Thu, 20 Apr 2023 01:36:05 GMT
- Title: Topological Guided Actor-Critic Modular Learning of Continuous Systems
with Temporal Objectives
- Authors: Lening Li, Zhentian Qian
- Abstract summary: This work investigates the formal policy synthesis of continuous-state dynamic systems given high-level specifications in linear temporal logic.
We use neural networks to approximate the value function and policy function for the hybrid product state space.
- Score: 2.398608007786179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work investigates formal policy synthesis for continuous-state
stochastic dynamic systems given high-level specifications in linear temporal
logic. To learn an optimal policy that maximizes the satisfaction probability,
we take the product of the dynamic system and the automaton translated from the
specification, and solve an optimal planning problem on the resulting product
system. Since this product system has a hybrid product state space that leads
to reward sparsity, we introduce a generalized optimal backup order, in reverse
topological order, to guide the value backups and accelerate the learning
process. We prove the optimality of using the generalized optimal backup order
in this planning problem. Further, this paper presents an actor-critic
reinforcement learning algorithm for the case where a topological order applies.
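The sketch below illustrates the reverse-topological backup schedule just
described, assuming the translated automaton is given as a plain adjacency map;
the three-state automaton, the names `automaton_edges` and `ACCEPTING`, and the
scalar values are illustrative assumptions, not the paper's construction.

```python
# Illustrative only: condense a hypothetical specification automaton into
# strongly connected components (SCCs) and run value backups in *reverse*
# topological order, so values flow from the accepting component backward.
import networkx as nx

# Hypothetical 3-state automaton; state 2 is accepting and absorbing,
# state 0 is initial.
automaton_edges = {0: [0, 1], 1: [1, 2], 2: [2]}
ACCEPTING = {2}

G = nx.DiGraph()
G.add_edges_from((q, q2) for q, succs in automaton_edges.items() for q2 in succs)

condensation = nx.condensation(G)  # DAG of SCCs
# Generalized optimal backup order: the reverse of a topological order.
backup_order = list(reversed(list(nx.topological_sort(condensation))))

gamma = 0.99
V = {q: (1.0 if q in ACCEPTING else 0.0) for q in automaton_edges}

for comp in backup_order:  # sweep from the accepting SCC toward the initial one
    members = condensation.nodes[comp]["members"]
    # States in already-swept components are fixed, so the local Bellman
    # iteration only needs to converge within the current component.
    for _ in range(200):
        for q in members:
            if q in ACCEPTING:
                continue
            V[q] = gamma * max(V[q2] for q2 in automaton_edges[q])

print(V)  # satisfaction values propagate 2 -> 1 -> 0
```

In the paper's setting the backups run over hybrid product states (s, q) rather
than automaton states alone; the sketch only shows the component-by-component
ordering that combats reward sparsity.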
This algorithm leverages advanced mathematical techniques and enjoys the
property of hyperparameter self-tuning. We prove the optimality and convergence
of the proposed reinforcement learning algorithm. We use neural networks to
approximate the value function and policy function over the hybrid product
state space. Furthermore, we observe that encoding automaton states as integers
imposes an unintended ordinal ranking on the value (policy) functions
approximated by the neural networks. To break this ordinal relationship, we use
a separate neural network for each automaton state's value (policy) function,
an approach we term modular learning. We conduct two experiments. First, to
show the efficacy of our reinforcement learning algorithm, we compare it
against baselines on a classic control task, CartPole. Second, we demonstrate
the empirical performance of our formal policy synthesis framework on motion
planning for a Dubins car with a temporal specification.
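The sketch below shows one way to realize the modular-learning idea with a
per-automaton-state critic in PyTorch; the class name `ModularValue`, the layer
sizes, and the observation dimension are illustrative assumptions, not the
paper's architecture.

```python
# Illustrative only: one independent value network per automaton state, so
# the integer label q selects a module instead of entering the network and
# imposing an ordinal relationship on the approximated values.
import torch
import torch.nn as nn

class ModularValue(nn.Module):
    def __init__(self, n_automaton_states: int, obs_dim: int, hidden: int = 64):
        super().__init__()
        # One independent critic head per automaton state q.
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_automaton_states)
        )

    def forward(self, s: torch.Tensor, q: int) -> torch.Tensor:
        # A hybrid product state (s, q): the continuous part s is routed
        # to the module owned by automaton state q.
        return self.heads[q](s)

value_fn = ModularValue(n_automaton_states=3, obs_dim=4)
s = torch.randn(1, 4)    # continuous system state, e.g. a Dubins car pose
print(value_fn(s, q=1))  # value estimate under automaton state 1
```

An analogous module list can hold the per-state policy networks.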
Related papers
- Optimization of a Hydrodynamic Computational Reservoir through Evolution [58.720142291102135]
We interface with a model of a hydrodynamic system, under development by a startup, as a computational reservoir.
We optimized the readout times and how inputs are mapped to the wave amplitude or frequency using an evolutionary search algorithm.
Applying evolutionary methods to this reservoir system substantially improved separability on an XNOR task, in comparison to implementations with hand-selected parameters.
arXiv Detail & Related papers (2023-04-20T19:15:02Z)
- Fast Offline Policy Optimization for Large Scale Recommendation [74.78213147859236]
We derive an approximation of these policy learning algorithms that scales logarithmically with the catalogue size.
Our contribution is based upon combining three novel ideas.
Our estimator is an order of magnitude faster than naive approaches yet produces equally good policies.
arXiv Detail & Related papers (2022-08-08T11:54:11Z)
- Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity.
We devise algorithms that learn optimal tilt control policies from existing data.
We show that they can produce an optimal tilt update policy using far fewer data samples than naive or existing rule-based learning algorithms.
arXiv Detail & Related papers (2022-01-06T18:24:30Z)
- Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
- Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling [8.14784681248878]
In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem.
We apply it to an algorithm commonly executed in the high performance computing community, the Cholesky factorization.
Our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly.
arXiv Detail & Related papers (2020-11-09T10:57:21Z)
- Formal Policy Synthesis for Continuous-Space Systems via Reinforcement Learning [0.0]
We show how reinforcement learning can be applied to compute policies that are finite-memory and deterministic.
We develop the required assumptions and theory for the convergence of the learned policy to the optimal policy.
We demonstrate the approach on a 4-dimensional cart-pole system and a 6-dimensional boat-driving problem.
arXiv Detail & Related papers (2020-05-04T08:36:25Z)
- Learning to be Global Optimizer [28.88646928299302]
We learn both an optimization network and an escaping-capability policy on a set of benchmark functions.
We show that the learned algorithm significantly outperforms some well-known classical optimization algorithms.
arXiv Detail & Related papers (2020-03-10T03:46:25Z)