Topological Guided Actor-Critic Modular Learning of Continuous Systems
with Temporal Objectives
- URL: http://arxiv.org/abs/2304.10041v1
- Date: Thu, 20 Apr 2023 01:36:05 GMT
- Title: Topological Guided Actor-Critic Modular Learning of Continuous Systems
with Temporal Objectives
- Authors: Lening Li, Zhentian Qian
- Abstract summary: This work investigates the formal policy synthesis of continuous-state dynamic systems given high-level specifications in linear temporal logic.
We use neural networks to approximate the value function and policy function for the hybrid product state space.
- Score: 2.398608007786179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work investigates formal policy synthesis for continuous-state
stochastic dynamic systems given high-level specifications in linear temporal
logic. To learn an optimal policy that maximizes the satisfaction probability,
we take the product of the dynamic system and the automaton translated from the
specification, and solve an optimal planning problem on the resulting product
system. Since this product system has a hybrid product state space that leads
to reward sparsity, we introduce a generalized optimal backup order, in reverse
topological order, to guide the value backups and accelerate the learning
process. We prove the optimality of using the generalized optimal backup order
in this planning problem. Further, this paper presents an actor-critic
reinforcement learning algorithm for the case where a topological order applies.
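The sketch below illustrates the reverse-topological backup schedule just
described, assuming the translated automaton is given as a plain adjacency map;
the three-state automaton, the names `automaton_edges` and `ACCEPTING`, and the
scalar values are illustrative assumptions, not the paper's construction.

```python
# Illustrative only: condense a hypothetical specification automaton into
# strongly connected components (SCCs) and run value backups in *reverse*
# topological order, so values flow from the accepting component backward.
import networkx as nx

# Hypothetical 3-state automaton; state 2 is accepting and absorbing,
# state 0 is initial.
automaton_edges = {0: [0, 1], 1: [1, 2], 2: [2]}
ACCEPTING = {2}

G = nx.DiGraph()
G.add_edges_from((q, q2) for q, succs in automaton_edges.items() for q2 in succs)

condensation = nx.condensation(G)  # DAG of SCCs
# Generalized optimal backup order: the reverse of a topological order.
backup_order = list(reversed(list(nx.topological_sort(condensation))))

gamma = 0.99
V = {q: (1.0 if q in ACCEPTING else 0.0) for q in automaton_edges}

for comp in backup_order:  # sweep from the accepting SCC toward the initial one
    members = condensation.nodes[comp]["members"]
    # States in already-swept components are fixed, so the local Bellman
    # iteration only needs to converge within the current component.
    for _ in range(200):
        for q in members:
            if q in ACCEPTING:
                continue
            V[q] = gamma * max(V[q2] for q2 in automaton_edges[q])

print(V)  # satisfaction values propagate 2 -> 1 -> 0
```

In the paper's setting the backups run over hybrid product states (s, q) rather
than automaton states alone; the sketch only shows the component-by-component
ordering that combats reward sparsity.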
This algorithm leverages advanced mathematical techniques and enjoys the
property of hyperparameter self-tuning. We prove the optimality and convergence
of the proposed reinforcement learning algorithm. We use neural networks to
approximate the value function and policy function over the hybrid product
state space. Furthermore, we observe that encoding automaton states as integers
imposes an unintended ordinal ranking on the value (policy) functions
approximated by the neural networks. To break this ordinal relationship, we use
a separate neural network for each automaton state's value (policy) function,
an approach we term modular learning. We conduct two experiments. First, to
show the efficacy of our reinforcement learning algorithm, we compare it
against baselines on a classic control task, CartPole. Second, we demonstrate
the empirical performance of our formal policy synthesis framework on motion
planning for a Dubins car with a temporal specification.
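The sketch below shows one way to realize the modular-learning idea with a
per-automaton-state critic in PyTorch; the class name `ModularValue`, the layer
sizes, and the observation dimension are illustrative assumptions, not the
paper's architecture.

```python
# Illustrative only: one independent value network per automaton state, so
# the integer label q selects a module instead of entering the network and
# imposing an ordinal relationship on the approximated values.
import torch
import torch.nn as nn

class ModularValue(nn.Module):
    def __init__(self, n_automaton_states: int, obs_dim: int, hidden: int = 64):
        super().__init__()
        # One independent critic head per automaton state q.
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_automaton_states)
        )

    def forward(self, s: torch.Tensor, q: int) -> torch.Tensor:
        # A hybrid product state (s, q): the continuous part s is routed
        # to the module owned by automaton state q.
        return self.heads[q](s)

value_fn = ModularValue(n_automaton_states=3, obs_dim=4)
s = torch.randn(1, 4)    # continuous system state, e.g. a Dubins car pose
print(value_fn(s, q=1))  # value estimate under automaton state 1
```

An analogous module list can hold the per-state policy networks.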
Related papers
- Optimization of a Hydrodynamic Computational Reservoir through Evolution [58.720142291102135]
We interface with a model of a hydrodynamic system, under development by a startup, as a computational reservoir.
We optimized the readout times and how inputs are mapped to the wave amplitude or frequency using an evolutionary search algorithm.
Applying evolutionary methods to this reservoir system substantially improved separability on an XNOR task, in comparison to implementations with hand-selected parameters.
arXiv Detail & Related papers (2023-04-20T19:15:02Z)
- Fast Offline Policy Optimization for Large Scale Recommendation [74.78213147859236]
We derive an approximation of these policy learning algorithms that scales logarithmically with the catalogue size.
Our contribution is based upon combining three novel ideas.
Our estimator is an order of magnitude faster than naive approaches yet produces equally good policies.
arXiv Detail & Related papers (2022-08-08T11:54:11Z)
- Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity.
We devise algorithms that learn optimal tilt control policies from existing data.
We show that they can produce an optimal tilt update policy using far fewer data samples than naive or existing rule-based learning algorithms.
arXiv Detail & Related papers (2022-01-06T18:24:30Z)
- Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
- Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling [8.14784681248878]
In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem.
We apply it to an algorithm commonly executed in the high performance computing community, the Cholesky factorization.
Our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly.
arXiv Detail & Related papers (2020-11-09T10:57:21Z)
- Formal Policy Synthesis for Continuous-Space Systems via Reinforcement Learning [0.0]
We show how reinforcement learning can be applied to compute policies that are finite-memory and deterministic.
We develop the required assumptions and theory for the convergence of the learned policy to the optimal policy.
We demonstrate the approach on a 4-dimensional cart-pole system and a 6-dimensional boat-driving problem.
arXiv Detail & Related papers (2020-05-04T08:36:25Z)
- Learning to be Global Optimizer [28.88646928299302]
We learn both an optimization network and an escaping-capability policy on a set of benchmark functions.
We show that the learned algorithm significantly outperforms some well-known classical optimization algorithms.
arXiv Detail & Related papers (2020-03-10T03:46:25Z)