Optimal Actor-Critic Policy with Optimized Training Datasets
- URL: http://arxiv.org/abs/2108.06911v1
- Date: Mon, 16 Aug 2021 06:09:55 GMT
- Title: Optimal Actor-Critic Policy with Optimized Training Datasets
- Authors: Chayan Banerjee, Zhiyong Chen, Nasimul Noman and Mohsen Zamani
- Abstract summary: Actor-critic (AC) algorithms are known for their efficacy and high performance in solving reinforcement learning problems.
They also suffer from low sampling efficiency.
We propose a strategy to optimize the training dataset so that it contains significantly fewer samples collected from the AC process.
- Score: 8.372742131747522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Actor-critic (AC) algorithms are known for their efficacy and high
performance in solving reinforcement learning problems, but they also suffer
from low sampling efficiency. An AC-based policy optimization process is
iterative and needs to frequently access the agent-environment system to
evaluate and update the policy by rolling out the policy, collecting rewards
and states (i.e. samples), and learning from them. It ultimately requires a
huge number of samples to learn an optimal policy. To improve sampling
efficiency, we propose a strategy to optimize the training dataset that
contains significantly fewer samples collected from the AC process. The dataset
optimization consists of a best-episode-only operation, a policy
parameter-fitness model, and a genetic algorithm module. The optimal policy
network trained by the optimized training dataset exhibits superior performance
compared to many contemporary AC algorithms in controlling autonomous dynamical
systems. Evaluation on standard benchmarks shows that the method improves
sampling efficiency, ensures faster convergence to optima, and is more
data-efficient than its counterparts.
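The three components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the nearest-neighbour surrogate standing in for the policy parameter-fitness model, and the particular genetic operators (truncation selection, uniform crossover, Gaussian mutation) are all assumptions chosen for brevity.

```python
import random


def best_episode_only(episodes):
    """Keep only the episode with the highest total reward (hypothetical helper)."""
    return max(episodes, key=lambda ep: sum(ep["rewards"]))


def make_fitness_model(samples, k=3):
    """Toy parameter-fitness surrogate (an assumption, not the paper's model).

    `samples` is a list of (theta, fitness) pairs. The prediction for a new
    theta is the mean fitness of its k nearest recorded parameter vectors.
    """
    def predict(theta):
        nearest = sorted(
            samples,
            key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], theta)),
        )[:k]
        return sum(f for _, f in nearest) / len(nearest)
    return predict


def genetic_search(fitness, dim, bounds, pop_size=30, generations=40, seed=0):
    """Maximize `fitness` over a box with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            # Gaussian mutation, clipped back into the search box
            child = [min(hi, max(lo, g + rng.gauss(0, 0.1))) for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

In this sketch, policy parameters and returns logged from the best episodes would be passed to `make_fitness_model`, and the resulting predictor would be handed to `genetic_search` as `fitness`, so that candidate parameters are scored without further environment rollouts.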
Related papers
- Primitive Agentic First-Order Optimization [0.0]
This work presents a proof-of-concept study combining primitive state representations and agent-environment interactions as first-order reinforcement learning.
The results show that elementary RL methods combined with succinct partial state representations can be used as optimizers to manage complexity in RL-based optimization.
arXiv Detail & Related papers (2024-06-07T11:13:38Z)
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of the problem, leads in practice to a compromised mean-seeking approximation of that optimal solution.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z) - Value Enhancement of Reinforcement Learning via Efficient and Robust
Trust Region Optimization [14.028916306297928]
Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy.
We propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms.
arXiv Detail & Related papers (2023-01-05T18:43:40Z) - A Data-Driven Evolutionary Transfer Optimization for Expensive Problems
in Dynamic Environments [9.098403098464704]
Data-driven, a.k.a. surrogate-assisted, evolutionary optimization has been recognized as an effective approach for tackling expensive black-box optimization problems.
This paper proposes a simple but effective transfer learning framework to empower data-driven evolutionary optimization to solve dynamic optimization problems.
Experiments on synthetic benchmark test problems and a real-world case study demonstrate the effectiveness of our proposed algorithm.
arXiv Detail & Related papers (2022-11-05T11:19:50Z) - Deep Reinforcement Learning for Exact Combinatorial Optimization:
Learning to Branch [13.024115985194932]
We propose a new approach for solving the data labeling and inference issues in optimization based on the use of the reinforcement learning (RL) paradigm.
We use imitation learning to bootstrap an RL agent and then use Proximal Policy Optimization (PPO) to further explore globally optimal actions.
arXiv Detail & Related papers (2022-06-14T16:35:58Z) - Learning Optimal Antenna Tilt Control Policies: A Contextual Linear
Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity.
We devise algorithms learning optimal tilt control policies from existing data.
We show that they can produce an optimal tilt update policy using far fewer data samples than naive or existing rule-based learning algorithms.
arXiv Detail & Related papers (2022-01-06T18:24:30Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Semi-supervised Batch Active Learning via Bilevel Optimization [89.37476066973336]
We formulate our approach as a data summarization problem via bilevel optimization.
We show that our method is highly effective in keyword detection tasks in the regime when only few labeled samples are available.
arXiv Detail & Related papers (2020-10-19T16:53:24Z) - Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.