Related papers: Broad Critic Deep Actor Reinforcement Learning for Continuous Control

Broad Critic Deep Actor Reinforcement Learning for Continuous Control

URL: http://arxiv.org/abs/2411.15806v1
Date: Sun, 24 Nov 2024 12:24:46 GMT
Title: Broad Critic Deep Actor Reinforcement Learning for Continuous Control
Authors: Shiron Thalagala, Pak Kin Wong, Xiaozheng Wang,
Abstract summary: A novel hybrid architecture for actor-critic reinforcement learning (RL) algorithms is introduced. The proposed architecture integrates the broad learning system (BLS) with deep neural networks (DNNs) The effectiveness of the proposed algorithm is evaluated by applying it to two classic continuous control tasks.
Score: 5.440090782797941
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In the domain of continuous control, deep reinforcement learning (DRL) demonstrates promising results. However, the dependence of DRL on deep neural networks (DNNs) results in the demand for extensive data and increased computational complexity. To address this issue, a novel hybrid architecture for actor-critic reinforcement learning (RL) algorithms is introduced. The proposed architecture integrates the broad learning system (BLS) with DNN, aiming to merge the strengths of both distinct architectural paradigms. Specifically, the critic network is implemented using BLS, while the actor network is constructed with a DNN. For the estimations of the critic network parameters, ridge regression is employed, and the parameters of the actor network are optimized through gradient descent. The effectiveness of the proposed algorithm is evaluated by applying it to two classic continuous control tasks, and its performance is compared with the widely recognized deep deterministic policy gradient (DDPG) algorithm. Numerical results show that the proposed algorithm is superior to the DDPG algorithm in terms of computational efficiency, along with an accelerated learning trajectory. Application of the proposed algorithm in other actor-critic RL algorithms is suggested for investigation in future studies.

Related papers

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment [66.80143024475635]
We propose VinePPO, a straightforward approach to compute unbiased Monte Carlo-based estimates. We show that VinePPO consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets.
arXiv Detail & Related papers (2024-10-02T15:49:30Z)
Advanced deep-reinforcement-learning methods for flow control: group-invariant and positional-encoding networks improve learning speed and quality [0.7421845364041001]
This study advances deep-reinforcement-learning (DRL) methods for flow control. We focus on integrating group-invariant networks and positional encoding into DRL architectures. The proposed methods are verified using a case study of Rayleigh-B'enard convection.
arXiv Detail & Related papers (2024-07-25T07:24:41Z)
Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph- kernel multi-arms bandit algorithm to learn online the optimal source placement in large scale networks. We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z)
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems. PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features. In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms. Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return. We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting. We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors. Our work is the first attempt to optimize BNNs from the bilinear perspective. We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
DDPG-Driven Deep-Unfolding with Adaptive Depth for Channel Estimation with Sparse Bayesian Learning [23.158142411929322]
We first develop a framework of deep deterministic policy gradient (DDPG)-driven deep-unfolding with adaptive depth for different inputs. Specifically, the framework is employed to deal with the channel estimation problem in massive multiple-input multiple-output systems.
arXiv Detail & Related papers (2022-01-20T22:35:42Z)
Recursive Least Squares Advantage Actor-Critic Algorithms [20.792917267835247]
We propose two novel RLS-based advantage actor critic (A2C) algorithms. RLSSA2C and RLSNA2C, use the RLS method to train the critic network and the hidden layers of the actor network. From the experimental results, it is shown that our both algorithms have better sample efficiency than the vanilla A2C on most games or tasks.
arXiv Detail & Related papers (2022-01-15T20:00:26Z)
On the Robustness of Controlled Deep Reinforcement Learning for Slice Placement [0.8459686722437155]
We compare two Deep Reinforcement Learning algorithms: a pure DRL-based algorithm and a hybrid DRL as a hybrid DRL-heuristic algorithm. The evaluation results show that the proposed hybrid DRL-heuristic approach is more robust and reliable in case of unpredictable network load changes than pure DRL.
arXiv Detail & Related papers (2021-08-05T10:24:33Z)
A Heuristically Assisted Deep Reinforcement Learning Approach for Network Slice Placement [0.7885276250519428]
We introduce a hybrid placement solution based on Deep Reinforcement Learning (DRL) and a dedicated optimization based on the Power of Two Choices principle. The proposed Heuristically-Assisted DRL (HA-DRL) allows to accelerate the learning process and gain in resource usage when compared against other state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-14T10:04:17Z)
Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
Phase Retrieval using Expectation Consistent Signal Recovery Algorithm based on Hypernetwork [73.94896986868146]
Phase retrieval is an important component in modern computational imaging systems. Recent advances in deep learning have opened up a new possibility for robust and fast PR. We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z)
Distributed Multi-agent Meta Learning for Trajectory Design in Wireless Drone Networks [151.27147513363502]
This paper studies the problem of the trajectory design for a group of energyconstrained drones operating in dynamic wireless network environments. A value based reinforcement learning (VDRL) solution and a metatraining mechanism is proposed.
arXiv Detail & Related papers (2020-12-06T01:30:12Z)
Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems. The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z)
A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers. The resulting optimal control representation admits a gameoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z)
Meta-Reinforcement Learning for Trajectory Design in Wireless UAV Networks [151.65541208130995]
A drone base station (DBS) is dispatched to provide uplink connectivity to ground users whose demand is dynamic and unpredictable. In this case, the DBS's trajectory must be adaptively adjusted to satisfy the dynamic user access requests. A meta-learning algorithm is proposed in order to adapt the DBS's trajectory when it encounters novel environments.
arXiv Detail & Related papers (2020-05-25T20:43:59Z)
PoPS: Policy Pruning and Shrinking for Deep Reinforcement Learning [16.269923100433232]
We develop a working algorithm, named Policy Pruning and Shrinking (PoPS), to train DRL models with strong performance. PoPS is based on a novel iterative policy pruning and shrinking method that leverages the power of transfer learning. We present an extensive experimental study that demonstrates the strong performance of PoPS using the popular Cartpole, Lunar Lander, Pong, and Pacman environments.
arXiv Detail & Related papers (2020-01-14T19:28:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.