Online Attentive Kernel-Based Temporal Difference Learning
- URL: http://arxiv.org/abs/2201.09065v1
- Date: Sat, 22 Jan 2022 14:47:10 GMT
- Title: Online Attentive Kernel-Based Temporal Difference Learning
- Authors: Guang Yang, Xingguo Chen, Shangdong Yang, Huihui Wang, Shaokang Dong,
Yang Gao
- Abstract summary: Online Reinforcement Learning (RL) has been receiving increasing attention due to its fast learning capability and improved data efficiency.
Online RL often suffers from complex Value Function Approximation (VFA) and catastrophic interference.
We propose an Online Attentive Kernel-Based Temporal Difference (OAKTD) algorithm using two-timescale optimization.
- Score: 13.94346725929798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With rising uncertainty in the real world, online Reinforcement Learning (RL)
has been receiving increasing attention due to its fast learning capability and
improved data efficiency. However, online RL often suffers from complex Value
Function Approximation (VFA) and catastrophic interference, which makes it
difficult to apply deep neural networks to online RL algorithms in a fully
online setting. Therefore, a simpler and more adaptive approach is introduced
to evaluate the value function with a kernel-based model. Sparse representations
are superior at handling interference; compared with current sparse
representation methods, a competitive sparse representation should be
learnable, non-prior, non-truncated and explicit. Moreover, in learning
sparse representations, attention mechanisms are utilized to represent the
degree of sparsification, and a smooth attentive function is introduced into
the kernel-based VFA. In this paper, we propose an Online Attentive
Kernel-Based Temporal Difference (OAKTD) algorithm using two-timescale
optimization and provide a convergence analysis of the proposed algorithm.
Experimental evaluations showed that OAKTD outperformed several Online
Kernel-based Temporal Difference (OKTD) learning algorithms as well as the
Temporal Difference (TD) learning algorithm with Tile Coding on the public
Mountain Car, Acrobot, CartPole and Puddle World tasks.
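As a rough illustration of the ideas in the abstract, the following is a minimal sketch of a kernel-based TD(0) learner whose features are smoothly sparsified by a learned attention vector and whose value weights and attention parameters are updated on two timescales. The RBF kernel, the softmax attention, the class name `AttentiveKernelTD`, and the step sizes `alpha`/`beta` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Sketch of attentive kernel-based TD learning (illustrative, not the paper's code).
# Assumptions: RBF kernel features over fixed centers, a softmax attention vector
# that smoothly sparsifies those features, and two step sizes -- a fast one (alpha)
# for the value weights and a slow one (beta) for the attention logits -- in the
# spirit of two-timescale optimization.

class AttentiveKernelTD:
    def __init__(self, centers, bandwidth=0.5, alpha=0.1, beta=0.005, gamma=0.99):
        self.centers = np.asarray(centers)       # (m, d) kernel centers
        self.bandwidth = bandwidth
        self.w = np.zeros(len(self.centers))     # value weights (fast timescale)
        self.a = np.zeros(len(self.centers))     # attention logits (slow timescale)
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def kernels(self, s):
        """RBF kernel activations k(s, c_i) for every center c_i."""
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.bandwidth ** 2))

    def attention(self):
        """Softmax over logits: a smooth, learnable degree of sparsification."""
        e = np.exp(self.a - self.a.max())
        return e / e.sum()

    def features(self, s):
        return self.attention() * self.kernels(s)  # attentively sparsified features

    def value(self, s):
        return float(self.w @ self.features(s))

    def update(self, s, r, s_next, done):
        """One semi-gradient TD(0) step on both parameter sets."""
        phi = self.features(s)
        target = r if done else r + self.gamma * self.value(s_next)
        delta = target - self.w @ phi              # TD error
        self.w += self.alpha * delta * phi         # fast timescale: value weights
        # Slow timescale: gradient of V(s) w.r.t. logits via the softmax Jacobian.
        p, k = self.attention(), self.kernels(s)
        grad_a = p * (self.w * k - np.dot(self.w * k, p))
        self.a += self.beta * delta * grad_a
        return delta
```

For example, on a one-dimensional state space one might place centers on a grid, `agent = AttentiveKernelTD(np.linspace(0.0, 1.0, 50)[:, None])`, and call `agent.update(s, r, s_next, done)` after each observed transition. Keeping `beta` much smaller than `alpha` reflects the two-timescale idea that the representation should change more slowly than the value estimate.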
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open up novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z)
- Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- Backstepping Temporal Difference Learning [3.5823366350053325]
We propose a new convergent algorithm for off-policy TD-learning.
Our method relies on the backstepping technique, which is widely used in nonlinear control theory.
Convergence of the proposed algorithm is experimentally verified in environments where standard TD-learning is known to be unstable.
arXiv Detail & Related papers (2023-02-20T10:06:49Z)
- Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies [6.303272140868826]
Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces.
Current deep RL algorithms require a tremendous amount of environment interaction for learning.
Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data.
arXiv Detail & Related papers (2022-12-15T20:36:10Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic [137.04558017227583]
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years.
We take a mean-field perspective on the evolution and convergence of feature-based neural AC.
We prove that neural AC finds the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2021-12-27T06:09:50Z)
- A Heuristically Assisted Deep Reinforcement Learning Approach for Network Slice Placement [0.7885276250519428]
We introduce a hybrid placement solution based on Deep Reinforcement Learning (DRL) and a dedicated optimization based on the Power of Two Choices principle.
The proposed Heuristically-Assisted DRL (HA-DRL) accelerates the learning process and improves resource usage compared with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-14T10:04:17Z)
- Online Deterministic Annealing for Classification and Clustering [0.0]
We introduce an online prototype-based learning algorithm for clustering and classification.
We show that the proposed algorithm constitutes a competitive-learning neural network, the learning rule of which is formulated as an online approximation algorithm.
arXiv Detail & Related papers (2021-02-11T04:04:21Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in practice, and only a logarithmic number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our method and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)