A new soft computing method for integration of expert's knowledge in reinforcement learning problems
- URL: http://arxiv.org/abs/2106.07088v1
- Date: Sun, 13 Jun 2021 20:41:29 GMT
- Title: A new soft computing method for integration of expert's knowledge in reinforcement learning problems
- Authors: Mohsen Annabestani, Ali Abedi, Mohammad Reza Nematollahi, and Mohammad Bagher Naghibi Sistani
- Abstract summary: The proposed fuzzy nonlinear mapping assigns each member of the action set to its probability of being chosen in the next step.
A user-tunable parameter is introduced to control the action selection policy, which determines the agent's greedy behavior.
Simulation results indicate that incorporating fuzzy logic into reinforcement learning in the proposed manner improves the learning algorithm's convergence rate.
- Score: 1.11412540857944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel fuzzy action selection method to leverage human knowledge in reinforcement learning problems. Based on the most recent estimates of the action-state values, the proposed fuzzy nonlinear mapping assigns each member of the action set to its probability of being chosen in the next step. A user-tunable parameter is introduced to control the action selection policy, which determines the agent's greedy behavior throughout the learning process. This parameter resembles the temperature parameter in the softmax action selection policy, but its tuning can be more knowledge-oriented, since it conveys human knowledge to the learning agent through modifications to the fuzzy rule base. Simulation results indicate that incorporating fuzzy logic into reinforcement learning in the proposed manner improves the learning algorithm's convergence rate and yields superior performance.
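To make the mechanism concrete, here is a minimal sketch contrasting softmax action selection with a fuzzy-style nonlinear mapping whose steepness acts as the user-tunable greediness parameter. The abstract does not spell out the fuzzy rule base, so the sigmoidal membership function, the `greediness` parameter, and all names below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax_policy(q_values, temperature=1.0):
    """Baseline softmax action selection, for comparison."""
    z = (np.asarray(q_values, dtype=float) - np.max(q_values)) / temperature
    p = np.exp(z)
    return p / p.sum()

def fuzzy_action_probabilities(q_values, greediness=6.0):
    """Illustrative fuzzy-style mapping from Q-values to action probabilities.

    The sigmoidal membership function and the `greediness` knob stand in
    for the paper's fuzzy rule base, which the abstract does not give.
    """
    q = np.asarray(q_values, dtype=float)
    span = q.max() - q.min()
    # Normalize Q-values to [0, 1] so one membership function fits any scale.
    q_norm = (q - q.min()) / span if span > 0 else np.ones_like(q)
    # Membership grade of each action in the fuzzy set "promising action".
    grades = 1.0 / (1.0 + np.exp(-greediness * (q_norm - 0.5)))
    return grades / grades.sum()

rng = np.random.default_rng(0)
q = rng.normal(size=4)
print("softmax:", softmax_policy(q, temperature=0.5))
print("fuzzy  :", fuzzy_action_probabilities(q, greediness=6.0))
action = rng.choice(len(q), p=fuzzy_action_probabilities(q))
```

Raising `greediness` pushes the distribution toward greedy selection, much as lowering the softmax temperature would; per the abstract, the paper instead exposes this knob through modifications to its fuzzy rule base.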
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
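A rough sketch of this idea, assuming nothing beyond the blurb: the toy REINFORCE loop below learns the mean of a Gaussian policy whose standard deviation `sigma` is the exploration level, then deploys the mean deterministically. The objective and all hyperparameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def reward(action):
    # Toy one-step objective with optimum at action = 2.0 (assumed).
    return -(action - 2.0) ** 2

mu, sigma, lr = 0.0, 0.5, 0.02     # sigma is the tunable exploration level
for _ in range(3000):
    a = rng.normal(mu, sigma)       # stochastic policy used only for learning
    # REINFORCE gradient for a Gaussian policy, with reward(mu) as a baseline
    grad_mu = (reward(a) - reward(mu)) * (a - mu) / sigma**2
    mu += lr * grad_mu

print("deployed deterministic action:", mu)   # deploy the mean, not samples
```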
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Adaptive Robust Learning using Latent Bernoulli Variables [50.223140145910904]
We present an adaptive approach for learning from corrupted training sets.
We distinguish corrupted from non-corrupted samples with latent Bernoulli variables.
The resulting problem is solved via variational inference.
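A hedged sketch of the general pattern: the EM-style loop below reweights samples by the posterior probability that their latent Bernoulli variable marks them as clean. The paper solves this with variational inference; the toy regression setup and the two Gaussian noise scales here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy regression data with a corrupted minority (an assumed setup).
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + rng.normal(0, 0.1, size=200)
y[:20] += rng.normal(0, 5.0, size=20)          # corrupted samples

w = np.full(200, 0.5)     # P(sample is clean): latent Bernoulli posterior
pi = 0.5                  # prior probability of being clean
for _ in range(10):
    # M-step: weighted least squares using the current clean-probabilities.
    slope = np.sum(w * x * y) / np.sum(w * x * x)
    resid2 = (y - slope * x) ** 2
    # E-step: posterior that each sample is clean vs. corrupted,
    # with assumed noise scales of 0.1 (clean) and 5.0 (corrupted).
    like_clean = np.exp(-resid2 / (2 * 0.1**2)) / 0.1
    like_corrupt = np.exp(-resid2 / (2 * 5.0**2)) / 5.0
    w = pi * like_clean / (pi * like_clean + (1 - pi) * like_corrupt)
    pi = w.mean()

print("estimated slope:", slope, "(true value: 3.0)")
```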
arXiv Detail & Related papers (2023-12-01T13:50:15Z)
- Hessian Aware Low-Rank Perturbation for Order-Robust Continual Learning [19.850893012601638]
Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones.
We propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning.
arXiv Detail & Related papers (2023-11-26T01:44:01Z)
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL.
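To illustrate (not reproduce) the meta-MDP idea, the contextual-bandit sketch below treats stepsize selection itself as a decision problem: an outer agent observes a task context, picks a stepsize, and is rewarded with the inner training run's return. The toy inner objective, the context buckets, and the candidate stepsizes are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
stepsizes = [0.001, 0.01, 0.1, 0.5]     # candidate hyperparameter values

def inner_training_return(task_curvature, lr, steps=50):
    """Toy inner loop: gradient descent on f(x) = c * x**2; return = -final loss."""
    x = 1.0
    for _ in range(steps):
        x -= lr * 2 * task_curvature * x
    return -task_curvature * x**2

# Outer agent: epsilon-greedy value estimates per (context bucket, stepsize).
q = np.zeros((2, len(stepsizes)))       # contexts: low vs. high curvature task
counts = np.zeros_like(q)
for episode in range(500):
    c = rng.uniform(0.1, 5.0)           # sample a task (the context)
    ctx = int(c > 1.0)
    a = rng.integers(len(stepsizes)) if rng.random() < 0.1 else int(np.argmax(q[ctx]))
    r = inner_training_return(c, stepsizes[a])
    counts[ctx, a] += 1
    q[ctx, a] += (r - q[ctx, a]) / counts[ctx, a]   # incremental mean

print("preferred stepsize per context:", [stepsizes[int(i)] for i in q.argmax(axis=1)])
```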
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- Reusable Options through Gradient-based Meta Learning [24.59017394648942]
Several deep learning approaches have been proposed to learn temporal abstractions, in the form of options, in an end-to-end manner.
We frame the problem of learning options as a gradient-based meta-learning problem.
We show that our method learns transferable components that accelerate learning and outperforms prior methods.
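As a loose sketch of gradient-based meta-learning of a reusable component (not the paper's actual end-to-end option learner), the Reptile-style loop below meta-learns an initialization that adapts quickly to sampled tasks; the quadratic task family and both learning rates are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def task_loss_grad(theta, target):
    """Toy per-task objective ||theta - target||^2, standing in for an option's loss."""
    return 2 * (theta - target)

theta0 = np.zeros(2)                    # meta-learned "option" initialization
inner_lr, meta_lr = 0.1, 0.05
for _ in range(1000):
    target = rng.normal([1.0, -1.0], 0.3)   # sample a task
    theta = theta0.copy()
    for _ in range(5):                      # inner adaptation on the task
        theta -= inner_lr * task_loss_grad(theta, target)
    theta0 += meta_lr * (theta - theta0)    # Reptile-style meta-update (an assumption)

print("meta-learned init:", theta0, "~ mean task optimum [1, -1]")
```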
arXiv Detail & Related papers (2022-12-22T14:19:35Z)
- Episodic Policy Gradient Training [43.62408764384791]
We introduce Episodic Policy Gradient Training (EPGT), a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on the fly.
Experimental results on both continuous and discrete environments demonstrate the advantage of using the proposed method in boosting the performance of various policy gradient algorithms.
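A heavily simplified sketch of the remembered-hyperparameter idea: store (hyperparameter, outcome) pairs in an episodic memory and mostly reuse the best remembered setting. EPGT's actual memory is keyed on learning states; the response curve and the exploit/explore split below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
memory = []     # episodic memory of (learning rate, observed return) pairs

def run_training_phase(lr):
    """Stand-in for a few policy-gradient updates; returns a noisy score
    peaked at lr ~ 0.1 (an assumed response curve)."""
    return -np.log(lr / 0.1) ** 2 + rng.normal(0, 0.1)

candidate_lrs = np.geomspace(1e-3, 1.0, 20)
for phase in range(100):
    if memory and rng.random() > 0.2:
        # Exploit: reuse the hyperparameter of the best remembered episode.
        lr = max(memory, key=lambda m: m[1])[0]
    else:
        lr = rng.choice(candidate_lrs)      # explore a new setting
    memory.append((lr, run_training_phase(lr)))

print("best remembered lr:", max(memory, key=lambda m: m[1])[0])
```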
arXiv Detail & Related papers (2021-12-03T11:15:32Z)
- Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter has changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
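A sketch of the direction-agreement idea only, not AdaRem's exact update rule: each parameter's step is scaled up where the current gradient agrees with a moving average of past update directions and scaled down where it disagrees. The `beta` and `eta` constants are assumed.

```python
import numpy as np

def direction_aware_update(params, grad, avg_dir, lr=0.01, beta=0.9, eta=0.5):
    """Per-parameter step scaled by sign agreement with past directions."""
    avg_dir = beta * avg_dir + (1 - beta) * np.sign(grad)
    agreement = np.sign(grad) * avg_dir    # in [-1, 1], per parameter
    scale = 1.0 + eta * agreement          # >1 if aligned, <1 if opposed
    return params - lr * scale * grad, avg_dir

# Usage on a toy quadratic bowl:
rng = np.random.default_rng(6)
params, avg_dir = rng.normal(size=3), np.zeros(3)
for _ in range(200):
    grad = 2 * params                      # gradient of ||params||^2
    params, avg_dir = direction_aware_update(params, grad, avg_dir)
print("params after training:", params)    # driven toward the optimum at 0
```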
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning [2.0559497209595823]
We propose a meta-learning method for adjusting the eligibility trace parameter in a state-dependent manner.
The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online.
We prove that, under some assumptions, the proposed method improves the overall quality of the update targets, by minimizing the overall target error.
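To show where a state-dependent trace parameter enters a TD(lambda) update, the sketch below runs tabular TD(lambda) on a toy chain with a per-state `lam` vector. The paper meta-learns this vector online with auxiliary learners; here it is fixed, and the chain MDP is an assumption.

```python
import numpy as np

n_states, alpha, gamma = 5, 0.1, 0.9
V = np.zeros(n_states)
# State-dependent trace parameter (fixed here; the paper adapts it online).
lam = np.array([0.9, 0.8, 0.7, 0.6, 0.0])

for episode in range(200):
    e = np.zeros(n_states)                  # eligibility traces
    s = 0
    while s < n_states - 1:                 # toy chain: always step right
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        v_next = 0.0 if s_next == n_states - 1 else V[s_next]
        delta = r + gamma * v_next - V[s]   # TD error
        e[s] += 1.0                         # accumulating trace
        V += alpha * delta * e              # credit all recently visited states
        e *= gamma * lam                    # each state decays with its own lambda
        s = s_next

print("V:", V)   # approaches [gamma**3, gamma**2, gamma, 1, 0]
```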
arXiv Detail & Related papers (2020-06-16T03:41:07Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)