A new soft computing method for integration of expert's knowledge in reinforcement learning problems
- URL: http://arxiv.org/abs/2106.07088v1
- Date: Sun, 13 Jun 2021 20:41:29 GMT
- Title: A new soft computing method for integration of expert's knowledge in reinforcement learning problems
- Authors: Mohsen Annabestani, Ali Abedi, Mohammad Reza Nematollahi, and Mohammad Bagher Naghibi Sistani
- Abstract summary: The proposed fuzzy nonlinear mapping assigns each member of the action set to its probability of being chosen in the next step.
A user-tunable parameter is introduced to control the action selection policy, which determines the agent's greedy behavior.
Simulation results indicate that incorporating fuzzy logic into reinforcement learning in the proposed manner improves the learning algorithm's convergence rate.
- Score: 1.11412540857944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel fuzzy action selection method to leverage human knowledge in reinforcement learning problems. Based on the most recent estimates of the action-state values, the proposed fuzzy nonlinear mapping assigns each member of the action set to its probability of being chosen in the next step. A user-tunable parameter is introduced to control the action selection policy, which determines the agent's greedy behavior throughout the learning process. This parameter resembles the temperature parameter in the softmax action selection policy, but its tuning can be more knowledge-oriented, since it conveys human knowledge to the learning agent through modifications to the fuzzy rule base. Simulation results indicate that incorporating fuzzy logic into reinforcement learning in the proposed manner improves the learning algorithm's convergence rate and yields superior performance.
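To make the mechanism concrete, here is a minimal sketch contrasting softmax action selection with a fuzzy-style nonlinear mapping whose steepness acts as the user-tunable greediness parameter. The abstract does not spell out the fuzzy rule base, so the sigmoidal membership function, the `greediness` parameter, and all names below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax_policy(q_values, temperature=1.0):
    """Baseline softmax action selection, for comparison."""
    z = (np.asarray(q_values, dtype=float) - np.max(q_values)) / temperature
    p = np.exp(z)
    return p / p.sum()

def fuzzy_action_probabilities(q_values, greediness=6.0):
    """Illustrative fuzzy-style mapping from Q-values to action probabilities.

    The sigmoidal membership function and the `greediness` knob stand in
    for the paper's fuzzy rule base, which the abstract does not give.
    """
    q = np.asarray(q_values, dtype=float)
    span = q.max() - q.min()
    # Normalize Q-values to [0, 1] so one membership function fits any scale.
    q_norm = (q - q.min()) / span if span > 0 else np.ones_like(q)
    # Membership grade of each action in the fuzzy set "promising action".
    grades = 1.0 / (1.0 + np.exp(-greediness * (q_norm - 0.5)))
    return grades / grades.sum()

rng = np.random.default_rng(0)
q = rng.normal(size=4)
print("softmax:", softmax_policy(q, temperature=0.5))
print("fuzzy  :", fuzzy_action_probabilities(q, greediness=6.0))
action = rng.choice(len(q), p=fuzzy_action_probabilities(q))
```

Raising `greediness` pushes the distribution toward greedy selection, much as lowering the softmax temperature would; per the abstract, the paper instead exposes this knob through modifications to its fuzzy rule base.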
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
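A rough sketch of this idea, assuming nothing beyond the blurb: the toy REINFORCE loop below learns the mean of a Gaussian policy whose standard deviation `sigma` is the exploration level, then deploys the mean deterministically. The objective and all hyperparameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def reward(action):
    # Toy one-step objective with optimum at action = 2.0 (assumed).
    return -(action - 2.0) ** 2

mu, sigma, lr = 0.0, 0.5, 0.02     # sigma is the tunable exploration level
for _ in range(3000):
    a = rng.normal(mu, sigma)       # stochastic policy used only for learning
    # REINFORCE gradient for a Gaussian policy, with reward(mu) as a baseline
    grad_mu = (reward(a) - reward(mu)) * (a - mu) / sigma**2
    mu += lr * grad_mu

print("deployed deterministic action:", mu)   # deploy the mean, not samples
```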
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Adaptive Robust Learning using Latent Bernoulli Variables [50.223140145910904]
We present an adaptive approach for learning from corrupted training sets.
We distinguish corrupted from non-corrupted samples with latent Bernoulli variables.
The resulting problem is solved via variational inference.
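A hedged sketch of the general pattern: the EM-style loop below reweights samples by the posterior probability that their latent Bernoulli variable marks them as clean. The paper solves this with variational inference; the toy regression setup and the two Gaussian noise scales here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy regression data with a corrupted minority (an assumed setup).
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + rng.normal(0, 0.1, size=200)
y[:20] += rng.normal(0, 5.0, size=20)          # corrupted samples

w = np.full(200, 0.5)     # P(sample is clean): latent Bernoulli posterior
pi = 0.5                  # prior probability of being clean
for _ in range(10):
    # M-step: weighted least squares using the current clean-probabilities.
    slope = np.sum(w * x * y) / np.sum(w * x * x)
    resid2 = (y - slope * x) ** 2
    # E-step: posterior that each sample is clean vs. corrupted,
    # with assumed noise scales of 0.1 (clean) and 5.0 (corrupted).
    like_clean = np.exp(-resid2 / (2 * 0.1**2)) / 0.1
    like_corrupt = np.exp(-resid2 / (2 * 5.0**2)) / 5.0
    w = pi * like_clean / (pi * like_clean + (1 - pi) * like_corrupt)
    pi = w.mean()

print("estimated slope:", slope, "(true value: 3.0)")
```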
arXiv Detail & Related papers (2023-12-01T13:50:15Z)
- Hessian Aware Low-Rank Perturbation for Order-Robust Continual Learning [19.850893012601638]
Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones.
We propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning.
arXiv Detail & Related papers (2023-11-26T01:44:01Z)
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL.
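To illustrate (not reproduce) the meta-MDP idea, the contextual-bandit sketch below treats stepsize selection itself as a decision problem: an outer agent observes a task context, picks a stepsize, and is rewarded with the inner training run's return. The toy inner objective, the context buckets, and the candidate stepsizes are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
stepsizes = [0.001, 0.01, 0.1, 0.5]     # candidate hyperparameter values

def inner_training_return(task_curvature, lr, steps=50):
    """Toy inner loop: gradient descent on f(x) = c * x**2; return = -final loss."""
    x = 1.0
    for _ in range(steps):
        x -= lr * 2 * task_curvature * x
    return -task_curvature * x**2

# Outer agent: epsilon-greedy value estimates per (context bucket, stepsize).
q = np.zeros((2, len(stepsizes)))       # contexts: low vs. high curvature task
counts = np.zeros_like(q)
for episode in range(500):
    c = rng.uniform(0.1, 5.0)           # sample a task (the context)
    ctx = int(c > 1.0)
    a = rng.integers(len(stepsizes)) if rng.random() < 0.1 else int(np.argmax(q[ctx]))
    r = inner_training_return(c, stepsizes[a])
    counts[ctx, a] += 1
    q[ctx, a] += (r - q[ctx, a]) / counts[ctx, a]   # incremental mean

print("preferred stepsize per context:", [stepsizes[int(i)] for i in q.argmax(axis=1)])
```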
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- Reusable Options through Gradient-based Meta Learning [24.59017394648942]
Several deep learning approaches have been proposed to learn temporal abstractions, in the form of options, in an end-to-end manner.
We frame the problem of learning options as a gradient-based meta-learning problem.
We show that our method learns transferable components that accelerate learning and outperforms prior methods.
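As a loose sketch of gradient-based meta-learning of a reusable component (not the paper's actual end-to-end option learner), the Reptile-style loop below meta-learns an initialization that adapts quickly to sampled tasks; the quadratic task family and both learning rates are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def task_loss_grad(theta, target):
    """Toy per-task objective ||theta - target||^2, standing in for an option's loss."""
    return 2 * (theta - target)

theta0 = np.zeros(2)                    # meta-learned "option" initialization
inner_lr, meta_lr = 0.1, 0.05
for _ in range(1000):
    target = rng.normal([1.0, -1.0], 0.3)   # sample a task
    theta = theta0.copy()
    for _ in range(5):                      # inner adaptation on the task
        theta -= inner_lr * task_loss_grad(theta, target)
    theta0 += meta_lr * (theta - theta0)    # Reptile-style meta-update (an assumption)

print("meta-learned init:", theta0, "~ mean task optimum [1, -1]")
```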
arXiv Detail & Related papers (2022-12-22T14:19:35Z)
- Episodic Policy Gradient Training [43.62408764384791]
We introduce Episodic Policy Gradient Training (EPGT), a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on the fly.
Experimental results on both continuous and discrete environments demonstrate the advantage of using the proposed method in boosting the performance of various policy gradient algorithms.
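A heavily simplified sketch of the remembered-hyperparameter idea: store (hyperparameter, outcome) pairs in an episodic memory and mostly reuse the best remembered setting. EPGT's actual memory is keyed on learning states; the response curve and the exploit/explore split below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
memory = []     # episodic memory of (learning rate, observed return) pairs

def run_training_phase(lr):
    """Stand-in for a few policy-gradient updates; returns a noisy score
    peaked at lr ~ 0.1 (an assumed response curve)."""
    return -np.log(lr / 0.1) ** 2 + rng.normal(0, 0.1)

candidate_lrs = np.geomspace(1e-3, 1.0, 20)
for phase in range(100):
    if memory and rng.random() > 0.2:
        # Exploit: reuse the hyperparameter of the best remembered episode.
        lr = max(memory, key=lambda m: m[1])[0]
    else:
        lr = rng.choice(candidate_lrs)      # explore a new setting
    memory.append((lr, run_training_phase(lr)))

print("best remembered lr:", max(memory, key=lambda m: m[1])[0])
```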
arXiv Detail & Related papers (2021-12-03T11:15:32Z)
- Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter has changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
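A sketch of the direction-agreement idea only, not AdaRem's exact update rule: each parameter's step is scaled up where the current gradient agrees with a moving average of past update directions and scaled down where it disagrees. The `beta` and `eta` constants are assumed.

```python
import numpy as np

def direction_aware_update(params, grad, avg_dir, lr=0.01, beta=0.9, eta=0.5):
    """Per-parameter step scaled by sign agreement with past directions."""
    avg_dir = beta * avg_dir + (1 - beta) * np.sign(grad)
    agreement = np.sign(grad) * avg_dir    # in [-1, 1], per parameter
    scale = 1.0 + eta * agreement          # >1 if aligned, <1 if opposed
    return params - lr * scale * grad, avg_dir

# Usage on a toy quadratic bowl:
rng = np.random.default_rng(6)
params, avg_dir = rng.normal(size=3), np.zeros(3)
for _ in range(200):
    grad = 2 * params                      # gradient of ||params||^2
    params, avg_dir = direction_aware_update(params, grad, avg_dir)
print("params after training:", params)    # driven toward the optimum at 0
```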
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning [2.0559497209595823]
We propose a meta-learning method for adjusting the eligibility trace parameter in a state-dependent manner.
The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online.
We prove that, under some assumptions, the proposed method improves the overall quality of the update targets, by minimizing the overall target error.
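To show where a state-dependent trace parameter enters a TD(lambda) update, the sketch below runs tabular TD(lambda) on a toy chain with a per-state `lam` vector. The paper meta-learns this vector online with auxiliary learners; here it is fixed, and the chain MDP is an assumption.

```python
import numpy as np

n_states, alpha, gamma = 5, 0.1, 0.9
V = np.zeros(n_states)
# State-dependent trace parameter (fixed here; the paper adapts it online).
lam = np.array([0.9, 0.8, 0.7, 0.6, 0.0])

for episode in range(200):
    e = np.zeros(n_states)                  # eligibility traces
    s = 0
    while s < n_states - 1:                 # toy chain: always step right
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        v_next = 0.0 if s_next == n_states - 1 else V[s_next]
        delta = r + gamma * v_next - V[s]   # TD error
        e[s] += 1.0                         # accumulating trace
        V += alpha * delta * e              # credit all recently visited states
        e *= gamma * lam                    # each state decays with its own lambda
        s = s_next

print("V:", V)   # approaches [gamma**3, gamma**2, gamma, 1, 0]
```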
arXiv Detail & Related papers (2020-06-16T03:41:07Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)