Reward Shaping via Diffusion Process in Reinforcement Learning
- URL: http://arxiv.org/abs/2306.11885v1
- Date: Tue, 20 Jun 2023 20:58:33 GMT
- Title: Reward Shaping via Diffusion Process in Reinforcement Learning
- Authors: Peeyush Kumar
- Abstract summary: I leverage the principles of thermodynamics and system dynamics to explore reward shaping via diffusion processes.
This article sheds light on relationships between information entropy, system dynamics, and their influences on entropy production.
- Score: 1.0878040851638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning (RL) models have continually evolved to navigate the exploration-exploitation trade-off in uncertain Markov Decision Processes (MDPs). In this study, I leverage the principles of stochastic thermodynamics and system dynamics to explore reward shaping via diffusion processes, which provides an elegant framework for reasoning about the exploration-exploitation trade-off. This article sheds light on the relationships between information entropy, stochastic system dynamics, and their influence on entropy production. This exploration allows us to construct a dual-pronged framework that can be interpreted either as a maximum entropy program for deriving efficient policies or as a modified cost optimization program that accounts for informational costs and benefits. This work presents a novel perspective on the physical nature of information and its implications for online learning in MDPs, consequently providing a better understanding of information-oriented formulations in RL.
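As background for the dual-pronged framework described above, the standard maximum entropy RL objective and potential-based reward shaping can be written as follows (these are the generic textbook formulations, not necessarily the paper's specific diffusion-based program):
\[
J(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\big(r(s_t, a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\big)\Big],
\qquad
r'(s_t, a_t, s_{t+1}) = r(s_t, a_t) + \gamma\,\Phi(s_{t+1}) - \Phi(s_t),
\]
where $\alpha$ trades off reward against policy entropy and $\Phi$ is a state potential; shaping of this potential-based form is known to preserve the set of optimal policies.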
Related papers
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
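A generic token-level entropy-regularized objective of the kind this entry describes can be sketched as (illustrative only; ETPO's exact formulation, e.g. its per-token credit assignment, may differ):
\[
J(\pi_\theta) = \mathbb{E}\Big[\sum_{t} r(y_{1:t}, x) + \alpha\,\mathcal{H}\big(\pi_\theta(\cdot \mid x, y_{1:t-1})\big)\Big],
\]
where $x$ is the prompt, $y_t$ is the $t$-th generated token, and $\alpha$ controls the strength of the entropy bonus.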
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Interpretable and Explainable Machine Learning Methods for Predictive Process Monitoring: A Systematic Literature Review [1.3812010983144802]
This paper presents a systematic review on the explainability and interpretability of machine learning (ML) models within the context of predictive process mining.
We provide a comprehensive overview of the current methodologies and their applications across various application domains.
Our findings aim to equip researchers and practitioners with a deeper understanding of how to develop and implement more trustworthy, transparent, and effective intelligent systems for process analytics.
arXiv Detail & Related papers (2023-12-29T12:43:43Z)
- Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement Learning Approach [58.911515417156174]
We propose a new definition of Age of Information (AoI) and, based on the redefined AoI, we formulate an online AoI problem for MEC systems.
We introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics.
We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness.
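For context, the classical Age of Information at time $t$ is the time elapsed since the generation of the most recently received update (the paper proposes a redefinition of AoI tailored to MEC systems, which this generic formula does not capture):
\[
\Delta(t) = t - U(t),
\]
where $U(t)$ is the generation time of the latest update received by time $t$.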
arXiv Detail & Related papers (2023-12-01T01:30:49Z)
- Stochastic Thermodynamics of Learning Parametric Probabilistic Models [0.0]
We introduce two information-theoretic metrics, Memorized-information (M-info) and Learned-information (L-info), which trace the flow of information during the learning process of Parametric Probabilistic Models (PPMs).
We demonstrate that the accumulation of L-info during the learning process is associated with entropy production, and parameters serve as a heat reservoir in this process, capturing learned information in the form of M-info.
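The entropy-production link mentioned above follows the usual stochastic-thermodynamics bookkeeping; as generic background (not the paper's specific M-info/L-info decomposition), the total entropy production satisfies
\[
\Delta S_{\mathrm{tot}} = \Delta S_{\mathrm{sys}} + \Delta S_{\mathrm{env}} \ge 0,
\]
where $\Delta S_{\mathrm{sys}}$ is the change in system (here, parameter) entropy and $\Delta S_{\mathrm{env}}$ is the entropy exported to the environment as heat.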
arXiv Detail & Related papers (2023-10-04T01:32:55Z)
- Self-Supervised Learning with Lie Symmetries for Partial Differential Equations [25.584036829191902]
We learn general-purpose representations of PDEs by implementing joint embedding methods for self-supervised learning (SSL).
Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers.
We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs.
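Joint-embedding SSL methods of the kind referenced here train an encoder so that embeddings of two augmented views of the same PDE solution agree; a common (though not necessarily this paper's) contrastive instantiation is the InfoNCE loss:
\[
\mathcal{L} = -\log \frac{\exp\big(\mathrm{sim}(z_i, z_j)/\tau\big)}{\sum_{k \ne i} \exp\big(\mathrm{sim}(z_i, z_k)/\tau\big)},
\]
where $z_i, z_j$ are embeddings of two views (e.g., related by a Lie point symmetry of the PDE), $\mathrm{sim}$ is cosine similarity, and $\tau$ is a temperature.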
arXiv Detail & Related papers (2023-07-11T16:52:22Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
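The optimism principle mentioned above is typically realized by acting greedily with respect to an upper confidence bound on the value; schematically (the paper's kernel-embedding construction of the bonus is more involved):
\[
a_t = \arg\max_{a} \Big[\hat{Q}(s_t, a) + \beta\, b(s_t, a)\Big],
\]
where $\hat{Q}$ is the estimated value, $b(s,a)$ is an uncertainty bonus that shrinks with experience, and $\beta > 0$ controls exploration.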
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
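Empowerment terms of the kind referenced above are built from the mutual information between actions and the future states they cause; in its basic one-step form (the paper combines a variational bound on this quantity with a state-space model):
\[
\mathcal{E}(s) = \max_{\omega(a \mid s)} I(S' ; A \mid S = s),
\]
i.e., the channel capacity from actions to next states, which is high in states where the agent's actions have controllable, distinguishable consequences.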
arXiv Detail & Related papers (2022-04-18T23:09:23Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utility for the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Privileged Information Dropout in Reinforcement Learning [56.82218103971113]
Using privileged information during training can improve the sample efficiency and performance of machine learning systems.
In this work, we investigate Privileged Information Dropout (pid) for achieving the latter; it can be applied equally to value-based and policy-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-05-19T05:32:33Z)
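As a generic illustration of the dropout mechanism underlying this kind of method (the specifics of pid, such as how the dropout rate is tied to the privileged signal, are not reproduced here), standard inverted dropout applied to a privileged feature vector $h$ during training is
\[
\tilde{h} = \frac{m \odot h}{1 - p}, \qquad m_i \sim \mathrm{Bernoulli}(1 - p),
\]
so that $\mathbb{E}[\tilde{h}] = h$; increasing $p$ towards 1 removes the dependence on the privileged signal, which is unavailable at test time.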
This list is automatically generated from the titles and abstracts of the papers on this site.