Effective Reinforcement Learning Based on Structural Information Principles
- URL: http://arxiv.org/abs/2404.09760v1
- Date: Mon, 15 Apr 2024 13:02:00 GMT
- Title: Effective Reinforcement Learning Based on Structural Information Principles
- Authors: Xianghua Zeng, Hao Peng, Dingli Su, Angsheng Li,
- Abstract summary: We propose a novel and general Structural Information principles-based framework for effective Decision-Making, namely SIDM.
SIDM can be flexibly incorporated into various single-agent and multi-agent RL algorithms, enhancing their performance.
- Score: 19.82391136775341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although Reinforcement Learning (RL) algorithms acquire sequential behavioral patterns through interactions with the environment, their effectiveness in noisy and high-dimensional scenarios typically relies on specific structural priors. In this paper, we propose a novel and general Structural Information principles-based framework for effective Decision-Making, namely SIDM, approached from an information-theoretic perspective. This paper presents a specific unsupervised partitioning method that forms vertex communities in the state and action spaces based on their feature similarities. An aggregation function, which utilizes structural entropy as the vertex weight, is devised within each community to obtain its embedding, thereby facilitating hierarchical state and action abstractions. By extracting abstract elements from historical trajectories, a directed, weighted, homogeneous transition graph is constructed. The minimization of this graph's high-dimensional entropy leads to the generation of an optimal encoding tree. An innovative two-layer skill-based learning mechanism is introduced to compute the common path entropy of each state transition as its identified probability, thereby obviating the requirement for expert knowledge. Moreover, SIDM can be flexibly incorporated into various single-agent and multi-agent RL algorithms, enhancing their performance. Finally, extensive evaluations on challenging benchmarks demonstrate that, compared with SOTA baselines, our framework significantly and consistently improves the policy's quality, stability, and efficiency up to 32.70%, 88.26%, and 64.86%, respectively.
Related papers
- Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity [51.40558987254471]
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations.
This paper addresses the question of reinforcement learning under $textitgeneral$ latent dynamics from a statistical and algorithmic perspective.
arXiv Detail & Related papers (2024-10-23T14:22:49Z) - Effective Exploration Based on the Structural Information Principles [21.656199029188056]
We propose a novel Structural Information principles-based Effective Exploration framework, namely SI2E.
We show that SI2E significantly outperforms state-of-the-art exploration baselines regarding final performance and sample efficiency.
arXiv Detail & Related papers (2024-10-09T07:19:16Z) - A Clustering Method with Graph Maximum Decoding Information [6.11503045313947]
We present a novel clustering method for maximizing decoding information within graph-based models, named CMDI.
CMDI incorporates two-dimensional structural information theory into the clustering process, consisting of two phases: graph structure extraction and graph partitioning.
Empirical evaluations on three real-world datasets demonstrate that CMDI outperforms classical baseline methods, exhibiting a superior decoding information ratio (DI-R)
These findings underscore the effectiveness of CMDI in enhancing decoding information quality and computational efficiency, positioning it as a valuable tool in graph-based clustering analyses.
arXiv Detail & Related papers (2024-03-18T05:18:19Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Hierarchical Ensemble-Based Feature Selection for Time Series Forecasting [0.0]
We introduce a novel ensemble approach for feature selection based on hierarchical stacking for non-stationarity.
Our approach exploits the co-dependency between features using a hierarchical structure.
The effectiveness of the approach is demonstrated on synthetic and well-known real-life datasets.
arXiv Detail & Related papers (2023-10-26T16:40:09Z) - Hierarchical State Abstraction Based on Structural Information
Principles [70.24495170921075]
We propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from the information-theoretic perspective.
SISA is a general framework that can be flexibly integrated with different representation-learning objectives to improve their performances further.
arXiv Detail & Related papers (2023-04-24T11:06:52Z) - Understanding and Constructing Latent Modality Structures in Multi-modal
Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z) - A General Framework for Sample-Efficient Function Approximation in
Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm namely OPtimization-based ExploRation with Approximation (OPERA) is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z) - Weakly Supervised Semantic Segmentation via Alternative Self-Dual
Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z) - The Gradient Convergence Bound of Federated Multi-Agent Reinforcement
Learning with Efficient Communication [20.891460617583302]
The paper considers independent reinforcement learning (IRL) for collaborative decision-making in the paradigm of federated learning (FL)
FL generates excessive communication overheads between agents and a remote central server.
This paper proposes two advanced optimization schemes to improve the system's utility value.
arXiv Detail & Related papers (2021-03-24T07:21:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.