Blockchain-assisted Demonstration Cloning for Multi-Agent Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2501.10938v1
- Date: Sun, 19 Jan 2025 04:20:24 GMT
- Title: Blockchain-assisted Demonstration Cloning for Multi-Agent Deep Reinforcement Learning
- Authors: Ahmed Alagha, Jamal Bentahar, Hadi Otrok, Shakti Singh, Rabeb Mizouni
- Abstract summary: Multi-Agent Deep Reinforcement Learning (MDRL) is a promising research area in which agents learn complex behaviors in cooperative or competitive environments.
Recent works proposing Federated Reinforcement Learning (FRL) to tackle these issues suffer from problems related to model restrictions and maliciousness.
We propose a novel Blockchain-assisted Multi-Expert Demonstration Cloning (MEDC) framework for MDRL.
- Score: 15.605693371392212
- Abstract: Multi-Agent Deep Reinforcement Learning (MDRL) is a promising research area in which agents learn complex behaviors in cooperative or competitive environments. However, MDRL comes with several challenges that hinder its usability, including poor sample efficiency, the curse of dimensionality, and the difficulty of environment exploration. Recent works proposing Federated Reinforcement Learning (FRL) to tackle these issues suffer from problems related to model restrictions and maliciousness. Other proposals using reward shaping require considerable engineering effort and can lead to local optima. In this paper, we propose a novel Blockchain-assisted Multi-Expert Demonstration Cloning (MEDC) framework for MDRL. The proposed method uses expert demonstrations to guide the learning of new MDRL agents by suggesting exploration actions in the environment. A model-sharing framework on Blockchain is designed to allow users to share their trained models, which can then be allocated as expert models to requesting users to aid in training MDRL systems. A Consortium Blockchain is adopted to enable traceable and autonomous execution without the need for a single trusted entity. Smart Contracts are designed to manage user and model allocation, with the models themselves shared over IPFS. The proposed framework is tested on several applications and benchmarked against existing methods in FRL, Reward Shaping, and Imitation Learning-assisted RL. The results show that the proposed framework outperforms these methods in learning speed and in resiliency to faulty and malicious models.
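To make the demonstration-cloning idea concrete, here is a minimal sketch of exploration guided by an allocated expert model, assuming an epsilon-greedy learner; all names and parameters are illustrative, not the paper's API:

```python
import random

def guided_epsilon_greedy(q_values, expert_action, epsilon=0.1, expert_prob=0.5):
    """Epsilon-greedy action selection that sometimes defers exploration
    to an expert model's suggested action (hypothetical names throughout).

    q_values:      list of Q-value estimates for each action (learner's model).
    expert_action: action suggested by an allocated expert model, or None.
    epsilon:       exploration rate.
    expert_prob:   chance that an exploratory step follows the expert instead
                   of acting uniformly at random.
    """
    if random.random() < epsilon:  # exploration step
        if expert_action is not None and random.random() < expert_prob:
            return expert_action  # follow the expert's demonstration
        return random.randrange(len(q_values))  # plain random exploration
    return max(range(len(q_values)), key=q_values.__getitem__)  # greedy step
```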
Related papers
- Blockchain-based Crowdsourced Deep Reinforcement Learning as a Service [15.605693371392212]
Deep Reinforcement Learning (DRL) has emerged as a powerful paradigm for solving complex problems.
This paper proposes a novel crowdsourced DRL as a Service (DRLaaS) framework.
The framework provides DRL-related services to users, covering two types of tasks: DRL training and model sharing.
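A minimal sketch of what a request to such a service might look like; the two task types follow the summary above, while all field and type names are assumptions:

```python
from dataclasses import dataclass
from enum import Enum

class TaskType(Enum):
    DRL_TRAINING = "drl_training"    # request training of a new DRL model
    MODEL_SHARING = "model_sharing"  # request an already-trained model

@dataclass
class ServiceRequest:
    task: TaskType
    environment_id: str  # which task/environment the model should handle
    requester: str       # identity of the requesting user
```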
arXiv Detail & Related papers (2025-01-22T18:54:39Z)
- MALT: Improving Reasoning with Multi-Agent LLM Training [64.13803241218886]
We present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems.
Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles.
We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively.
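The sequential role assignment can be pictured as a simple pipeline; the three role names below (generator, verifier, refiner) are a plausible reading of the summary, and the callables stand in for LLM calls:

```python
def malt_style_pipeline(question, generator, verifier, refiner):
    """Sequential heterogeneous roles; each callable stands in for an LLM.

    The generator proposes a solution, the verifier critiques it, and the
    refiner produces the final answer from the draft plus the critique.
    """
    draft = generator(question)
    critique = verifier(question, draft)
    return refiner(question, draft, critique)
```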
arXiv Detail & Related papers (2024-12-02T19:30:36Z)
- Efficient Multi-agent Reinforcement Learning by Planning [33.51282615335009]
Multi-agent reinforcement learning (MARL) algorithms have accomplished remarkable breakthroughs in solving large-scale decision-making tasks.
Most existing MARL algorithms are model-free, limiting sample efficiency and hindering their applicability in more challenging scenarios.
We propose the MAZero algorithm, which combines a centralized model with Monte Carlo Tree Search (MCTS) for policy search.
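MAZero's actual planner is MCTS over a learned centralized model; the sketch below swaps in a much simpler random-shooting search to show the core idea of planning through a learned model (all signatures are assumptions):

```python
import random

def plan_with_model(state, joint_actions, model, value_fn, depth=3,
                    n_sims=64, gamma=0.99):
    """Random-shooting search through a learned centralized model (a toy
    stand-in for MCTS). Returns the first joint action of the best rollout.

    model(state, joint_action) -> (next_state, reward)  # learned dynamics
    value_fn(state) -> float                            # bootstraps the tail
    """
    best_action, best_return = None, float("-inf")
    for _ in range(n_sims):
        s, ret, discount = state, 0.0, 1.0
        first = a = random.choice(joint_actions)
        for _ in range(depth):
            s, r = model(s, a)
            ret += discount * r
            discount *= gamma
            a = random.choice(joint_actions)
        ret += discount * value_fn(s)  # value estimate at the rollout horizon
        if ret > best_return:
            best_action, best_return = first, ret
    return best_action
```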
arXiv Detail & Related papers (2024-05-20T04:36:02Z)
- Unsupervised Solution Operator Learning for Mean-Field Games via Sampling-Invariant Parametrizations [7.230928145936957]
We develop a novel framework to learn the MFG solution operator.
Our model takes MFG instances as input and outputs their solutions in a single forward pass.
It is discretization-free, making it suitable for learning operators of high-dimensional MFGs.
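One common way to obtain sampling-invariant, discretization-free behavior is a DeepSets-style encoder that averages over sampled points; this is a generic sketch of that idea, not the paper's actual parametrization:

```python
import torch.nn as nn

class SamplingInvariantEncoder(nn.Module):
    """Toy permutation-invariant encoder over sampled points.

    Averaging over the sample axis makes the output insensitive to how many
    points discretize the population, which is one way to realize the
    discretization-free property the summary describes (details assumed).
    """
    def __init__(self, dim_in, dim_hidden):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.ReLU(),
                                 nn.Linear(dim_hidden, dim_hidden))
        self.rho = nn.Linear(dim_hidden, dim_hidden)

    def forward(self, samples):  # samples: (batch, n_points, dim_in)
        return self.rho(self.phi(samples).mean(dim=1))  # mean over points
```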
arXiv Detail & Related papers (2024-01-27T19:07:49Z)
- Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
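A token-level objective can be pictured as an ordinary policy gradient with a per-token reward in place of a single sequence-level scalar; this sketch shows only that generic core, not RLMEC's editing-constraint machinery:

```python
def token_level_pg_loss(policy_logprobs, token_rewards, mask):
    """Toy token-level policy-gradient objective (a sketch, not RLMEC itself).

    policy_logprobs: (batch, seq) log pi(token_t | prefix) for sampled tokens.
    token_rewards:   (batch, seq) per-token rewards, e.g. scores from a
                     generative reward model rather than one sequence scalar.
    mask:            (batch, seq) 1 for real tokens, 0 for padding.
    """
    return -(token_rewards.detach() * policy_logprobs * mask).sum() / mask.sum()
```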
arXiv Detail & Related papers (2024-01-11T17:58:41Z)
- Defending Against Poisoning Attacks in Federated Learning with Blockchain [12.840821573271999]
We propose a secure and reliable federated learning system based on blockchain and distributed ledger technology.
Our system incorporates a peer-to-peer voting mechanism and a reward-and-slash mechanism, which are powered by on-chain smart contracts, to detect and deter malicious behaviors.
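A stake-weighted tally is one minimal way to picture the voting mechanism; the actual on-chain logic, including the reward-and-slash rules, is assumed to be richer:

```python
def accept_update(votes, stakes, quorum=0.5):
    """Toy stake-weighted vote tally for a proposed model update.

    votes:  dict peer_id -> True (accept) / False (reject).
    stakes: dict peer_id -> stake weight backing each vote.
    Returns True when accepting stake exceeds `quorum` of total voting stake.
    """
    total = sum(stakes[p] for p in votes)
    accept = sum(stakes[p] for p, v in votes.items() if v)
    return total > 0 and accept / total > quorum
```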
arXiv Detail & Related papers (2023-07-02T11:23:33Z)
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMControl and Meta-World.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
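Preference-based RL methods of this kind typically fit a reward model with a Bradley-Terry objective on pairwise labels; the sketch below shows that common core, while the paper's handling of annotator diversity is not modeled here:

```python
import torch

def bradley_terry_loss(r_a, r_b, prefer_a):
    """Standard Bradley-Terry objective for preference-based reward learning.

    r_a, r_b:  predicted returns of two trajectory segments, shape (batch,).
    prefer_a:  1.0 if the annotator preferred segment A, else 0.0, shape (batch,).
    P(A preferred over B) is modeled as sigmoid(r_a - r_b).
    """
    return torch.nn.functional.binary_cross_entropy_with_logits(
        r_a - r_b, prefer_a)
```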
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
- MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning [14.06682547001011]
State-of-the-art methods typically focus on learning a single reward model.
We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms.
Our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need for computing multiple policies.
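Combining several learned reward signals under tunable preference weights can be as simple as a weighted sum; a minimal sketch, with all names assumed:

```python
def scalarized_reward(reward_models, weights, state, action):
    """Weighted combination of several learned reward models (a sketch).

    Adjusting `weights` interactively steers a single deep RL agent across
    preferences, rather than computing one policy per objective.
    """
    return sum(w * r(state, action) for w, r in zip(weights, reward_models))
```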
arXiv Detail & Related papers (2021-12-30T19:21:03Z)
- Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
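The pessimism principle behind such a selection rule amounts to ranking candidate policies by a lower confidence bound on their value rather than a point estimate; a one-function sketch under that reading:

```python
def pessimistic_select(candidates, value_lower_bound):
    """Pick the candidate policy with the best *lower* confidence bound.

    Minimal sketch of pessimistic model selection for offline RL;
    value_lower_bound(policy) -> float is assumed estimated from offline data.
    """
    return max(candidates, key=value_lower_bound)
```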
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories that solved prior tasks from the storage.
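Retrospective addressing can be pictured as scoring stored trajectories with a learned utility and pulling the most promising ones; a toy sketch with assumed names:

```python
def select_retrospective_batch(storage, utility, k=32):
    """Toy addressing step: pull the k stored trajectories judged most useful.

    storage: list of trajectories from prior tasks (task-agnostic buffer).
    utility: learned scorer mapping a trajectory to its expected usefulness
             for the current task.
    """
    return sorted(storage, key=utility, reverse=True)[:k]
```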
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Model-Augmented Q-learning [112.86795579978802]
We propose a model-free RL (MFRL) framework that is augmented with components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution identical to the solution obtained by learning with the true reward.
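The shared-network idea maps naturally to one trunk with three heads; the sketch below follows the summary's description, with layer sizes and head shapes as assumptions:

```python
import torch.nn as nn

class SharedQModel(nn.Module):
    """One shared trunk with Q-value, reward, and transition heads
    (layer sizes and output shapes are assumptions, not the paper's)."""

    def __init__(self, dim_state, n_actions, dim_hidden=128):
        super().__init__()
        self.n_actions = n_actions
        self.trunk = nn.Sequential(nn.Linear(dim_state, dim_hidden), nn.ReLU())
        self.q_head = nn.Linear(dim_hidden, n_actions)       # Q(s, a)
        self.reward_head = nn.Linear(dim_hidden, n_actions)  # r(s, a)
        self.next_state_head = nn.Linear(dim_hidden, n_actions * dim_state)

    def forward(self, state):  # state: (batch, dim_state)
        h = self.trunk(state)
        q = self.q_head(h)
        r = self.reward_head(h)
        # predicted next state for every action: (batch, n_actions, dim_state)
        next_s = self.next_state_head(h).view(-1, self.n_actions,
                                              state.shape[-1])
        return q, r, next_s
```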
arXiv Detail & Related papers (2021-02-07T17:56:50Z)