Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for
Semi-Cooperative Learning
- URL: http://arxiv.org/abs/2005.00565v1
- Date: Fri, 1 May 2020 18:37:38 GMT
- Title: Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for
Semi-Cooperative Learning
- Authors: Wouter van Heeswijk
- Abstract summary: Self-organizing containers can place bids on transport services in a spot market setting.
By sharing information and costs between one another, smart containers can jointly learn bidding policies.
We develop a reinforcement learning algorithm based on the policy gradient framework.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Smart modular freight containers -- as propagated in the Physical Internet
paradigm -- are equipped with sensors, data storage capability and intelligence
that enable them to route themselves from origin to destination without manual
intervention or central governance. In this self-organizing setting, containers
can autonomously place bids on transport services in a spot market setting.
However, for individual containers it may be difficult to learn good bidding
policies due to limited observations. By sharing information and costs between
one another, smart containers can jointly learn bidding policies, even though
simultaneously competing for the same transport capacity. We replicate this
behavior by learning stochastic bidding policies in a semi-cooperative multi
agent setting. To this end, we develop a reinforcement learning algorithm based
on the policy gradient framework. Numerical experiments show that sharing
solely bids and acceptance decisions leads to stable bidding policies.
Additional system information only marginally improves performance; individual
job properties suffice to place appropriate bids. Furthermore, we find that
carriers may have incentives not to share information with the smart
containers. The experiments give rise to several directions for follow-up
research, in particular the interaction between smart containers and transport
services in self-organizing logistics.
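As a rough illustration of what learning a stochastic bidding policy with a policy gradient can look like, the sketch below trains a single Gaussian bidding policy with a REINFORCE-style update. It is not the paper's algorithm: the function name `train_bidding_policy`, the fixed carrier threshold, the rejection penalty, the learning rates, and the single-agent setup are all assumptions made for the example. The paper's setting has multiple semi-cooperative containers and job-dependent bids.

```python
import random


def train_bidding_policy(episodes=5000, alpha=0.01, sigma=1.0, seed=0):
    """REINFORCE sketch for one container bidding in a toy spot market.

    All numbers are hypothetical; they are not taken from the paper.
    """
    rng = random.Random(seed)
    mu = 10.0          # mean of the Gaussian bidding policy (initial guess)
    baseline = 0.0     # running average reward, used to reduce variance
    threshold = 5.0    # assumed minimum bid the carrier accepts (hidden from learner)
    penalty = 12.0     # assumed cost incurred when the bid is rejected

    for _ in range(episodes):
        # Sample a bid from the stochastic policy N(mu, sigma^2).
        bid = rng.gauss(mu, sigma)

        # The carrier accepts any bid at or above its threshold.
        accepted = bid >= threshold
        reward = -(bid if accepted else penalty)

        # Gradient of log N(bid; mu, sigma) with respect to mu.
        grad_log_prob = (bid - mu) / sigma ** 2

        # Policy gradient step on the mean bid.
        mu += alpha * (reward - baseline) * grad_log_prob

        # Slowly track the average reward as a baseline.
        baseline += 0.05 * (reward - baseline)

    return mu


if __name__ == "__main__":
    print(f"learned mean bid: {train_bidding_policy():.2f}")
```

In this toy setting the learned mean bid settles somewhat above the carrier's threshold, since bidding too close to it risks the rejection penalty. Sharing bids and acceptance decisions between containers, as the abstract describes, would correspond to updating a common policy from the pooled observations of several agents.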
Related papers
- A Scalable and Parallelizable Digital Twin Framework for Sustainable Sim2Real Transition of Multi-Agent Reinforcement Learning Systems [1.0582505915332336]
This work presents a sustainable multi-agent deep reinforcement learning framework capable of selectively scaling parallelized training workloads on-demand.
We introduce AutoDRIVE Ecosystem as an enabling digital twin framework to train, deploy, and transfer cooperative as well as competitive multi-agent reinforcement learning policies from simulation to reality.
arXiv Detail & Related papers (2024-03-16T18:47:04Z)
- Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts [61.929388479847525]
This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables.
The key idea is the design of switching policies that can take conformal quantiles as input.
We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics.
arXiv Detail & Related papers (2023-11-02T17:59:30Z)
- Robot Fleet Learning via Policy Merging [58.5086287737653]
We propose FLEET-MERGE to efficiently merge policies in the fleet setting.
We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment.
We introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks.
arXiv Detail & Related papers (2023-10-02T17:23:51Z)
- Towards Multi-Agent Reinforcement Learning driven Over-The-Counter Market Simulations [16.48389671789281]
We study a game between liquidity provider and liquidity taker agents interacting in an over-the-counter market.
By playing against each other, our deep-reinforcement-learning-driven agents learn emergent behaviors.
We show convergence rates for our multi-agent policy gradient algorithm under a transitivity assumption.
arXiv Detail & Related papers (2022-10-13T17:06:08Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
- Containerized Distributed Value-Based Multi-Agent Reinforcement Learning [18.79371121484969]
We propose a containerized multi-agent reinforcement learning framework.
To our knowledge, our method is the first to solve the challenging Google Research Football full game $5\_v\_5$.
On the StarCraft II micromanagement benchmark, our method gets $4$-$18\times$ better results compared to state-of-the-art non-distributed MARL algorithms.
arXiv Detail & Related papers (2021-10-15T15:54:06Z)
- Wasserstein Unsupervised Reinforcement Learning [29.895142928565228]
Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward.
These pre-trained policies can accelerate latent learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning.
We propose a new framework Wasserstein unsupervised reinforcement learning (WURL) where we directly maximize the distance of state distributions induced by different policies.
arXiv Detail & Related papers (2021-10-15T08:41:51Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi \Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Strategic bidding in freight transport using deep reinforcement learning [0.0]
This paper presents a multi-agent reinforcement learning algorithm to represent strategic bidding behavior in freight transport markets.
Using this algorithm, we investigate whether feasible market equilibriums arise without any central control or communication between agents.
arXiv Detail & Related papers (2021-02-18T10:17:10Z)
- Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
A Policy Transfer Framework (PTF) is proposed to accelerate reinforcement learning (RL).
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.