Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for Semi-Cooperative Learning
- URL: http://arxiv.org/abs/2005.00565v1
- Date: Fri, 1 May 2020 18:37:38 GMT
- Title: Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for Semi-Cooperative Learning
- Authors: Wouter van Heeswijk
- Abstract summary: Self-organizing containers can place bids on transport services in a spot market setting.
By sharing information and costs between one another, smart containers can jointly learn bidding policies.
We develop a reinforcement learning algorithm based on the policy gradient framework.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Smart modular freight containers -- as propagated in the Physical Internet
paradigm -- are equipped with sensors, data storage capability and intelligence
that enable them to route themselves from origin to destination without manual
intervention or central governance. In this self-organizing setting, containers
can autonomously place bids on transport services in a spot market setting.
However, for individual containers it may be difficult to learn good bidding
policies due to limited observations. By sharing information and costs between
one another, smart containers can jointly learn bidding policies, even though
they compete for the same transport capacity. We replicate this
behavior by learning stochastic bidding policies in a semi-cooperative
multi-agent setting. To this end, we develop a reinforcement learning algorithm
based on the policy gradient framework. Numerical experiments show that sharing
only bids and acceptance decisions leads to stable bidding policies.
Additional system information only marginally improves performance; individual
job properties suffice to place appropriate bids. Furthermore, we find that
carriers may have incentives not to share information with the smart
containers. The experiments give rise to several directions for follow-up
research, in particular the interaction between smart containers and transport
services in self-organizing logistics.
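The abstract describes the method only at a high level. As an illustration, and not as the paper's actual algorithm, the following is a minimal sketch of a REINFORCE-style policy gradient for a stochastic (Gaussian) bidding policy whose parameters are shared by all containers, so that every container's bid, acceptance decision and cost feeds one pooled update. Everything concrete here is an assumption: the job features (a bias plus a normalized urgency), the hypothetical carrier that accepts the highest bids at or above a reserve price up to its capacity, the urgency-dependent rejection penalty, and all names and parameter values (`features`, `carrier_accepts`, `train`, `SIGMA`, `reserve`, `lr`).

```python
import math
import random

SIGMA = 1.0  # assumed fixed exploration std of the Gaussian bid policy


def features(urgency):
    """Hypothetical job features: bias term plus normalized urgency in [0, 1]."""
    return [1.0, urgency]


def mean_bid(theta, x):
    """Mean of the bid distribution: linear in the job features."""
    return sum(t * xi for t, xi in zip(theta, x))


def log_prob(theta, x, bid, sigma=SIGMA):
    """Log-density of the bid under the policy N(theta . x, sigma^2)."""
    mu = mean_bid(theta, x)
    return -0.5 * ((bid - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def grad_log_prob(theta, x, bid, sigma=SIGMA):
    """Score function: d/dtheta log N(bid; theta . x, sigma^2) = (bid - mu) / sigma^2 * x."""
    mu = mean_bid(theta, x)
    return [(bid - mu) / sigma ** 2 * xi for xi in x]


def carrier_accepts(bids, capacity, reserve):
    """Hypothetical carrier: accept the `capacity` highest bids that are >= `reserve`."""
    ranked = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    accepted = {i for i in ranked[:capacity] if bids[i] >= reserve}
    return [i in accepted for i in range(len(bids))]


def train(episodes=2000, n_containers=10, capacity=5, reserve=1.0, lr=0.005, seed=0):
    """REINFORCE with a moving-average baseline and one policy shared by all containers."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    baseline = 0.0
    for _ in range(episodes):
        urgencies = [rng.random() for _ in range(n_containers)]
        xs = [features(u) for u in urgencies]
        bids = [rng.gauss(mean_bid(theta, x), SIGMA) for x in xs]
        accepted = carrier_accepts(bids, capacity, reserve)
        # Reward = negative cost: pay the bid if accepted, otherwise an
        # assumed penalty that grows with urgency.
        rewards = [-b if ok else -(5.0 + 10.0 * u)
                   for b, ok, u in zip(bids, accepted, urgencies)]
        # Pool all containers' gradient estimates (the shared information and costs).
        grad = [0.0] * len(theta)
        for x, b, r in zip(xs, bids, rewards):
            g = grad_log_prob(theta, x, b)
            for k in range(len(theta)):
                grad[k] += (r - baseline) * g[k] / n_containers
        theta = [t + lr * gk for t, gk in zip(theta, grad)]
        baseline = 0.9 * baseline + 0.1 * sum(rewards) / n_containers
    return theta
```

After `theta = train()`, comparing `mean_bid(theta, features(0.0))` with `mean_bid(theta, features(1.0))` indicates whether the shared policy has learned to bid more for urgent jobs; under these assumed costs one would expect that, but the sketch makes no guarantee. Sharing one parameter vector is just one simple way to model semi-cooperation; the paper's actual sharing scheme may differ.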
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
(arXiv, 2024-11-06)
- Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts
This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables.
The key idea is the design of switching policies that can take conformal quantiles as input.
We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics.
(arXiv, 2023-11-02)
- Robot Fleet Learning via Policy Merging
We propose FLEET-MERGE to efficiently merge policies in the fleet setting.
We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment.
We introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks.
(arXiv, 2023-10-02)
- Adversarial Constrained Bidding via Minimax Regret Optimization with Causality-Aware Reinforcement Learning
Existing approaches to constrained bidding typically rely on i.i.d. training and test conditions.
We propose a practical Minimax Regret Optimization (MiRO) approach that interleaves between a teacher finding adversarial environments for tutoring and a learner meta-learning its policy over the given distribution of environments.
Our method, MiRO with Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by over 30%.
(arXiv, 2023-06-12)
- Towards Multi-Agent Reinforcement Learning driven Over-The-Counter Market Simulations
We study a game between liquidity provider and liquidity taker agents interacting in an over-the-counter market.
By playing against each other, our deep-reinforcement-learning-driven agents learn emergent behaviors.
We show convergence rates for our multi-agent policy gradient algorithm under a transitivity assumption.
(arXiv, 2022-10-13)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
(arXiv, 2021-12-30)
- Distributed Adaptive Learning Under Communication Constraints
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
(arXiv, 2021-12-03)
- Containerized Distributed Value-Based Multi-Agent Reinforcement Learning
We propose a containerized multi-agent reinforcement learning framework.
To our knowledge, our method is the first to solve the challenging Google Research Football full game 5v5.
On the StarCraft II micromanagement benchmark, our method gets $4$-$18\times$ better results compared to state-of-the-art non-distributed MARL algorithms.
(arXiv, 2021-10-15)
- Wasserstein Unsupervised Reinforcement Learning
Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward.
These pre-trained policies can accelerate latent learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning.
We propose a new framework Wasserstein unsupervised reinforcement learning (WURL) where we directly maximize the distance of state distributions induced by different policies.
(arXiv, 2021-10-15)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
(arXiv, 2021-02-24)
- Strategic bidding in freight transport using deep reinforcement learning
This paper presents a multi-agent reinforcement learning algorithm to represent strategic bidding behavior in freight transport markets.
Using this algorithm, we investigate whether feasible market equilibria arise without any central control or communication between agents.
(arXiv, 2021-02-18)