Related papers: Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for Semi-Cooperative Learning

Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for Semi-Cooperative Learning

URL: http://arxiv.org/abs/2005.00565v1
Date: Fri, 1 May 2020 18:37:38 GMT
Title: Smart Containers With Bidding Capacity: A Policy Gradient Algorithm for Semi-Cooperative Learning
Authors: Wouter van Heeswijk
Abstract summary: Self-organizing containers can place bids on transport services in a spot market setting. By sharing information and costs between one another, smart containers can jointly learn bidding policies. We develop a reinforcement learning algorithm based on the policy framework.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Smart modular freight containers -- as propagated in the Physical Internet paradigm -- are equipped with sensors, data storage capability and intelligence that enable them to route themselves from origin to destination without manual intervention or central governance. In this self-organizing setting, containers can autonomously place bids on transport services in a spot market setting. However, for individual containers it may be difficult to learn good bidding policies due to limited observations. By sharing information and costs between one another, smart containers can jointly learn bidding policies, even though simultaneously competing for the same transport capacity. We replicate this behavior by learning stochastic bidding policies in a semi-cooperative multi agent setting. To this end, we develop a reinforcement learning algorithm based on the policy gradient framework. Numerical experiments show that sharing solely bids and acceptance decisions leads to stable bidding policies. Additional system information only marginally improves performance; individual job properties suffice to place appropriate bids. Furthermore, we find that carriers may have incentives not to share information with the smart containers. The experiments give rise to several directions for follow-up research, in particular the interaction between smart containers and transport services in self-organizing logistics.

Related papers

Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems [75.78934957242403]
Self-driving vehicles and drones require true Spatial Intelligence from multi-modal onboard sensor data.<n>This paper presents a framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal.
arXiv Detail & Related papers (2025-12-30T17:58:01Z)
The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability [93.11220429350278]
Information asymmetry is a pervasive feature of multi-agent systems.<n>This paper explores a fundamental question in online learning: Can we employ non-i.i.d. actions to learn about confounders even when requiring knowledge transfer?<n>We present a sample-efficient algorithm designed to accurately identify system dynamics under information asymmetry and to navigate the challenges of knowledge transfer effectively in reinforcement learning.
arXiv Detail & Related papers (2025-06-11T17:06:57Z)
Policy Search, Retrieval, and Composition via Task Similarity in Collaborative Agentic Systems [12.471774408499817]
Agentic AI aims to create systems that set their own goals, adapt proactively to change, and refine behavior through continuous experience.<n>Recent advances suggest that, when facing multiple and unforeseen tasks, agents could benefit from sharing machine-learned knowledge and reuse policies that have already been fully or partially learned by other agents.<n>This study explores how an agent decides what knowledge to select, from whom, and when and how to integrate it in its own policy in order to accelerate its own learning.
arXiv Detail & Related papers (2025-06-05T20:38:11Z)
Agentic Knowledgeable Self-awareness [79.25908923383776]
KnowSelf is a data-centric approach that applies agents with knowledgeable self-awareness like humans. Our experiments demonstrate that KnowSelf can outperform various strong baselines on different tasks and models with minimal use of external knowledge.
arXiv Detail & Related papers (2025-04-04T16:03:38Z)
From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts [61.929388479847525]
This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables. The key idea is the design of switching policies that can take conformal quantiles as input. We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics.
arXiv Detail & Related papers (2023-11-02T17:59:30Z)
Robot Fleet Learning via Policy Merging [58.5086287737653]
We propose FLEET-MERGE to efficiently merge policies in the fleet setting. We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment. We introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks.
arXiv Detail & Related papers (2023-10-02T17:23:51Z)
Adversarial Constrained Bidding via Minimax Regret Optimization with Causality-Aware Reinforcement Learning [18.408964908248855]
Existing approaches on constrained bidding typically rely on i.i.d. train and test conditions. We propose a practical Minimax Regret Optimization (MiRO) approach that interleaves between a teacher finding adversarial environments for tutoring and a learner meta-learning its policy over the given distribution of environments. Our method, MiRO with Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by over 30%.
arXiv Detail & Related papers (2023-06-12T13:31:58Z)
Towards Multi-Agent Reinforcement Learning driven Over-The-Counter Market Simulations [16.48389671789281]
We study a game between liquidity provider and liquidity taker agents interacting in an over-the-counter market. By playing against each other, our deep-reinforcement-learning-driven agents learn emergent behaviors. We show convergence rates for our multi-agent policy gradient algorithm under a transitivity assumption.
arXiv Detail & Related papers (2022-10-13T17:06:08Z)
Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks. We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints. We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
Containerized Distributed Value-Based Multi-Agent Reinforcement Learning [18.79371121484969]
We propose a containerized multi-agent reinforcement learning framework. To own knowledge, our method is the first to solve the challenging Google Research Football full game $5_v_5$. On the StarCraft II micromanagement benchmark, our method gets $4$-$18times$ better results compared to state-of-the-art non-distributed MARL algorithms.
arXiv Detail & Related papers (2021-10-15T15:54:06Z)
Wasserstein Unsupervised Reinforcement Learning [29.895142928565228]
Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward. These pre-trained policies can accelerate latent learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning. We propose a new framework Wasserstein unsupervised reinforcement learning (WURL) where we directly maximize the distance of state distributions induced by different policies.
arXiv Detail & Related papers (2021-10-15T08:41:51Z)
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called emphinverse temporal difference learning (ITD) We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $Psi Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
Strategic bidding in freight transport using deep reinforcement learning [0.0]
This paper presents a multi-agent reinforcement learning algorithm to represent strategic bidding behavior in freight transport markets. Using this algorithm, we investigate whether feasible market equilibriums arise without any central control or communication between agents.
arXiv Detail & Related papers (2021-02-18T10:17:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.