Optimizing Empty Container Repositioning and Fleet Deployment via
Configurable Semi-POMDPs
- URL: http://arxiv.org/abs/2207.12509v1
- Date: Mon, 25 Jul 2022 20:13:44 GMT
- Title: Optimizing Empty Container Repositioning and Fleet Deployment via
Configurable Semi-POMDPs
- Authors: Riccardo Poiani, Ciprian Stirbu, Alberto Maria Metelli and Marcello
Restelli
- Abstract summary: This paper introduces a novel framework, Configurable Semi-POMDPs, to model this type of problem.
We provide a two-stage learning algorithm, "Configure & Conquer" (CC), that first configures the environment by finding an approximation of the optimal fleet deployment strategy, and then "conquers" it by learning an ECR policy in this tuned setting.
We validate our approach on large, real-world instances of the problem.
- Score: 43.85442587999754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the continuous growth of the global economy and markets, resource
imbalance has risen to be one of the central issues in real logistic scenarios.
In marine transportation, this trade imbalance leads to Empty Container
Repositioning (ECR) problems. Once the freight has been delivered from an
exporting country to an importing one, the laden containers turn into empty
containers that need to be repositioned to satisfy new goods requests in
exporting countries. In such problems, the performance that any cooperative
repositioning policy can achieve strictly depends on the routes that vessels
will follow (i.e., fleet deployment). Historically, Operations Research (OR)
approaches were proposed to jointly optimize the repositioning policy along
with the fleet of vessels. However, the stochasticity of future supply and
demand of containers, together with black-box and non-linear constraints that
are present within the environment, make these approaches unsuitable for these
scenarios. In this paper, we introduce a novel framework, Configurable
Semi-POMDPs, to model this type of problem. Furthermore, we provide a
two-stage learning algorithm, "Configure & Conquer" (CC), that first configures
the environment by finding an approximation of the optimal fleet deployment
strategy, and then "conquers" it by learning an ECR policy in this tuned
environmental setting. We validate our approach in large and real-world
instances of the problem. Our experiments highlight that CC avoids the pitfalls
of OR methods and that it is successful at optimizing both the ECR policy and
the fleet of vessels, leading to superior performance in world trade
environments.
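To make the two-stage structure concrete, below is a minimal Python sketch of the "Configure & Conquer" loop. It is an illustration under stated assumptions, not the paper's implementation: the ConfigurableSemiPOMDP interface, the candidate-deployment list, and the proxy scoring policy are hypothetical stand-ins for the paper's simulator, black-box constraints, and RL learner.

    import random

    class ConfigurableSemiPOMDP:
        """Hypothetical environment whose dynamics depend on a configuration
        (here, a fleet deployment, i.e., the routes the vessels follow)."""

        def __init__(self, candidate_deployments):
            self.candidate_deployments = candidate_deployments
            self.deployment = candidate_deployments[0]

        def configure(self, deployment):
            # Fixing the vessel routes determines the dynamics the ECR agent faces.
            self.deployment = deployment

        def rollout(self, policy, horizon=52):
            # Placeholder: a real simulator would sample stochastic container
            # supply/demand and apply black-box, non-linear constraints.
            return sum(policy(week, self.deployment) for week in range(horizon))

    def configure_and_conquer(env, learn_ecr_policy, n_rollouts=10):
        # Stage 1 ("Configure"): approximate the optimal fleet deployment by
        # scoring each candidate under a cheap proxy repositioning policy.
        proxy = lambda week, deployment: random.random()  # stand-in heuristic
        best_deployment, best_score = None, float("-inf")
        for deployment in env.candidate_deployments:
            env.configure(deployment)
            score = sum(env.rollout(proxy) for _ in range(n_rollouts)) / n_rollouts
            if score > best_score:
                best_deployment, best_score = deployment, score

        # Stage 2 ("Conquer"): learn the ECR policy in the tuned environment.
        env.configure(best_deployment)
        return best_deployment, learn_ecr_policy(env)

    # Hypothetical usage with two candidate route sets:
    env = ConfigurableSemiPOMDP(candidate_deployments=["routes_A", "routes_B"])
    deployment, ecr_policy = configure_and_conquer(env, learn_ecr_policy=lambda e: "ecr-policy")

The design point the abstract emphasizes is that Stage 2 sees only the environment selected in Stage 1, which is what lets an RL learner sidestep the black-box constraints that make the joint OR formulation unsuitable.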
Related papers
- Byzantine-Resilient Over-the-Air Federated Learning under Zero-Trust Architecture [68.83934802584899]
We propose a novel Byzantine-robust FL paradigm for over-the-air transmissions, referred to as federated learning with secure adaptive clustering (FedSAC).
FedSAC aims to protect a portion of the devices from attacks through zero trust architecture (ZTA) based Byzantine identification and adaptive device clustering.
Numerical results substantiate the superiority of the proposed FedSAC over existing methods in terms of both test accuracy and convergence rate.
arXiv Detail & Related papers (2025-03-24T01:56:30Z)
- Real-Time Integrated Dispatching and Idle Fleet Steering with Deep Reinforcement Learning for A Meal Delivery Platform [0.0]
This study sets out to solve the real-time order dispatching and idle courier steering problems for a meal delivery platform.
We propose a reinforcement learning (RL)-based strategic dual-control framework.
We find that delivery efficiency and fairness of workload distribution among couriers are improved.
arXiv Detail & Related papers (2025-01-10T09:15:40Z)
- CROPS: A Deployable Crop Management System Over All Possible State Availabilities [11.831002170207547]
This paper introduces a deployable CRop Management system Over all Possible State availabilities (CROPS).
arXiv Detail & Related papers (2024-11-09T02:06:09Z)
- Evaluating Robustness of Reinforcement Learning Algorithms for Autonomous Shipping [2.9109581496560044]
This paper examines the robustness of benchmark deep reinforcement learning (RL) algorithms, implemented for inland waterway transport (IWT) within an autonomous shipping simulator.
We show that a model-free approach can achieve an adequate policy in the simulator, successfully navigating port environments never encountered during training.
arXiv Detail & Related papers (2024-11-07T17:55:07Z)
- Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts [0.15889427269227555]
We develop an adaptive re-training algorithm, ERPO, inspired by evolutionary game theory (EGT).
ERPO shows faster policy adaptation, higher average rewards, and reduced computational costs.
arXiv Detail & Related papers (2024-10-22T09:29:53Z)
- Learning to Sail Dynamic Networks: The MARLIN Reinforcement Learning Framework for Congestion Control in Tactical Environments [53.08686495706487]
This paper proposes an RL framework that leverages an accurate and parallelizable emulation environment to reenact the conditions of a tactical network.
We evaluate our framework by training a MARLIN agent in conditions replicating a bottleneck link transition between a Satellite Communication (SATCOM) and a UHF Wide Band (UHF) radio link.
arXiv Detail & Related papers (2023-06-27T16:15:15Z)
- Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
arXiv Detail & Related papers (2023-03-02T08:57:35Z)
- Online Reinforcement Learning in Non-Stationary Context-Driven Environments [13.898711495948254]
We study online reinforcement learning (RL) in non-stationary environments.
Online RL is challenging in such environments due to "catastrophic forgetting" (CF).
We present Locally Constrained Policy Optimization (LCPO), an online RL approach that combats CF by anchoring policy outputs on old experiences (a minimal sketch of this anchoring idea appears after this list).
arXiv Detail & Related papers (2023-02-04T15:31:19Z)
- COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation [73.17078343706909]
We study the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of the stationary distribution.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
arXiv Detail & Related papers (2022-04-19T15:55:47Z)
- A Deep Reinforcement Learning Approach for Constrained Online Logistics Route Assignment [4.367543599338385]
How to properly assign a candidate logistics route to each shipping parcel is a crucial problem for the logistics industry.
This online route-assignment problem can be viewed as a constrained online decision-making problem.
We develop a model-free DRL approach named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with dedicated techniques to address the challenges of route assignment (RA).
arXiv Detail & Related papers (2021-09-08T07:27:39Z)
- Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings [129.80279257258098]
Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous.
We propose a "safety-critical adaptation" task setting: an agent first trains in non-safety-critical "source" environments.
We propose a solution approach, CARL, that builds on the intuition that prior experience in diverse environments equips an agent to estimate risk.
arXiv Detail & Related papers (2020-08-15T01:40:59Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
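As referenced in the LCPO entry above, here is a minimal sketch of anchoring policy outputs on old experiences. Everything in it is illustrative, assuming a generic policy network and an anchor buffer of previously seen states; the names and the squared-drift penalty are stand-ins, and LCPO's actual constraint is more involved.

    import numpy as np

    def anchored_loss(policy, rl_loss, anchor_states, old_outputs, lam=1.0):
        # Penalize drift of the current policy's outputs away from its old
        # outputs on states from past contexts (the anti-forgetting anchor).
        drift = np.mean([np.sum((policy(s) - old) ** 2)
                         for s, old in zip(anchor_states, old_outputs)])
        return rl_loss + lam * drift

    # Hypothetical usage: `policy` maps a state to action probabilities.
    policy = lambda s: np.array([0.6, 0.4])
    anchor_states = [np.zeros(4), np.ones(4)]
    old_outputs = [np.array([0.7, 0.3]), np.array([0.5, 0.5])]
    total_loss = anchored_loss(policy, rl_loss=1.2,
                               anchor_states=anchor_states, old_outputs=old_outputs)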
This list is automatically generated from the titles and abstracts of the papers on this site.