Optimizing Empty Container Repositioning and Fleet Deployment via
Configurable Semi-POMDPs
- URL: http://arxiv.org/abs/2207.12509v1
- Date: Mon, 25 Jul 2022 20:13:44 GMT
- Title: Optimizing Empty Container Repositioning and Fleet Deployment via
Configurable Semi-POMDPs
- Authors: Riccardo Poiani, Ciprian Stirbu, Alberto Maria Metelli and Marcello
Restelli
- Abstract summary: This paper introduces a novel framework, Configurable Semi-POMDPs, to model this type of problem.
We provide a two-stage learning algorithm, "Configure & Conquer" (CC), that first configures the environment by finding an approximation of the optimal fleet deployment strategy, and then "conquers" it by learning an ECR policy in this tuned setting.
We validate our approach on large, real-world instances of the problem.
- Score: 43.85442587999754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the continuous growth of the global economy and markets, resource
imbalance has risen to be one of the central issues in real logistic scenarios.
In marine transportation, this trade imbalance leads to Empty Container
Repositioning (ECR) problems. Once the freight has been delivered from an
exporting country to an importing one, the laden containers turn into empty
containers that need to be repositioned to satisfy new goods requests in
exporting countries. In such problems, the performance that any cooperative
repositioning policy can achieve strictly depends on the routes that vessels
will follow (i.e., fleet deployment). Historically, Operations Research (OR)
approaches were proposed to jointly optimize the repositioning policy along
with the fleet of vessels. However, the stochasticity of future supply and
demand of containers, together with black-box and non-linear constraints that
are present within the environment, make these approaches unsuitable for these
scenarios. In this paper, we introduce a novel framework, Configurable
Semi-POMDPs, to model this type of problem. Furthermore, we provide a
two-stage learning algorithm, "Configure & Conquer" (CC), that first configures
the environment by finding an approximation of the optimal fleet deployment
strategy, and then "conquers" it by learning an ECR policy in this tuned
environmental setting. We validate our approach in large and real-world
instances of the problem. Our experiments highlight that CC avoids the pitfalls
of OR methods and that it is successful at optimizing both the ECR policy and
the fleet of vessels, leading to superior performance in world trade
environments.
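To make the two-stage structure concrete, below is a minimal Python sketch of the "Configure & Conquer" loop. It is an illustration under stated assumptions, not the paper's implementation: the ConfigurableSemiPOMDP interface, the candidate-deployment list, and the proxy scoring policy are hypothetical stand-ins for the paper's simulator, black-box constraints, and RL learner.

    import random

    class ConfigurableSemiPOMDP:
        """Hypothetical environment whose dynamics depend on a configuration
        (here, a fleet deployment, i.e., the routes the vessels follow)."""

        def __init__(self, candidate_deployments):
            self.candidate_deployments = candidate_deployments
            self.deployment = candidate_deployments[0]

        def configure(self, deployment):
            # Fixing the vessel routes determines the dynamics the ECR agent faces.
            self.deployment = deployment

        def rollout(self, policy, horizon=52):
            # Placeholder: a real simulator would sample stochastic container
            # supply/demand and apply black-box, non-linear constraints.
            return sum(policy(week, self.deployment) for week in range(horizon))

    def configure_and_conquer(env, learn_ecr_policy, n_rollouts=10):
        # Stage 1 ("Configure"): approximate the optimal fleet deployment by
        # scoring each candidate under a cheap proxy repositioning policy.
        proxy = lambda week, deployment: random.random()  # stand-in heuristic
        best_deployment, best_score = None, float("-inf")
        for deployment in env.candidate_deployments:
            env.configure(deployment)
            score = sum(env.rollout(proxy) for _ in range(n_rollouts)) / n_rollouts
            if score > best_score:
                best_deployment, best_score = deployment, score

        # Stage 2 ("Conquer"): learn the ECR policy in the tuned environment.
        env.configure(best_deployment)
        return best_deployment, learn_ecr_policy(env)

    # Hypothetical usage with two candidate route sets:
    env = ConfigurableSemiPOMDP(candidate_deployments=["routes_A", "routes_B"])
    deployment, ecr_policy = configure_and_conquer(env, learn_ecr_policy=lambda e: "ecr-policy")

The design point the abstract emphasizes is that Stage 2 sees only the environment selected in Stage 1, which is what lets an RL learner sidestep the black-box constraints that make the joint OR formulation unsuitable.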
Related papers
- Byzantine-Resilient Over-the-Air Federated Learning under Zero-Trust Architecture [68.83934802584899]
We propose a novel Byzantine-robust FL paradigm for over-the-air transmissions, referred to as federated learning with secure adaptive clustering (FedSAC).
FedSAC aims to protect a portion of the devices from attacks through zero trust architecture (ZTA) based Byzantine identification and adaptive device clustering.
Numerical results substantiate the superiority of the proposed FedSAC over existing methods in terms of both test accuracy and convergence rate.
arXiv Detail & Related papers (2025-03-24T01:56:30Z)
- Real-Time Integrated Dispatching and Idle Fleet Steering with Deep Reinforcement Learning for A Meal Delivery Platform [0.0]
This study sets out to solve the real-time order dispatching and idle courier steering problems for a meal delivery platform.
We propose a reinforcement learning (RL)-based strategic dual-control framework.
We find that delivery efficiency and fairness of workload distribution among couriers are improved.
arXiv Detail & Related papers (2025-01-10T09:15:40Z)
- CROPS: A Deployable Crop Management System Over All Possible State Availabilities [11.831002170207547]
This paper introduces a deployable CRop Management system Over all Possible State availabilities (CROPS).
arXiv Detail & Related papers (2024-11-09T02:06:09Z)
- Evaluating Robustness of Reinforcement Learning Algorithms for Autonomous Shipping [2.9109581496560044]
This paper examines the robustness of benchmark deep reinforcement learning (RL) algorithms, implemented for inland waterway transport (IWT) within an autonomous shipping simulator.
We show that a model-free approach can achieve an adequate policy in the simulator, successfully navigating port environments never encountered during training.
arXiv Detail & Related papers (2024-11-07T17:55:07Z)
- Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts [0.15889427269227555]
We develop an adaptive re-training algorithm, ERPO, inspired by evolutionary game theory (EGT).
ERPO shows faster policy adaptation, higher average rewards, and reduced computational costs.
arXiv Detail & Related papers (2024-10-22T09:29:53Z)
- Learning to Sail Dynamic Networks: The MARLIN Reinforcement Learning Framework for Congestion Control in Tactical Environments [53.08686495706487]
This paper proposes an RL framework that leverages an accurate and parallelizable emulation environment to reenact the conditions of a tactical network.
We evaluate our framework by training a MARLIN agent in conditions replicating a bottleneck link transition between a Satellite Communication (SATCOM) and a UHF Wide Band (UHF) radio link.
arXiv Detail & Related papers (2023-06-27T16:15:15Z)
- Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance.
We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics.
We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
arXiv Detail & Related papers (2023-03-02T08:57:35Z)
- Online Reinforcement Learning in Non-Stationary Context-Driven Environments [13.898711495948254]
We study online reinforcement learning (RL) in non-stationary environments.
Online RL is challenging in such environments due to "catastrophic forgetting" (CF).
We present Locally Constrained Policy Optimization (LCPO), an online RL approach that combats CF by anchoring policy outputs on old experiences (a minimal sketch of this anchoring idea appears after this list).
arXiv Detail & Related papers (2023-02-04T15:31:19Z)
- COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation [73.17078343706909]
We study the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of the stationary distribution.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
arXiv Detail & Related papers (2022-04-19T15:55:47Z)
- A Deep Reinforcement Learning Approach for Constrained Online Logistics Route Assignment [4.367543599338385]
How to properly assign a candidate logistics route to each shipping parcel is a crucial problem for the logistics industry.
This online route-assignment problem can be viewed as a constrained online decision-making problem.
We develop a model-free DRL approach named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with dedicated techniques to address the challenges of route assignment (RA).
arXiv Detail & Related papers (2021-09-08T07:27:39Z)
- Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings [129.80279257258098]
Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous.
We propose a "safety-critical adaptation" task setting: an agent first trains in non-safety-critical "source" environments.
We propose a solution approach, CARL, that builds on the intuition that prior experience in diverse environments equips an agent to estimate risk.
arXiv Detail & Related papers (2020-08-15T01:40:59Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
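As referenced in the LCPO entry above, here is a minimal sketch of anchoring policy outputs on old experiences. Everything in it is illustrative, assuming a generic policy network and an anchor buffer of previously seen states; the names and the squared-drift penalty are stand-ins, and LCPO's actual constraint is more involved.

    import numpy as np

    def anchored_loss(policy, rl_loss, anchor_states, old_outputs, lam=1.0):
        # Penalize drift of the current policy's outputs away from its old
        # outputs on states from past contexts (the anti-forgetting anchor).
        drift = np.mean([np.sum((policy(s) - old) ** 2)
                         for s, old in zip(anchor_states, old_outputs)])
        return rl_loss + lam * drift

    # Hypothetical usage: `policy` maps a state to action probabilities.
    policy = lambda s: np.array([0.6, 0.4])
    anchor_states = [np.zeros(4), np.ones(4)]
    old_outputs = [np.array([0.7, 0.3]), np.array([0.5, 0.5])]
    total_loss = anchored_loss(policy, rl_loss=1.2,
                               anchor_states=anchor_states, old_outputs=old_outputs)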
This list is automatically generated from the titles and abstracts of the papers on this site.