A Curriculum-Based Deep Reinforcement Learning Framework for the Electric Vehicle Routing Problem
- URL: http://arxiv.org/abs/2601.15038v1
- Date: Wed, 21 Jan 2026 14:42:33 GMT
- Title: A Curriculum-Based Deep Reinforcement Learning Framework for the Electric Vehicle Routing Problem
- Authors: Mertcan Daysalilar, Fuat Uyguroglu, Gabriel Nicolosi, Adam Meyers,
- Abstract summary: The electric vehicle routing problem with time windows (EVRPTW) is a complex optimization problem in sustainable logistics. We propose a curriculum-based deep reinforcement learning (CB-DRL) framework designed to resolve this instability.
- Score: 0.4666493857924357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The electric vehicle routing problem with time windows (EVRPTW) is a complex optimization problem in sustainable logistics, where routing decisions must minimize total travel distance, fleet size, and battery usage while satisfying strict customer time constraints. Although deep reinforcement learning (DRL) has shown great potential as an alternative to classical heuristics and exact solvers, existing DRL models often struggle to maintain training stability, failing to converge or generalize when constraints are dense. In this study, we propose a curriculum-based deep reinforcement learning (CB-DRL) framework designed to resolve this instability. The framework utilizes a structured three-phase curriculum that gradually increases problem complexity: the agent first learns distance and fleet optimization (Phase A), then battery management (Phase B), and finally the full EVRPTW (Phase C). To ensure stable learning across phases, the framework employs a modified proximal policy optimization algorithm with phase-specific hyperparameters, value and advantage clipping, and adaptive learning-rate scheduling. The policy network is built upon a heterogeneous graph attention encoder enhanced by global-local attention and feature-wise linear modulation. This specialized architecture explicitly captures the distinct properties of depots, customers, and charging stations. Trained exclusively on small instances with N=10 customers, the model demonstrates robust generalization to unseen instances ranging from N=5 to N=100, significantly outperforming standard baselines on medium-scale problems. Experimental results confirm that this curriculum-guided approach achieves high feasibility rates and competitive solution quality on out-of-distribution instances where standard DRL baselines fail, effectively bridging the gap between neural speed and operational reliability.
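The abstract's two central ingredients, a three-phase curriculum with phase-specific hyperparameters and a PPO variant with value clipping, can be sketched as follows. This is a minimal illustrative sketch: the phase names match the abstract, but the hyperparameter values, constraint flags, and function names are assumptions, not values taken from the paper.

```python
# Illustrative sketch of a three-phase curriculum (Phase A: distance/fleet,
# Phase B: + battery, Phase C: full EVRPTW) and clipped PPO losses.
# All numeric values are placeholder assumptions, not the paper's settings.
import math

CURRICULUM = [
    {"phase": "A", "constraints": {"battery": False, "time_windows": False},
     "lr": 3e-4, "clip_eps": 0.2},   # distance and fleet optimization only
    {"phase": "B", "constraints": {"battery": True, "time_windows": False},
     "lr": 1e-4, "clip_eps": 0.15},  # battery management added
    {"phase": "C", "constraints": {"battery": True, "time_windows": True},
     "lr": 5e-5, "clip_eps": 0.1},   # full EVRPTW with time windows
]

def ppo_policy_loss(logp_new, logp_old, advantage, clip_eps):
    """Standard clipped PPO surrogate for one sample (advantage pre-clipped)."""
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    return -min(unclipped, clipped)  # maximizing the surrogate = minimizing this

def clipped_value_loss(v_new, v_old, v_target, clip_eps):
    """Value clipping: bound how far the new value estimate may move per update."""
    v_clipped = v_old + max(min(v_new - v_old, clip_eps), -clip_eps)
    return max((v_new - v_target) ** 2, (v_clipped - v_target) ** 2)
```

In this reading, a training loop would iterate over `CURRICULUM` in order, enabling each constraint set and loading the phase's learning rate and clipping range before continuing optimization from the previous phase's weights.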
Related papers
- When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training [58.25341036646294]
We examine analytically why learning recurrent poles does not provide tangible benefits, and evaluate empirically in real-time online learning scenarios. We show that fixed-pole networks achieve superior performance with lower training complexity, making them more suitable for online real-time tasks.
arXiv Detail & Related papers (2026-02-25T00:15:13Z) - Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z) - Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization [72.20212909644017]
Deliberate Practice Policy Optimization (DPPO) is a metacognitive "Metaloop" training framework. DPPO alternates between supervised fine-tuning (competence expansion) and reinforcement learning (skill refinement). Empirically, training a vision-language embodied model with DPPO, referred to as Pelican-VL 1.0, yields a 20.3% performance improvement over the base model. We are open-sourcing both the models and code, providing the first systematic framework that alleviates the data and resource bottleneck.
arXiv Detail & Related papers (2025-11-20T17:58:04Z) - A Unified Model for Multi-Task Drone Routing in Post-Disaster Road Assessment [14.07560120879767]
Post-disaster road assessment (PDRA) is essential for emergency response, enabling rapid evaluation of infrastructure conditions. Drones provide a flexible and effective tool for PDRA, but routing them in large-scale networks remains challenging. This study proposes a unified model (UM) for drone routing that simultaneously addresses eight PDRA variants.
arXiv Detail & Related papers (2025-10-24T14:48:57Z) - An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems [66.60904891478687]
We propose an Agentic Framework with LLMs (AFL) for solving complex vehicle routing problems. AFL directly extracts knowledge from raw inputs and enables self-contained code generation. We show that AFL substantially outperforms existing LLM-based baselines in both code reliability and solution feasibility.
arXiv Detail & Related papers (2025-10-19T03:59:25Z) - World Models as Reference Trajectories for Rapid Motor Adaptation [0.0]
Reflexive World Models (RWM) is a dual control framework that uses world model predictions as implicit reference trajectories for rapid adaptation. Our method separates the control problem into long-term reward through reinforcement learning and robust motor execution.
arXiv Detail & Related papers (2025-05-21T14:46:41Z) - Enhancing Reinforcement Learning for the Floorplanning of Analog ICs with Beam Search [0.32985979395737786]
This paper presents a hybrid method that combines reinforcement learning (RL) with a beam search (BS) strategy. The BS algorithm enhances the agent's inference process, allowing for the generation of flexible floorplans. Experimental results show an approximately 5-85% improvement in area, dead space, and half-perimeter wire length compared to a standard RL application.
arXiv Detail & Related papers (2025-05-08T08:50:32Z) - Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms [53.75036695728983]
Vehicle Routing Problems (VRP) are a fundamental NP-hard challenge in combinatorial optimization. We introduce an optimization framework where a reinforcement learning agent is trained on prior instances and quickly generates initial solutions. This framework consistently outperforms current state-of-the-art solvers across various time budgets.
arXiv Detail & Related papers (2025-04-08T15:21:01Z) - A Graph-based Adversarial Imitation Learning Framework for Reliable & Realtime Fleet Scheduling in Urban Air Mobility [5.19664437943693]
This paper presents a comprehensive optimization formulation of the fleet scheduling problem.
It also identifies the need for alternate solution approaches.
The new imitative approach achieves better mean performance and remarkable improvement in the case of unseen worst-case scenarios.
arXiv Detail & Related papers (2024-07-16T18:51:24Z) - Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting.
The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z) - Fast Approximate Solutions using Reinforcement Learning for Dynamic Capacitated Vehicle Routing with Time Windows [3.5232085374661284]
This paper develops an inherently parallelised, fast, approximate learning-based solution to the generic class of Capacitated Vehicle Routing with Time Windows and Dynamic Routing (CVRP-TWDR).
Considering vehicles in a fleet as decentralised agents, we postulate that using reinforcement learning (RL) based adaptation is a key enabler for real-time route formation in a dynamic environment.
arXiv Detail & Related papers (2021-02-24T06:30:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.