Reinforcement Learning with Curriculum-inspired Adaptive Direct Policy Guidance for Truck Dispatching
- URL: http://arxiv.org/abs/2502.20845v1
- Date: Fri, 28 Feb 2025 08:43:32 GMT
- Title: Reinforcement Learning with Curriculum-inspired Adaptive Direct Policy Guidance for Truck Dispatching
- Authors: Shi Meng, Bin Tian, Xiaotong Zhang
- Abstract summary: This paper introduces Curriculum-inspired Adaptive Direct Policy Guidance, a novel curriculum learning strategy for policy-based RL. We adapt Proximal Policy Optimization for mine dispatching's uneven decision intervals using time deltas in Temporal Difference and Generalized Advantage Estimation. We demonstrate our approach yields a 10% performance gain and faster convergence over standard PPO across sparse and dense reward settings.
- Score: 3.5305506674489076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient truck dispatching via Reinforcement Learning (RL) in open-pit mining is often hindered by reliance on complex reward engineering and value-based methods. This paper introduces Curriculum-inspired Adaptive Direct Policy Guidance, a novel curriculum learning strategy for policy-based RL to address these issues. We adapt Proximal Policy Optimization (PPO) for mine dispatching's uneven decision intervals using time deltas in Temporal Difference and Generalized Advantage Estimation, and employ a Shortest Processing Time teacher policy for guided exploration via policy regularization and adaptive guidance. Evaluations in OpenMines demonstrate our approach yields a 10% performance gain and faster convergence over standard PPO across sparse and dense reward settings, showcasing improved robustness to reward design. This direct policy guidance method provides a general and effective curriculum learning technique for RL-based truck dispatching, enabling future work on advanced architectures.
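To make the two adaptations described in the abstract concrete, the sketch below shows (a) GAE computed with per-transition time deltas, so discounting reflects the uneven intervals between dispatch decisions, and (b) an adaptive regularization term that nudges the policy toward a Shortest Processing Time (SPT) teacher. The function names, array shapes, and annealing schedule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def time_delta_gae(rewards, values, time_deltas, gamma=0.99, lam=0.95):
    """GAE for uneven decision intervals: each transition is discounted by
    gamma ** dt, where dt is the simulated time elapsed between consecutive
    dispatch decisions (a sketch of the idea, not the authors' code).

    rewards:     shape [T], reward observed after each dispatch decision
    values:      shape [T+1], critic estimates V(s_0) ... V(s_T)
    time_deltas: shape [T], elapsed time between decision t and decision t+1
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        g = gamma ** time_deltas[t]                          # time-aware discount
        delta = rewards[t] + g * values[t + 1] - values[t]   # TD error with gamma**dt
        gae = delta + g * lam * gae                          # lambda-return, also time-scaled
        advantages[t] = gae
    returns = advantages + np.asarray(values[:T])
    return advantages, returns

def spt_guidance_penalty(log_probs, teacher_actions, progress, beta0=0.1):
    """Hypothetical adaptive guidance term: negative log-likelihood of the SPT
    teacher's chosen actions under the current policy, with a coefficient that
    is annealed away as training progresses (curriculum-style decay)."""
    beta = beta0 * max(0.0, 1.0 - progress)
    nll = -log_probs[np.arange(len(teacher_actions)), teacher_actions]
    return beta * nll.mean()
```

Under these assumptions, the total actor objective would be the clipped PPO surrogate plus the guidance penalty, so the teacher's influence fades as the learned policy takes over.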
Related papers
- UAS Visual Navigation in Large and Unseen Environments via a Meta Agent [0.13654846342364302]
We propose a meta-curriculum training scheme to efficiently learn to navigate in large-scale urban environments.
We organize the training curriculum in a hierarchical manner such that the agent is guided from coarse to fine towards the target task.
In contrast to traditional reinforcement learning (RL), which focuses on acquiring a policy for a specific task, meta reinforcement learning (MRL) aims to learn a policy with fast transfer ability to novel tasks.
arXiv Detail & Related papers (2025-03-20T01:44:59Z) - Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments.
Online RL-based recommender systems (RS) also face challenges in production deployment due to the risks of exposing users to untrained or unstable policies.
Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline.
We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
arXiv Detail & Related papers (2025-01-23T16:37:44Z) - Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone [72.17534881026995]
We develop an offline and online fine-tuning approach called policy-agnostic RL (PA-RL). We show the first result that successfully fine-tunes OpenVLA, a 7B generalist robot policy, autonomously with Cal-QL, an online RL fine-tuning algorithm.
arXiv Detail & Related papers (2024-12-09T17:28:03Z) - Forward KL Regularized Preference Optimization for Aligning Diffusion Policies [8.958830452149789]
A central problem for learning diffusion policies is to align the policy output with human intents in various tasks. We propose a novel framework, Forward KL regularized Preference optimization, to align the diffusion policy with preferences directly. The results show our method exhibits superior alignment with preferences and outperforms previous state-of-the-art algorithms.
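As a rough illustration of what a forward (mode-covering) KL regularizer looks like in practice, and as an assumption about the general form rather than this paper's exact objective, the term can be estimated by sampling actions from the reference policy and penalizing their negative log-likelihood under the policy being aligned:

```python
import numpy as np

def forward_kl_regularizer(logp_theta_on_ref_actions):
    """Monte Carlo view of KL(pi_ref || pi_theta), up to an additive constant:
    actions are drawn from the reference policy, and their negative
    log-likelihood under the learned policy is penalized. Unlike the reverse
    KL, this direction keeps probability mass on every behavior the reference
    policy supports (illustrative sketch only)."""
    return -np.mean(logp_theta_on_ref_actions)
```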
arXiv Detail & Related papers (2024-09-09T13:56:03Z) - DPO: Differential reinforcement learning with application to optimal configuration search [3.2857981869020327]
Reinforcement learning with continuous state and action spaces remains one of the most challenging problems within the field.
We propose the first differential RL framework that can handle settings with limited training samples and short-length episodes.
arXiv Detail & Related papers (2024-04-24T03:11:12Z) - IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse [50.90781542323258]
Reinforcement learning (RL) agents can transfer knowledge from source policies to a related target task.
Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions.
We propose a novel transfer RL method that selects the source policy without training extra components.
arXiv Detail & Related papers (2023-08-14T09:22:35Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
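The two-policy mechanism can be sketched as follows, assuming a Gymnasium-style environment interface; the guide horizon would be shortened over training so the learning policy faces progressively harder starting states (function names are illustrative, not the paper's code):

```python
def jsrl_rollout(env, guide_policy, explore_policy, guide_steps):
    """Jump-start sketch: the pre-existing guide policy acts for the first
    `guide_steps` steps of the episode, then the learning (exploration)
    policy takes over for the remainder."""
    obs, _ = env.reset()
    trajectory, done, t = [], False, 0
    while not done:
        acting = guide_policy if t < guide_steps else explore_policy
        action = acting(obs)
        obs_next, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        trajectory.append((obs, action, reward, obs_next, done))
        obs, t = obs_next, t + 1
    return trajectory
```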
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks [57.17673320237597]
In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation.
This paper presents the first reward shaping framework for average-reward learning.
It proves that, under standard assumptions, the optimal policy under the original reward function can be recovered.
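For reference, a minimal sketch of potential-based shaping once the discount factor is removed, as an assumption about the general idea rather than a reproduction of the paper's framework:

```python
def shaped_reward(r, phi_s, phi_next):
    """Undiscounted potential-based shaping: r' = r + Phi(s') - Phi(s).
    Over a long run the potential differences telescope, so the average
    reward (and hence the optimal policy, under suitable assumptions)
    is unchanged while the learner receives denser feedback."""
    return r + phi_next - phi_s
```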
arXiv Detail & Related papers (2020-07-03T05:06:57Z) - Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
A Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL).
Our framework learns when and which source policy is best to reuse for the target policy, and when to terminate that reuse.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z) - Population-Guided Parallel Policy Search for Reinforcement Learning [17.360163137926]
A new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL).
In the proposed scheme, multiple identical learners with their own value functions and policies share a common experience replay buffer and search for a good policy collaboratively, guided by information from the best policy.
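A minimal sketch of the guidance idea, assuming each learner's actor loss is augmented with a pull toward the current best learner's behavior; the exact distance measure and coefficient are assumptions, not the paper's specification:

```python
import numpy as np

def guided_actor_loss(base_actor_loss, actions, best_policy_actions, beta=0.05):
    """Each parallel learner keeps its own actor/critic and shares one replay
    buffer; periodically the best-performing learner is identified, and the
    others add a penalty on the distance between their actions and the best
    policy's actions on the same states (illustrative form)."""
    guidance = np.mean(np.sum((actions - best_policy_actions) ** 2, axis=-1))
    return base_actor_loss + beta * guidance
```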
arXiv Detail & Related papers (2020-01-09T10:13:57Z)