Digital Twin-Assisted Efficient Reinforcement Learning for Edge Task
Scheduling
- URL: http://arxiv.org/abs/2208.01781v1
- Date: Tue, 2 Aug 2022 23:26:08 GMT
- Title: Digital Twin-Assisted Efficient Reinforcement Learning for Edge Task
Scheduling
- Authors: Xiucheng Wang, Longfei Ma, Haocheng Li, Zhisheng Yin, Tom H. Luan, Nan
Cheng
- Abstract summary: We propose a Digital Twin (DT)-assisted RL-based task scheduling method in order to improve the performance and convergence of RL.
Two algorithms are designed to make task scheduling decisions, i.e., DT-assisted asynchronous Q-learning (DTAQL) and DT-assisted exploring Q-learning (DTEQL).
- Score: 10.777592783012702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task scheduling is a critical problem when one user offloads multiple
different tasks to the edge server. When a user has multiple tasks to offload,
only one task can be transmitted to the server at a time, and the server
processes tasks in transmission order, the problem is NP-hard. It is therefore
difficult for traditional optimization methods to quickly obtain the optimal
solution, while approaches based on reinforcement learning face the challenge
of an excessively large action space and slow convergence. In this paper, we
propose a Digital Twin (DT)-assisted RL-based task scheduling method in order
to improve the performance and convergence of RL. We use the DT to simulate the
results of different decisions made by the agent, so that one agent can try
multiple actions at a time, or, equivalently, multiple agents can interact with
the environment in parallel in the DT. In this way, the exploration efficiency
of RL can be significantly improved via the DT, so RL converges faster and is
less likely to get stuck in local optima. In particular, two algorithms are
designed to make task scheduling decisions, i.e., DT-assisted asynchronous
Q-learning (DTAQL) and DT-assisted exploring Q-learning (DTEQL). Simulation
results show that both algorithms significantly improve the convergence speed
of Q-learning by increasing the exploration efficiency.
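The core idea of the DT-assisted exploring scheme can be sketched in a few lines: at each decision point, the digital twin simulates several candidate actions, so the agent updates the Q-values of all of them while committing only one action for real. The sketch below is a minimal illustration of that exploration pattern, not the paper's implementation; the environment model, state/action sets, and the function `dt_assisted_q_learning` are all illustrative assumptions, and here a single step function stands in for the digital twin.

```python
import random

def dt_assisted_q_learning(env_step, states, actions, episodes=200,
                           k=3, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Sketch of DT-assisted exploring Q-learning: the digital twin
    (here, env_step used as a simulator) evaluates k candidate actions
    per state, so several Q-values are updated from one real decision."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        done = False
        while not done:
            # Digital twin: simulate several candidate actions in parallel
            # and update each of their Q-values from the simulated outcome.
            for a in random.sample(actions, min(k, len(actions))):
                s2, r, _ = env_step(s, a)  # simulated transition in the DT
                best_next = max(Q[(s2, b)] for b in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            # Commit one action in the real environment (epsilon-greedy).
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s, _, done = env_step(s, a)
    return Q
```

Compared with plain Q-learning, each real interaction yields up to k Q-value updates instead of one, which is the source of the faster convergence the abstract describes.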
Related papers
- Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [61.294110816231886]
We introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP)
SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model.
Demos and code can be found at https://forrest-110.io/sparse_diffusion_policy/.
arXiv Detail & Related papers (2024-07-01T17:59:56Z) - In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought [13.034968416139826]
We propose an In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner.
IDT is inspired by the efficient hierarchical structure of human decision-making.
IDT achieves state-of-the-art in long-horizon tasks over current in-context RL methods.
arXiv Detail & Related papers (2024-05-31T08:38:25Z) - RL-GPT: Integrating Reinforcement Learning and Code-as-policy [82.1804241891039]
We introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.
The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks.
This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline.
arXiv Detail & Related papers (2024-02-29T16:07:22Z) - Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z) - FAMO: Fast Adaptive Multitask Optimization [48.59232177073481]
We introduce Fast Adaptive Multitask Optimization (FAMO), a dynamic weighting method that decreases task losses in a balanced way.
Our results indicate that FAMO achieves comparable or superior performance to state-of-the-art gradient manipulation techniques.
arXiv Detail & Related papers (2023-06-06T15:39:54Z) - Teal: Learning-Accelerated Optimization of WAN Traffic Engineering [68.7863363109948]
We present Teal, a learning-based TE algorithm that leverages the parallel processing power of GPUs to accelerate TE control.
To reduce the problem scale and make learning tractable, Teal employs a multi-agent reinforcement learning (RL) algorithm to independently allocate each traffic demand.
Compared with other TE acceleration schemes, Teal satisfies 6--32% more traffic demand and yields 197--625x speedups.
arXiv Detail & Related papers (2022-10-25T04:46:30Z) - On-edge Multi-task Transfer Learning: Model and Practice with
Data-driven Task Allocation [20.20889051697198]
We show that task allocation with task importance for Multi-task Transfer Learning (MTL) is a variant of the NP-complete Knapsack problem.
We propose a Data-driven Cooperative Task Allocation (DCTA) approach to solve TATIM with high computational efficiency.
Our DCTA reduces processing time by a factor of 3.24 and saves 48.4% of energy consumption compared with the state of the art when solving TATIM.
arXiv Detail & Related papers (2021-07-06T08:24:25Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal
Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z) - Distributed Primal-Dual Optimization for Online Multi-Task Learning [22.45069527817333]
We propose an adaptive primal-dual algorithm, which captures task-specific noise in adversarial learning and carries out a projection-free update with runtime efficiency.
Our model is well-suited to decentralized, periodically connected tasks, as it allows energy-starved or bandwidth-constrained tasks to postpone their updates.
Empirical results confirm that the proposed model is highly effective on various real-world datasets.
arXiv Detail & Related papers (2020-04-02T23:36:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.