Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non Terrestrial Networks
- URL: http://arxiv.org/abs/2601.08254v1
- Date: Tue, 13 Jan 2026 06:23:21 GMT
- Title: Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non Terrestrial Networks
- Authors: Abdikarim Mohamed Ibrahim, Rosdiadee Nordin,
- Abstract summary: We propose a Deep Reinforcement Learning (DRL) agent guided by a Large Language Model (LLM). The results show that the LAM-DRL outperforms the traditional DRL by 40% in nominal weather scenarios and 64% in extreme weather scenarios.
- Score: 1.5469452301122173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large AI Models (LAMs) have been proposed for applications in Non-Terrestrial Networks (NTN), where they offer better performance through strong generalization and reduced task-specific training. In this paper, we propose a Deep Reinforcement Learning (DRL) agent that is guided by a Large Language Model (LLM). The LLM operates as a high-level coordinator that generates textual guidance, which shapes the reward of the DRL agent during training. The results show that LAM-DRL outperforms traditional DRL by 40% in nominal weather scenarios and by 64% in extreme weather scenarios compared to heuristics, in terms of throughput, fairness, and outage probability.
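The reward-shaping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword-based `parse_guidance` rule, the weight values, the normalization by `max_rate`, and the use of Jain's index as the fairness measure are all assumptions made for illustration. The abstract only states that textual guidance from the LLM shapes the DRL reward, and that results are reported in terms of throughput, fairness, and outage probability.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Guidance:
    """Hypothetical structured form of the coordinator's textual guidance."""
    throughput_weight: float
    fairness_weight: float
    outage_weight: float


def parse_guidance(text: str) -> Guidance:
    """Toy stand-in for an LLM: map guidance text to reward weights.

    The keyword rule and the weight values are illustrative assumptions.
    """
    lowered = text.lower()
    if "outage" in lowered or "storm" in lowered:
        # Extreme-weather style guidance: penalize outages more heavily.
        return Guidance(0.3, 0.2, 0.5)
    # Nominal guidance: favor throughput.
    return Guidance(0.5, 0.3, 0.2)


def jain_fairness(rates: Sequence[float]) -> float:
    """Jain's fairness index: (sum r)^2 / (n * sum r^2), in (0, 1]."""
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 0.0
    total = sum(rates)
    return total * total / (len(rates) * squares)


def shaped_reward(rates: Sequence[float], min_rate: float,
                  max_rate: float, guidance: Guidance) -> float:
    """Combine throughput, fairness, and outage into one scalar reward."""
    n = len(rates)
    throughput = sum(rates) / (n * max_rate)       # normalized to [0, 1]
    fairness = jain_fairness(rates)
    outage = sum(r < min_rate for r in rates) / n  # fraction of users below min_rate
    return (guidance.throughput_weight * throughput
            + guidance.fairness_weight * fairness
            - guidance.outage_weight * outage)


if __name__ == "__main__":
    g = parse_guidance("Heavy storm expected: prioritize avoiding outage.")
    print(shaped_reward([2.0, 0.5, 3.0], min_rate=1.0, max_rate=4.0, guidance=g))
```

In a training loop, the agent would receive `shaped_reward(...)` in place of the raw environment reward, and the guidance would be refreshed whenever the coordinator issues new text.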
Related papers
- Discover, Learn, and Reinforce: Scaling Vision-Language-Action Pretraining with Diverse RL-Generated Trajectories [33.872433985210876]
Scaling vision-language-action (VLA) model pre-training requires large volumes of diverse, high-quality manipulation trajectories. We propose Discover, Learn and Reinforce, which generates multiple distinct, high-success behavioral patterns for VLA pretraining. When adapted to unseen downstream task suites, VLA models pretrained on our diverse RL data surpass counterparts trained on equal-sized standard RL datasets.
arXiv Detail & Related papers (2025-11-24T07:54:49Z)
- Demystifying Reinforcement Learning in Agentic Reasoning [90.3737088727791]
We conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning. We highlight our key insights: (i) replacing stitched synthetic trajectories with real end-to-end tool-use trajectories yields a far stronger SFT. Exploration-friendly techniques, such as clip higher, overlong reward shaping, and maintaining adequate policy entropy, are crucial for agentic RL and could improve training efficiency.
arXiv Detail & Related papers (2025-10-13T17:57:15Z)
- QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs [80.76334908639745]
We propose QeRL, a Quantization-enhanced Reinforcement Learning framework for large language models (LLMs). QeRL addresses issues by combining NVFP4 quantization with Low-Rank Adaptation (LoRA). Experiments demonstrate that QeRL delivers over 1.5 times speedup in the rollout phase.
arXiv Detail & Related papers (2025-10-13T17:55:09Z)
- Reinforcement Learning on Pre-Training Data [55.570379963147424]
We introduce Reinforcement Learning on Pre-Training data (R), a new training-time scaling paradigm for optimizing large language models (LLMs). R enables the policy to autonomously explore meaningful trajectories to learn from pre-training data and improve its capability through reinforcement learning (RL). Extensive experiments on both general-domain and mathematical reasoning benchmarks across multiple models validate the effectiveness of R.
arXiv Detail & Related papers (2025-09-23T17:10:40Z)
- A Survey of Reinforcement Learning for Large Reasoning Models [98.58081012669369]
Review of recent advances in Reinforcement Learning for reasoning with Large Language Models. Further scaling of RL for LRMs now faces challenges not only in computational resources but also in algorithm design, training data, and infrastructure.
arXiv Detail & Related papers (2025-09-10T17:59:43Z)
- Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning [93.00629872970364]
Reinforcement learning (RL) has become the dominant paradigm for improving the performance of language models on complex reasoning tasks. We introduce SPARKLE, a fine-grained analytic framework to dissect the effects of RL across three key dimensions. We study whether difficult problems -- those yielding no RL signals and mixed-quality reasoning traces -- can still be effectively used for training.
arXiv Detail & Related papers (2025-06-05T07:53:59Z)
- Toward Efficient Exploration by Large Language Model Agents [14.712532175418884]
Large language models (LLMs) can be used to explicitly implement an existing reinforcement learning algorithm. We show how our LLM-based implementation of a known, data-efficient RL algorithm can be considerably more effective in natural language tasks.
arXiv Detail & Related papers (2025-04-29T17:59:48Z)
- Knowledge Graph Reasoning with Self-supervised Reinforcement Learning [30.359557545737747]
We propose a self-supervised pre-training method to warm up the policy network before the RL training stage. In our supervised learning stage, the agent selects actions based on the policy network and learns from generated labels. We show that our SSRL model meets or exceeds current state-of-the-art results on all Hits@k and mean reciprocal rank (MRR) metrics.
arXiv Detail & Related papers (2024-05-22T13:39:33Z)
- AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback [37.22370177877156]
Large Language Models (LLMs) have demonstrated significant success across various domains.
Their application in complex decision-making tasks frequently necessitates intricate prompt engineering or fine-tuning.
We introduce AdaRefiner, a novel framework designed to enhance the synergy between LLMs and RL feedback.
Our work makes contributions to the automatic self-refinement of LLMs with RL feedback, offering a more adaptable and efficient solution for complex decision-making problems.
arXiv Detail & Related papers (2023-09-29T12:16:19Z)
- DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide and conquer framework (DCF).
Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also exploit another attention based policy network in our lower-level DRL model to construct the route for each UAV, with the objective to maximize the number of executed tasks.
arXiv Detail & Related papers (2022-08-04T04:35:53Z)
- Enhancing the Generalization Performance and Speed Up Training for DRL-based Mapless Navigation [18.13884934663477]
DRL agents performing well in training scenarios are found to perform poorly in some unseen real-world scenarios.
In this paper, we discuss why the DRL agent fails in such unseen scenarios and find the representation of LiDAR readings is the key factor behind the agent's performance degradation.
We propose an easy, but efficient input pre-processing (IP) approach to accelerate training and enhance the performance of the DRL agent in such scenarios.
arXiv Detail & Related papers (2021-03-22T09:36:51Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.