LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2603.01488v1
- Date: Mon, 02 Mar 2026 05:54:02 GMT
- Title: LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning
- Authors: Chang Yao, Jinghui Qin, Kebing Jin, Hankz Hankui Zhuo,
- Abstract summary: Deep Reinforcement Learning (DRL) still suffers from critical issues in practical applications. Recent research shows that integrating Large Language Models (LLMs) with symbolic planning is promising in addressing these challenges. We introduce a novel LLM-driven closed-loop framework, which enables semantic-driven skill reuse and real-time constraint monitoring.
- Score: 23.916253226597956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite achieving remarkable success in complex tasks, Deep Reinforcement Learning (DRL) still suffers from critical issues in practical applications, such as low data efficiency, lack of interpretability, and limited cross-environment transferability. Moreover, learned policies that generate actions directly from states are sensitive to environmental changes and struggle to guarantee behavioral safety and compliance. Recent research shows that integrating Large Language Models (LLMs) with symbolic planning is promising in addressing these challenges. Inspired by this, we introduce a novel LLM-driven closed-loop framework, which enables semantic-driven skill reuse and real-time constraint monitoring by mapping natural language instructions into executable rules and semantically annotating automatically created options. The proposed approach utilizes the general knowledge of LLMs to improve exploration efficiency, yields options that transfer to similar environments, and provides inherent interpretability through semantic annotations. To validate the effectiveness of this framework, we conduct experiments on two domains, Office World and Montezuma's Revenge. The results demonstrate superior performance in data efficiency, constraint compliance, and cross-task transferability.
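The abstract describes two mechanisms: options (temporally extended skills) carrying natural-language semantic annotations, and instructions compiled into executable rules checked at run time. The sketch below illustrates that closed-loop idea in minimal Python; all class and function names here are hypothetical illustrations, not the paper's actual implementation or API.

```python
# Hypothetical sketch of the closed-loop structure described in the abstract:
# options carry semantic annotations, and a monitor evaluates executable
# rules (derived from instructions such as "never enter room D") each step.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class SemanticOption:
    """A reusable skill annotated with a natural-language description."""
    name: str
    description: str                      # semantic annotation for interpretability
    policy: Callable[[str], str]          # maps state -> action
    termination: Callable[[str], bool]    # when the option ends

@dataclass
class ConstraintMonitor:
    """Executable rules; each rule maps a state to True (compliant) or False."""
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def check(self, state: str) -> bool:
        return all(rule(state) for rule in self.rules)

def run_option(option: SemanticOption, monitor: ConstraintMonitor,
               state: str, step_fn: Callable[[str, str], str],
               max_steps: int = 100) -> Tuple[str, str]:
    """Execute one option, halting immediately on a constraint violation."""
    for _ in range(max_steps):
        if not monitor.check(state):
            return state, "violated"
        if option.termination(state):
            return state, "done"
        state = step_fn(state, option.policy(state))
    return state, "timeout"
```

In a toy state space where actions name the next state, an option "walk to room C" with the rule "state must never be D" runs to completion and reports `"done"`; if any intermediate state violated the rule, the monitor would halt the option instead.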
Related papers
- Data Efficient Adaptation in Large Language Models via Continuous Low-Rank Fine-Tuning [34.343514432589586]
This paper proposes DEAL, a novel framework that integrates Low-Rank Adaptation (LoRA) with a continuous fine-tuning strategy. Experiments on 15 diverse datasets show that DEAL consistently outperforms baseline methods. These findings demonstrate the potential of this approach to advance continual adaptation in Large Language Models.
arXiv Detail & Related papers (2025-09-23T12:55:57Z) - Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? [36.957333458197034]
Large Language Models (LLMs) achieve strong performance on diverse tasks but often exhibit cognitive inertia. We propose Inverse IFEval, a benchmark that measures models' capacity to override training-induced biases and comply with adversarial instructions.
arXiv Detail & Related papers (2025-09-04T15:03:02Z) - Enhancing Cross-task Transfer of Large Language Models via Activation Steering [75.41750053623298]
Cross-task in-context learning offers a direct solution for transferring knowledge across tasks. We investigate whether cross-task transfer can be achieved via latent space steering without parameter updates or input expansion. We propose a novel Cross-task Activation Steering Transfer framework that enables effective transfer by manipulating the model's internal activation states.
arXiv Detail & Related papers (2025-07-17T15:47:22Z) - Eliciting Causal Abilities in Large Language Models for Reasoning Tasks [14.512834333917414]
We introduce the Self-Causal Instruction Enhancement (SCIE) method, which enables LLMs to generate high-quality, low-quantity observational data. In SCIE, the instructions are treated as the treatment, and textual features are used to process natural language. Our method effectively generates instructions that enhance reasoning performance with reduced training cost of prompts.
arXiv Detail & Related papers (2024-12-19T17:03:02Z) - Unlocking the Power of LLM Uncertainty for Active In-Context Example Selection [6.813733517894384]
The Uncertainty Tripartite Testing Paradigm (Unc-TTP) is a novel method for classifying Large Language Model (LLM) uncertainty. Unc-TTP performs three rounds of sampling under varying label injection interference, enumerating all possible outcomes. Our experiments show that uncertain examples selected via Unc-TTP are more informative than certain examples.
arXiv Detail & Related papers (2024-08-17T11:33:23Z) - TRACE: TRansformer-based Attribution using Contrastive Embeddings in LLMs [50.259001311894295]
We propose a novel TRansformer-based Attribution framework using Contrastive Embeddings called TRACE.
We show that TRACE significantly improves the ability to attribute sources accurately, making it a valuable tool for enhancing the reliability and trustworthiness of large language models.
arXiv Detail & Related papers (2024-07-06T07:19:30Z) - FedEGG: Federated Learning with Explicit Global Guidance [90.04705121816185]
Federated Learning (FL) holds great potential for diverse applications owing to its privacy-preserving nature. Existing methods help address these challenges via optimization-based client constraints, adaptive client selection, or the use of pre-trained models or synthetic data. We present FedEGG, a new FL algorithm that constructs a global guiding task using a well-defined, easy-to-converge learning task.
arXiv Detail & Related papers (2024-04-18T04:25:21Z) - Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts [10.929547354171723]
This paper introduces Knowledgeable Agents from Language Model Rollouts (KALM).
It extracts knowledge from large language models (LLMs) in the form of imaginary rollouts that can be easily learned by the agent through offline reinforcement learning methods.
It achieves a success rate of 46% in executing tasks with unseen goals, substantially surpassing the 26% success rate achieved by baseline methods.
arXiv Detail & Related papers (2024-04-14T13:19:40Z) - TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z) - OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - Creativity of AI: Hierarchical Planning Model Learning for Facilitating Deep Reinforcement Learning [19.470693909025798]
We introduce a novel deep reinforcement learning framework with symbolic options.
Our framework features a loop training procedure, which guides the improvement of the policy.
We conduct experiments on two domains, Montezuma's Revenge and Office World, respectively.
arXiv Detail & Related papers (2021-12-18T03:45:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.