Related papers: Rule-Bottleneck Reinforcement Learning: Joint Explanation and Decision Optimization for Resource Allocation with Language Agents

URL: http://arxiv.org/abs/2502.10732v1
Date: Sat, 15 Feb 2025 09:01:31 GMT
Title: Rule-Bottleneck Reinforcement Learning: Joint Explanation and Decision Optimization for Resource Allocation with Language Agents
Authors: Mauricio Tec, Guojun Xiong, Haichuan Wang, Francesca Dominici, Milind Tambe
Abstract summary: Rule-Bottleneck Reinforcement Learning (RBRL) is a novel framework that jointly optimizes decisions and explanations. Evaluations in real-world scenarios highlight RBRL's competitive performance with deep RL and efficiency gains over LLM fine-tuning.
Score: 26.7942726790676
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep Reinforcement Learning (RL) is remarkably effective in addressing sequential resource allocation problems in domains such as healthcare, public policy, and resource management. However, deep RL policies often lack transparency and adaptability, challenging their deployment alongside human decision-makers. In contrast, Language Agents, powered by large language models (LLMs), provide human-understandable reasoning but may struggle with effective decision making. To bridge this gap, we propose Rule-Bottleneck Reinforcement Learning (RBRL), a novel framework that jointly optimizes decision and explanations. At each step, RBRL generates candidate rules with an LLM, selects among them using an attention-based RL policy, and determines the environment action with an explanation via chain-of-thought reasoning. The RL rule selection is optimized using the environment rewards and an explainability metric judged by the LLM. Evaluations in real-world scenarios highlight RBRL's competitive performance with deep RL and efficiency gains over LLM fine-tuning. A survey further confirms the enhanced quality of its explanations.
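The per-step loop described in the abstract (candidate rules, attention-based selection, a reward that mixes environment feedback with an explainability score) can be illustrated with a minimal sketch. It is not the authors' implementation. The rule strings, the embedding vectors, the `weight` parameter, and all function names are hypothetical. The LLM calls that would generate the rules and judge the explanations are replaced by fixed values, and simple scaled dot-product attention stands in for the paper's attention-based policy.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_rule_probs(state_query, rule_keys):
    """Scaled dot-product attention of one state query over rule key vectors."""
    scale = math.sqrt(len(state_query))
    scores = [
        sum(q * k for q, k in zip(state_query, key)) / scale
        for key in rule_keys
    ]
    return softmax(scores)


def select_rule(probs, rng):
    """Sample a rule index from the attention distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def combined_reward(env_reward, explainability_score, weight=0.5):
    """Assumed additive mix of environment reward and explainability score."""
    return env_reward + weight * explainability_score


# Hypothetical candidate rules; in RBRL these would come from an LLM.
candidate_rules = [
    "Prioritize the individuals with the highest current risk.",
    "Spread resources evenly across all individuals.",
    "Prioritize individuals not served in the previous step.",
]
# Hypothetical embeddings; a real system would learn or compute these.
rule_keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state_query = [2.0, 0.0]

rng = random.Random(0)
probs = attention_rule_probs(state_query, rule_keys)
chosen = select_rule(probs, rng)
# Placeholder values standing in for the environment and the LLM judge.
reward = combined_reward(env_reward=1.0, explainability_score=0.8)
print(candidate_rules[chosen], probs, reward)
```

In a full training loop, the combined reward would drive a policy-gradient or similar update of the parameters that produce `state_query` and `rule_keys`. The sketch omits that update, and it also omits the chain-of-thought step that turns the selected rule into an environment action.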

Related papers

Revisiting LLM Reasoning via Information Bottleneck [57.519119962528166]
Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR). We present a theoretical characterization of LLM reasoning grounded in the information bottleneck (IB) principle. We propose IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable.
arXiv Detail & Related papers (2025-07-24T13:14:25Z)
Feedback-Induced Performance Decline in LLM-Based Decision-Making [6.5990946334144756]
Large Language Models (LLMs) can extract context from natural language problem descriptions. This paper studies the behaviour of these models within Markov Decision Processes (MDPs).
arXiv Detail & Related papers (2025-07-20T10:38:56Z)
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning [55.36978389831446]
We recast reflective exploration within the Bayes-Adaptive RL framework. Our resulting algorithm, BARL, instructs the LLM to stitch and switch strategies based on observed outcomes.
arXiv Detail & Related papers (2025-05-26T22:51:00Z)
Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization [97.72503890388866]
We propose Self-Routing RAG (SR-RAG), a novel framework that binds selective retrieval with knowledge verbalization. SR-RAG enables an LLM to dynamically decide between external retrieval and verbalizing its own parametric knowledge. We introduce dynamic knowledge source inference via nearest neighbor search to improve the accuracy of knowledge source decision.
arXiv Detail & Related papers (2025-04-01T17:59:30Z)
Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning [16.654435148168172]
Large Language Models (LLMs) have shown remarkable promise in reasoning and decision-making. We propose an LLM-guided hierarchical RL framework, termed LDSC, to enhance sample efficiency, generalization, and multi-task adaptability.
arXiv Detail & Related papers (2025-03-24T15:49:56Z)
Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments. Online RL-based RS also face challenges in production deployment due to the risks of exposing users to untrained or unstable policies. Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline. We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
arXiv Detail & Related papers (2025-01-23T16:37:44Z)
Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs [12.572869123617783]
Reinforcement learning (RL) faces challenges in evaluating policy trajectories within intricate game tasks. PbRL presents a pioneering framework that capitalizes on human preferences as pivotal reward signals. We propose an LLM-enabled automatic preference generation framework named LLM4PG.
arXiv Detail & Related papers (2024-06-28T04:21:24Z)
Multi-turn Reinforcement Learning from Preference Human Feedback [41.327438095745315]
Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models with human preferences. Existing methods work by emulating the preferences at the single decision (turn) level. We develop novel methods for Reinforcement Learning from preference feedback between two full multi-turn conversations.
arXiv Detail & Related papers (2024-05-23T14:53:54Z)
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models. Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel. Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback. Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities. We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization [73.74371798168642]
We introduce an open-source modular library, RL4LMs, for optimizing language generators with reinforcement learning. Next, we present the GRUE benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions. Finally, we introduce an easy-to-use, performant RL algorithm, NLPO, that learns to effectively reduce the action space in language generation.
arXiv Detail & Related papers (2022-10-03T21:38:29Z)
Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner. We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.