FORM: Learning Expressive and Transferable First-Order Logic Reward Machines
- URL: http://arxiv.org/abs/2501.00364v3
- Date: Fri, 28 Feb 2025 17:13:11 GMT
- Title: FORM: Learning Expressive and Transferable First-Order Logic Reward Machines
- Authors: Leo Ardon, Daniel Furelos-Blanco, Roko Parac, Alessandra Russo
- Abstract summary: Reward machines (RMs) are an effective approach for addressing non-Markovian rewards in reinforcement learning. We propose First-Order Reward Machines ($\texttt{FORM}$s), which use first-order logic to label edges. We show that $\texttt{FORM}$s can be effectively learnt for tasks where traditional RM learning approaches fail.
- Score: 48.36822060760614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward machines (RMs) are an effective approach for addressing non-Markovian rewards in reinforcement learning (RL) through finite-state machines. Traditional RMs, which label edges with propositional logic formulae, inherit the limited expressivity of propositional logic. This limitation hinders the learnability and transferability of RMs, since complex tasks require numerous states and edges. To overcome these challenges, we propose First-Order Reward Machines ($\texttt{FORM}$s), which use first-order logic to label edges, resulting in more compact and transferable RMs. We introduce a novel method for $\textbf{learning}$ $\texttt{FORM}$s and a multi-agent formulation for $\textbf{exploiting}$ them and facilitating their transferability, in which multiple agents collaboratively learn policies for a shared $\texttt{FORM}$. Our experimental results demonstrate the scalability of $\texttt{FORM}$s relative to traditional RMs. Specifically, we show that $\texttt{FORM}$s can be effectively learnt for tasks where traditional RM learning approaches fail. We also show significant improvements in learning speed and task transferability thanks to the multi-agent learning framework and the abstraction provided by the first-order language.
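To make the distinction concrete, here is a minimal sketch (our own illustration, not the paper's implementation) of a reward machine whose transitions are guarded by first-order conditions, i.e., predicates quantified over environment objects, rather than fixed propositional symbols; all state names, predicates, and objects are invented for the example.

```python
# Minimal sketch (not the paper's implementation): a reward machine whose
# transitions are guarded by first-order conditions -- predicates over the
# ground atoms observed at each step -- instead of fixed propositional symbols.

class FirstOrderRM:
    def __init__(self, transitions, initial_state, accepting_states):
        # transitions: list of (state, guard, next_state, reward), where
        # `guard` is a callable over the set of observed ground atoms.
        self.transitions = transitions
        self.state = initial_state
        self.accepting = accepting_states

    def step(self, atoms):
        """Advance the machine on the ground atoms observed this step."""
        for state, guard, next_state, reward in self.transitions:
            if state == self.state and guard(atoms):
                self.state = next_state
                return reward
        return 0.0  # no guard fired: stay in the current state

# First-order guards: "exists x. picked(x)" and "forall x in {a, b}. delivered(x)".
exists_picked = lambda atoms: any(pred == "picked" for pred, _ in atoms)
all_delivered = lambda atoms: {"a", "b"} <= {obj for pred, obj in atoms
                                             if pred == "delivered"}

rm = FirstOrderRM(
    transitions=[
        ("u0", exists_picked, "u1", 0.0),
        ("u1", all_delivered, "u_acc", 1.0),
    ],
    initial_state="u0",
    accepting_states={"u_acc"},
)

print(rm.step({("picked", "a")}))                         # 0.0 -> state u1
print(rm.step({("delivered", "a"), ("delivered", "b")}))  # 1.0 -> accepting
```

The point of the example: a single existentially quantified guard such as "some object has been picked" covers every object at once, whereas a propositional RM would need a separate edge (and often separate states) per concrete object.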
Related papers
- $\texttt{SEM-CTRL}$: Semantically Controlled Decoding [53.86639808659575]
$\texttt{SEM-CTRL}$ is a unified approach that enforces rich context-sensitive constraints and task- and instance-specific semantics directly on an LLM decoder.
$\texttt{SEM-CTRL}$ allows small pre-trained LLMs to efficiently outperform larger variants and state-of-the-art reasoning models.
arXiv Detail & Related papers (2025-03-03T18:33:46Z)
- Reward Machine Inference for Robotic Manipulation [1.6135226672466307]
Reward Machines (RMs) enhance RL's capability to train policies over extended time horizons.
We introduce a novel LfD approach for learning RMs directly from visual demonstrations of robotic manipulation tasks.
We validate our method on vision-based manipulation tasks, showing that the inferred RM accurately captures task structure and enables an RL agent to effectively learn an optimal policy.
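As a rough illustration of the general recipe (not the paper's vision pipeline, which works from raw demonstrations), the sketch below assumes an upstream detector has already turned each demonstration into a symbolic event trace, and infers the simplest chain-shaped machine consistent with all traces; the function and event names are hypothetical.

```python
# Illustrative only: we assume an upstream detector has already turned each
# visual demonstration into a symbolic event trace, and we infer the simplest
# chain-shaped reward machine consistent with every trace.

def infer_chain_rm(traces):
    """Return (state, event, next_state) edges for the shared event chain."""
    def dedup(trace):  # collapse consecutive repeats of the same event
        out = []
        for e in trace:
            if not out or out[-1] != e:
                out.append(e)
        return out

    chains = [dedup(t) for t in traces]
    assert all(c == chains[0] for c in chains), "traces disagree: need a richer RM"
    return [(f"u{i}", event, f"u{i + 1}") for i, event in enumerate(chains[0])]

demos = [
    ["reach", "grasp", "grasp", "lift", "place"],   # hypothetical event traces
    ["reach", "grasp", "lift", "lift", "place"],
]
for edge in infer_chain_rm(demos):
    print(edge)  # ('u0', 'reach', 'u1'), ('u1', 'grasp', 'u2'), ...
```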
arXiv Detail & Related papers (2024-12-13T12:32:53Z)
- Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems [42.137278756052595]
$\texttt{AgentPrune}$ can seamlessly integrate into mainstream multi-agent systems.
$\textbf{(I)}$ It integrates seamlessly into existing multi-agent frameworks with a $28.1\%\sim72.8\%\downarrow$ token reduction.
$\textbf{(III)}$ It successfully defends against two types of agent-based adversarial attacks with a $3.5\%\sim10.8\%\uparrow$ performance boost.
arXiv Detail & Related papers (2024-10-03T14:14:31Z)
- Tabular Transfer Learning via Prompting LLMs [52.96022335067357]
We propose a novel framework, Prompt to Transfer (P2T), that utilizes unlabeled (or heterogeneous) source data with large language models (LLMs).
P2T identifies a column feature in a source dataset that is strongly correlated with a target task feature to create examples relevant to the target task, thus creating pseudo-demonstrations for prompts.
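A hypothetical sketch of that pipeline (our names and data, not the paper's API): pick the source column most correlated with a proxy of the target feature, then render source rows as few-shot pseudo-demonstrations in a prompt.

```python
# Hypothetical sketch of the idea as summarized above; names and data are ours.
# `statistics.correlation` needs Python 3.10+.
from statistics import correlation

source = {                       # toy source table
    "age":    [23, 45, 31, 52, 36],
    "income": [30, 80, 52, 95, 60],
    "tenure": [1, 12, 4, 20, 7],
}
target_proxy = [28, 77, 55, 90, 63]   # proxy values of the target-task feature

# (1) Pick the source column most correlated with the target proxy.
best = max(source, key=lambda col: abs(correlation(source[col], target_proxy)))

# (2) Render rows as few-shot pseudo-demonstrations for the prompt.
demos = [f"age={a}, tenure={t} -> {best}={v}"
         for a, t, v in zip(source["age"], source["tenure"], source[best])]
prompt = "Predict the target value.\n" + "\n".join(demos) + "\nage=40, tenure=9 ->"
print(f"selected column: {best}\n{prompt}")
```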
arXiv Detail & Related papers (2024-08-09T11:30:52Z)
- M$^3$GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation [78.77004913030285]
M$^3$GPT is an advanced $\textbf{M}$ultimodal, $\textbf{M}$ultitask framework for motion comprehension and generation.
We employ discrete vector quantization for multimodal conditional signals, such as text, music and motion/dance, enabling seamless integration into a large language model.
M$^3$GPT learns to model the connections and synergies among various motion-relevant tasks.
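The quantization step can be illustrated with a toy nearest-codebook lookup (a generic sketch, not M$^3$GPT's trained tokenizer; the codebook size, feature dimension, and vocabulary offset are assumptions):

```python
# Generic nearest-codebook quantization (not M^3GPT's trained tokenizer);
# codebook size, feature dimension, and token offset are assumptions.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))   # 512 learned codes, 64-dim each
features = rng.normal(size=(10, 64))    # 10 frames of a continuous signal

# Each frame becomes the index of its nearest codebook entry.
dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
tokens = dists.argmin(axis=1)           # shape (10,), ints in [0, 512)

# Shift into a reserved vocabulary range so motion tokens coexist with text.
MOTION_TOKEN_OFFSET = 50_000            # assumed base id
print(tokens + MOTION_TOKEN_OFFSET)
```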
arXiv Detail & Related papers (2024-05-25T15:21:59Z)
- Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation [45.90925587972781]
Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions.
Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks.
MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios.
arXiv Detail & Related papers (2023-10-04T14:11:12Z)
- Correct and Optimal: the Regular Expression Inference Challenge [10.899596368151892]
We propose regular expression inference (REI) as a challenge for code/language modelling.
We generate and publish the first large-scale datasets for REI.
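The task can be illustrated with a toy checker (our toy, not the published benchmark or its cost model): a candidate regex is correct if it accepts every positive and rejects every negative example, and among correct candidates the cheapest, here simply the shortest, wins.

```python
# Toy version of the objective: accept all positives, reject all negatives,
# and prefer the cheapest correct candidate (here, cost = pattern length).
import re

positives = ["ab", "aab", "aaab"]
negatives = ["b", "ba", "abb"]
candidates = ["a+b", "a*b", "a.*b", "(a|b)*"]   # pretend search output

def correct(pattern):
    return (all(re.fullmatch(pattern, p) for p in positives)
            and not any(re.fullmatch(pattern, n) for n in negatives))

winners = sorted((c for c in candidates if correct(c)), key=len)
print(winners[0] if winners else "no correct candidate")   # -> a+b
```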
arXiv Detail & Related papers (2023-08-15T17:40:10Z)
- Learning In-context Learning for Named Entity Recognition [54.022036267886214]
Named entity recognition in real-world applications suffers from the diversity of entity types, the emergence of new entity types, and the lack of high-quality annotations.
This paper proposes an in-context learning-based NER approach that effectively injects in-context NER ability into pre-trained language models (PLMs).
We show that our method significantly outperforms the PLMs+fine-tuning counterparts.
arXiv Detail & Related papers (2023-05-18T15:31:34Z)
- Skill-Based Reinforcement Learning with Intrinsic Reward Matching [77.34726150561087]
We present Intrinsic Reward Matching (IRM), which unifies task-agnostic skill pretraining and task-aware finetuning.
IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods.
arXiv Detail & Related papers (2022-10-14T00:04:49Z)
- Hierarchies of Reward Machines [75.55324974788475]
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine.
We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs.
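A minimal sketch of that calling mechanism (our encoding, not the paper's formalism): a parent RM state can invoke a child RM, and control returns to a designated parent state once the child reaches its accepting state. All task and event names are illustrative.

```python
# Our encoding of the idea, not the paper's formalism: a parent RM state can
# call a child RM; control returns to a designated state once the child accepts.

class RM:
    def __init__(self, edges, accept, calls=None):
        self.edges = edges            # {(state, event): next_state}
        self.calls = calls or {}      # {state: (child RM factory, return state)}
        self.accept, self.state, self.child = accept, "u0", None

    def step(self, event):
        if self.child is None and self.state in self.calls:
            factory, _ = self.calls[self.state]
            self.child = factory()                     # enter the sub-task RM
        if self.child is not None:
            self.child.step(event)                     # delegate to the child
            if self.child.state == self.child.accept:  # child reached accept
                self.state = self.calls[self.state][1]
                self.child = None
            return
        self.state = self.edges.get((self.state, event), self.state)

# Child handles a "fetch" sub-task; the parent calls it from state u1.
fetch = lambda: RM({("u0", "pick"): "u1", ("u1", "drop"): "acc"}, "acc")
parent = RM({("u0", "start"): "u1"}, "acc", calls={"u1": (fetch, "acc")})

for e in ["start", "pick", "drop"]:
    parent.step(e)
print(parent.state)  # 'acc': the fetch sub-machine completed, parent accepted
```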
arXiv Detail & Related papers (2022-05-31T12:39:24Z)