Closing the Intent-to-Behavior Gap via Fulfillment Priority Logic
- URL: http://arxiv.org/abs/2503.05818v2
- Date: Sat, 22 Mar 2025 04:22:47 GMT
- Title: Closing the Intent-to-Behavior Gap via Fulfillment Priority Logic
- Authors: Bassel El Mabsout, Abdelrahman AbdelGawad, Renato Mancuso
- Abstract summary: This paper presents the concept of objective fulfillment upon which we build Fulfillment Priority Logic (FPL). Our novel Balanced Policy Gradient algorithm leverages FPL specifications to achieve up to 500% better sample efficiency compared to Soft Actor Critic.
- Score: 1.4542411354617986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Practitioners designing reinforcement learning policies face a fundamental challenge: translating intended behavioral objectives into representative reward functions. This challenge stems from behavioral intent requiring simultaneous achievement of multiple competing objectives, typically addressed through labor-intensive linear reward composition that yields brittle results. Consider the ubiquitous robotics scenario where performance maximization directly conflicts with energy conservation. Such competitive dynamics are resistant to simple linear reward combinations. In this paper, we present the concept of objective fulfillment upon which we build Fulfillment Priority Logic (FPL). FPL allows practitioners to define logical formulae representing their intentions and priorities within multi-objective reinforcement learning. Our novel Balanced Policy Gradient algorithm leverages FPL specifications to achieve up to 500% better sample efficiency compared to Soft Actor Critic. Notably, this work constitutes the first implementation of non-linear utility scalarization design, specifically for continuous control problems.
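As a rough illustration of why a non-linear, priority-aware composition can behave differently from a linear weighted sum, consider the Python sketch below. It is a hypothetical example, not the paper's FPL semantics or Balanced Policy Gradient algorithm: the fulfillment normalization, the weighted geometric-mean combination, and the priority exponent are all assumptions made for illustration.

```python
import numpy as np

def fulfillment(value, target):
    """Map a raw objective value onto a fulfillment degree in [0, 1].

    Hypothetical normalization standing in for the paper's notion of
    objective fulfillment.
    """
    return float(np.clip(value / target, 0.0, 1.0))

def linear_utility(perf, energy, w_perf=2.0, w_energy=1.0):
    """Conventional weighted-sum scalarization of the two objectives."""
    return (w_perf * perf + w_energy * energy) / (w_perf + w_energy)

def prioritized_utility(perf, energy, priority=2.0):
    """Non-linear combination: a weighted geometric mean.

    The product form penalizes neglecting either objective, and the
    exponent encodes that performance takes priority over energy saving.
    """
    return (perf ** priority * energy) ** (1.0 / (priority + 1.0))

# A policy that maximizes performance while ignoring energy vs. a balanced one.
fast = (fulfillment(1.0, 1.0), fulfillment(0.05, 1.0))
moderate = (fulfillment(0.7, 1.0), fulfillment(0.5, 1.0))

for name, (p, e) in [("fast", fast), ("moderate", moderate)]:
    print(f"{name}: linear={linear_utility(p, e):.3f} "
          f"prioritized={prioritized_utility(p, e):.3f}")
# Under these assumptions the weighted sum prefers the energy-hungry policy,
# while the product-style utility only scores well when both objectives are
# at least partially fulfilled.
```

Under these toy assumptions, the linear scalarization rewards sacrificing energy conservation entirely, whereas the non-linear combination favors the policy that fulfills both objectives, which is the kind of behavior the abstract attributes to priority-aware specifications.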
Related papers
- Hierarchical Reinforcement Learning with Targeted Causal Interventions [24.93050534953955]
Hierarchical reinforcement learning (HRL) improves the efficiency of long-horizon reinforcement-learning tasks with sparse rewards by decomposing the task into a hierarchy of subgoals. We model the subgoal structure as a causal graph and propose a causal discovery algorithm to learn it. We harness the discovered causal model to prioritize subgoal interventions based on their importance in attaining the final goal.
arXiv Detail & Related papers (2025-07-06T12:38:42Z) - QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA [49.9801383018588]
We introduce QA-LIGN, an automatic symbolic reward decomposition approach. Instead of training a black-box reward model that outputs a monolithic score, QA-LIGN formulates principle-specific evaluation questions. Experiments aligning an uncensored large language model with a set of constitutional principles demonstrate that QA-LIGN offers greater transparency and adaptability.
arXiv Detail & Related papers (2025-06-09T18:24:57Z) - Online Decision-Focused Learning [63.83903681295497]
Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. We investigate DFL in dynamic environments where the objective function may evolve over time. We establish bounds on the expected dynamic regret, both when the decision space is a simplex and when it is a general bounded convex polytope.
arXiv Detail & Related papers (2025-05-19T10:40:30Z) - Curriculum-based Sample Efficient Reinforcement Learning for Robust Stabilization of a Quadrotor [3.932152385564876]
This article introduces a curriculum learning approach to develop a robust stabilizing controller for a quadrotor. The learning objective is to achieve desired positions from random initial conditions. A novel additive reward function is proposed to incorporate transient and steady-state performance specifications.
arXiv Detail & Related papers (2025-01-30T17:05:32Z) - Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions [0.0]
Reinforcement learning has become an essential algorithm for generating complex robotic behaviors.
To learn such behaviors, it is necessary to design a reward function that describes the task.
In this paper, we propose the concept of Constraints as Rewards (CaR).
arXiv Detail & Related papers (2025-01-08T01:59:47Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - Multi-Objective Reinforcement Learning-based Approach for Pressurized Water Reactor Optimization [0.0]
PEARL distinguishes itself from traditional policy-based multi-objective Reinforcement Learning methods by learning a single policy.
Several versions, inspired by deep learning and evolutionary techniques, have been crafted, catering to both unconstrained and constrained problem domains.
It is tested on two practical PWR core Loading Pattern optimization problems to showcase its real-world applicability.
arXiv Detail & Related papers (2023-12-15T20:41:09Z) - Reinforcement Learning with Non-Cumulative Objective [12.906500431427716]
In reinforcement learning, the objective is almost always defined as a cumulative function over the rewards along the process.
In this paper, we propose a modification to existing algorithms for optimizing such objectives.
arXiv Detail & Related papers (2023-07-11T01:20:09Z) - Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z) - From STL Rulebooks to Rewards [4.859570041295978]
We propose a principled approach to shaping rewards for reinforcement learning from multiple objectives.
We first equip STL with a novel quantitative semantics that allows individual requirements to be evaluated automatically.
We then develop a method for systematically combining evaluations of multiple requirements into a single reward; a minimal sketch of this robustness-to-reward idea appears after this list.
arXiv Detail & Related papers (2021-10-06T14:16:59Z) - Outcome-Driven Reinforcement Learning via Variational Inference [95.82770132618862]
We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
arXiv Detail & Related papers (2021-04-20T18:16:21Z)
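The "From STL Rulebooks to Rewards" entry above describes turning temporal-logic requirements into a reward via a quantitative semantics. The following Python sketch illustrates that general idea under stated assumptions: the two requirement operators, the sigmoid squashing, and the product combination are hypothetical choices made for illustration, not the construction used in that paper.

```python
import numpy as np

def always_below(signal, threshold):
    """Robustness of "always signal <= threshold": minimum slack over the trace."""
    return float(np.min(threshold - np.asarray(signal)))

def eventually_above(signal, threshold):
    """Robustness of "eventually signal >= threshold": maximum excess over the trace."""
    return float(np.max(np.asarray(signal) - threshold))

def combined_reward(robustness_values, sharpness=5.0):
    """Squash each requirement's robustness into (0, 1) and combine.

    A product of sigmoids is one illustrative way to merge multiple
    requirement evaluations into a single scalar reward.
    """
    squashed = 1.0 / (1.0 + np.exp(-sharpness * np.asarray(robustness_values)))
    return float(np.prod(squashed))

# Example trace: speed must always stay below 2.0, and position must eventually exceed 1.0.
speed = [0.5, 1.2, 1.8, 1.5]
position = [0.0, 0.4, 0.9, 1.3]
r = combined_reward([always_below(speed, 2.0), eventually_above(position, 1.0)])
print(f"reward: {r:.3f}")
```

In this sketch, positive robustness means a requirement is satisfied with margin, so the combined reward only approaches 1 when every requirement holds comfortably; any single violated requirement drags the product toward 0.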