LinguaFluid: Language Guided Fluid Control via Semantic Rewards in Reinforcement Learning
- URL: http://arxiv.org/abs/2508.05977v2
- Date: Thu, 14 Aug 2025 07:01:16 GMT
- Title: LinguaFluid: Language Guided Fluid Control via Semantic Rewards in Reinforcement Learning
- Authors: Aoming Liang, Chi Cheng, Dashuai Chen, Boai Sun, Dixia Fan
- Abstract summary: We introduce a semantically aligned reinforcement learning method where rewards are computed by aligning the current state with a target semantic instruction. We show that the semantic reward can guide learning to achieve competitive control behavior, even in the absence of hand-crafted reward functions. This framework opens new horizons for aligning agent behavior with natural language goals and lays the groundwork for a more seamless integration of large language models.
- Score: 0.7864304771129751
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In the domain of scientific machine learning, designing effective reward functions remains a challenge in reinforcement learning (RL), particularly in environments where task goals are difficult to specify numerically. Reward functions in existing work are predominantly based on heuristics, manual engineering, or task-specific tuning. In this work, we introduce a semantically aligned reinforcement learning method where rewards are computed by aligning the current state with a target semantic instruction using Sentence-BERT (SBERT, Sentence-Bidirectional Encoder Representations from Transformers). Instead of relying on manually defined reward functions, the policy receives feedback in the form of a reward given by the cosine similarity between the textual goal description and the description of the current state in the episode. We evaluated our approach in several environments and showed that the semantic reward can guide learning to achieve competitive control behavior, even in the absence of hand-crafted reward functions. Our study demonstrates a correlation between the language embedding space and the conventional Euclidean space. This framework opens new horizons for aligning agent behavior with natural language goals and lays the groundwork for a more seamless integration of large language models (LLMs) into fluid control applications.
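A minimal sketch of the reward just described, assuming the sentence-transformers package; the model name and the state_to_text() renderer are illustrative stand-ins, not details taken from the paper:

    # Semantic reward via SBERT cosine similarity (sketch, assumptions noted above).
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def state_to_text(state) -> str:
        # Hypothetical renderer: verbalize the simulator state as a sentence.
        return f"the flow velocity at the probe is {state['u']:.2f} m/s"

    def semantic_reward(goal_text: str, state) -> float:
        """Cosine similarity between the goal instruction and the state text."""
        embeddings = model.encode([goal_text, state_to_text(state)],
                                  convert_to_tensor=True)
        return util.cos_sim(embeddings[0], embeddings[1]).item()

Inside a training loop this scalar simply replaces a hand-crafted reward, so standard RL algorithms such as PPO can consume it unchanged.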
Related papers
- Reinforcement Learning Enhancement Using Vector Semantic Representation and Symbolic Reasoning for Human-Centered Autonomous Emergency Braking [4.3152045411139675]
This paper proposes a novel pipeline that produces a neuro-symbolic feature representation that encompasses semantic, spatial, and shape information. It also proposes a Soft First-Order Logic (SFOL) reward function that balances human values via a symbolic reasoning module. The findings demonstrate that integrating holistic representations and soft reasoning into reinforcement learning can support more context-aware and value-aligned decision-making for autonomous driving.
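As a rough illustration of what a soft-logic reward can look like, here is a generic fuzzy-logic sketch; the predicates and thresholds are made up for illustration and are not the paper's SFOL formulation:

    # Soft conjunction of truth degrees in [0, 1] via the product t-norm.
    def soft_and(*degrees: float) -> float:
        result = 1.0
        for d in degrees:
            result *= d
        return result

    def sfol_reward(time_to_collision: float, deceleration: float) -> float:
        safe = min(1.0, time_to_collision / 3.0)     # "braking early enough"
        gentle = max(0.0, 1.0 - deceleration / 8.0)  # "braking comfortably"
        return soft_and(safe, gentle)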
arXiv Detail & Related papers (2026-02-04T21:56:27Z) - The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination [0.9099663022952496]
We argue that recent advances in large language models point toward a shift from hand-crafted numerical rewards to language-based objective specifications. We conceptualize this transition along three dimensions: semantic reward specification, dynamic reward adaptation, and improved alignment with human intent.
arXiv Detail & Related papers (2026-01-13T05:47:18Z) - DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning [28.027785116421242]
We present DAIL (Distributional Aligned Learning), featuring two key components: distributional policy and semantic alignment. We show that DAIL effectively resolves instruction ambiguities, achieving superior performance to baseline methods.
arXiv Detail & Related papers (2025-10-22T13:16:46Z) - FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making [32.050134958163184]
Foundation Models (FMs) and World Models (WMs) offer complementary strengths in task generalization at different levels. We propose FOUNDER, a framework that integrates the generalizable knowledge embedded in FMs with the dynamic modeling capabilities of WMs. We learn a mapping function that grounds FM representations in the WM state space, effectively inferring the agent's physical states in the world simulator from external observations.
arXiv Detail & Related papers (2025-07-15T21:49:49Z) - Subtask-Aware Visual Reward Learning from Segmented Demonstrations [97.80917991633248]
This paper introduces REDS (REward learning from Demonstration with Segmentations), a novel reward learning framework. We train a dense reward function conditioned on video segments and their corresponding subtasks to ensure alignment with ground-truth reward signals. Our experiments show that REDS significantly outperforms baseline methods on complex robotic manipulation tasks in Meta-World.
arXiv Detail & Related papers (2025-02-28T01:25:37Z) - Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using Proximal Policy Optimization (PPO) for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
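As a hedged illustration of such a programmed reward, the snippet below scores an arithmetic completion mechanically; the task format and the binary reward scale are assumptions, not the paper's exact setup:

    # Programmed reward for a formal-language task (illustrative sketch).
    def arithmetic_reward(prompt: str, generated: str) -> float:
        """Return 1.0 iff the model's answer matches the evaluated expression."""
        expression = prompt.strip().rstrip("=").strip()  # e.g. "12 + 7 =" -> "12 + 7"
        try:
            target = eval(expression, {"__builtins__": {}})  # trusted prompts only
            return 1.0 if generated.strip() == str(target) else 0.0
        except Exception:
            return 0.0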
arXiv Detail & Related papers (2024-10-22T15:59:58Z) - A Pattern Language for Machine Learning Tasks [0.0]
We formalise the essential data of objective functions as equality constraints on composites of learners. We develop a flowchart-like graphical mathematics for tasks that allows us to: (1) offer a unified perspective of approaches in machine learning across domains; (2) design and optimise desired behaviours model-agnostically; and (3) import insights from theoretical computer science into practical machine learning.
arXiv Detail & Related papers (2024-07-02T16:50:27Z) - RILe: Reinforced Imitation Learning [60.63173816209543]
RILe (Reinforced Imitation Learning) is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently. Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z) - Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement
Learning with Sub-optimal Demonstrations [25.536792010283566]
Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function based on collected expert demonstrations.
We introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework.
Our framework demonstrates significant performance improvements over previous SOTA methods.
arXiv Detail & Related papers (2023-10-13T02:38:35Z) - Curricular Subgoals for Inverse Reinforcement Learning [21.038691420095525]
Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning.
Existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert.
We propose a novel Curricular Subgoal-based Inverse Reinforcement Learning framework that explicitly disentangles a task into several local subgoals to guide agent imitation.
arXiv Detail & Related papers (2023-06-14T04:06:41Z) - Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning [17.58129740811116]
We propose a reward learning approach, Graph-based Equivalence Mappings (GEM).
GEM represents a spatial goal specification by a reward function conditioned on i) a graph indicating important spatial relationships between objects and ii) state equivalence mappings for each edge in the graph.
We show that GEM can drastically improve the generalizability of the learned goal representations over strong baselines.
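To make the shape of such a specification concrete, here is a hypothetical sketch of a graph-conditioned reward with one equivalence mapping per edge; all names and types are illustrative rather than taken from the paper:

    # GEM-style goal specification as data (hypothetical sketch).
    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    State = Dict[str, Tuple[float, float]]  # object name -> 2-D position

    @dataclass
    class GoalGraph:
        edges: List[Tuple[str, str]]
        relation: Dict[Tuple[str, str], Callable[[State], bool]]
        equivalence: Dict[Tuple[str, str], Callable[[State], State]]

        def reward(self, state: State) -> float:
            # Fraction of edges whose spatial relation holds after reducing
            # the state with that edge's equivalence mapping.
            hits = sum(self.relation[e](self.equivalence[e](state))
                       for e in self.edges)
            return hits / len(self.edges)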
arXiv Detail & Related papers (2022-11-24T18:59:06Z) - Compositional Generalization in Grounded Language Learning via Induced Model Sparsity [81.38804205212425]
We consider simple language-conditioned navigation problems in a grid world environment with disentangled observations.
We design an agent that encourages sparse correlations between words in the instruction and attributes of objects, composing them together to find the goal.
Our agent maintains a high level of performance on goals containing novel combinations of properties even when learning from a handful of demonstrations.
arXiv Detail & Related papers (2022-07-06T08:46:27Z) - Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
arXiv Detail & Related papers (2021-05-17T13:42:09Z) - Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers [138.68213707587822]
We propose a simple, practical, and intuitive approach for domain adaptation in reinforcement learning.
We show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function.
Our approach is applicable to domains with continuous states and actions and does not require learning an explicit model of the dynamics.
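A compact sketch of the classifier-based reward correction this paper describes: assuming two trained domain classifiers exposing probabilities p_sas ≈ P(target | s, a, s') and p_sa ≈ P(target | s, a), the dynamics gap is compensated by adding their log-odds difference to the reward (see the paper for the exact derivation):

    import math

    def corrected_reward(reward: float, p_sas: float, p_sa: float) -> float:
        # The log-odds difference estimates
        # log p_target(s'|s,a) - log p_source(s'|s,a).
        delta = (math.log(p_sas) - math.log(1.0 - p_sas)
                 - math.log(p_sa) + math.log(1.0 - p_sa))
        return reward + delta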
arXiv Detail & Related papers (2020-06-24T17:47:37Z)