Related papers: Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning

URL: http://arxiv.org/abs/2505.08361v1
Date: Tue, 13 May 2025 09:08:28 GMT
Title: Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning
Authors: Xinyue Wang, Biwei Huang
Abstract summary: We introduce World Modeling with Compositional Causal Components (WM3C). This framework enhances reinforcement learning by learning and leveraging causal components. Our approach integrates language as a compositional modality to decompose the latent space into meaningful components.
Score: 15.594198876509628
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generalization in reinforcement learning (RL) remains a significant challenge, especially when agents encounter novel environments with unseen dynamics. Drawing inspiration from human compositional reasoning -- where known components are reconfigured to handle new situations -- we introduce World Modeling with Compositional Causal Components (WM3C). This novel framework enhances RL generalization by learning and leveraging compositional causal components. Unlike previous approaches focusing on invariant representation learning or meta-learning, WM3C identifies and utilizes causal dynamics among composable elements, facilitating robust adaptation to new tasks. Our approach integrates language as a compositional modality to decompose the latent space into meaningful components and provides theoretical guarantees for their unique identification under mild assumptions. Our practical implementation uses a masked autoencoder with mutual information constraints and adaptive sparsity regularization to capture high-level semantic information and effectively disentangle transition dynamics. Experiments on numerical simulations and real-world robotic manipulation tasks demonstrate that WM3C significantly outperforms existing methods in identifying latent processes, improving policy learning, and generalizing to unseen tasks.

Related papers

Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why [50.191655141020505]
This survey provides a comparative analysis of feature-based and GAN-based approaches to learning from demonstrations. We argue that the dichotomy between feature-based and GAN-based methods is increasingly nuanced.
arXiv Detail & Related papers (2025-07-08T11:45:51Z)
Dynamic Manipulation of Deformable Objects in 3D: Simulation, Benchmark and Learning Strategy [88.8665000676562]
Prior methods often simplify the problem to low-speed or 2D settings, limiting their applicability to real-world 3D tasks. To mitigate data scarcity, we introduce a novel simulation framework and benchmark grounded in reduced-order dynamics. We propose Dynamics Informed Diffusion Policy (DIDP), a framework that integrates imitation pretraining with physics-informed test-time adaptation.
arXiv Detail & Related papers (2025-05-23T03:28:25Z)
Better Decisions through the Right Causal World Model [17.623937562865617]
Causal Object-centric Model Extraction Tool (COMET) is a novel algorithm designed to learn exact, interpretable causal world models (CWMs). Our results, validated in Atari environments such as Pong and Freeway, demonstrate the accuracy and robustness of COMET.
arXiv Detail & Related papers (2025-04-09T20:29:13Z)
Enabling Systematic Generalization in Abstract Spatial Reasoning through Meta-Learning for Compositionality [20.958479821810762]
We extend the approach of meta-learning for compositionality to the domain of abstract spatial reasoning. Our results show that a transformer-based encoder-decoder model, trained via meta-learning for compositionality, can systematically generalize to previously unseen transformation compositions.
arXiv Detail & Related papers (2025-04-02T07:56:39Z)
ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z)
Active Inference for Self-Organizing Multi-LLM Systems: A Bayesian Thermodynamic Approach to Adaptation [0.0]
This paper introduces a novel approach to creating adaptive language agents by integrating active inference with large language models (LLMs). Our framework models the environment using three state factors (prompt, search, and information states) with seven observation modalities capturing quality metrics. Experimental results demonstrate the effectiveness of this approach, with the agent developing accurate models of environment dynamics.
arXiv Detail & Related papers (2024-12-10T16:34:47Z)
Interpreting token compositionality in LLMs: A robustness analysis [10.777646083061395]
Constituent-Aware Pooling (CAP) is a methodology designed to analyse how large language models process linguistic structures. CAP intervenes in model activations through constituent-based pooling at various model levels. Our findings highlight fundamental limitations in current transformer architectures regarding compositional semantics processing and model interpretability.
arXiv Detail & Related papers (2024-10-16T18:10:50Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning [35.87062321504049]
Multiple confounders can influence the transition dynamics, making it challenging to infer accurate context for decision-making. This paper addresses this challenge with Decomposed Mutual INformation Optimization (DOMINO) for context learning. Our theoretical analysis shows that DOMINO can overcome the underestimation of the mutual information caused by multi-confounded challenges.
arXiv Detail & Related papers (2022-10-09T09:44:23Z)
Meta-learning using privileged information for dynamics [66.32254395574994]
We extend the Neural ODE Process model to use additional information within the Learning Using Privileged Information setting. We validate our extension with experiments showing improved accuracy and calibration on simulated dynamics tasks.
arXiv Detail & Related papers (2021-04-29T12:18:02Z)
Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning [124.9856253431878]
We decompose the task of learning a global dynamics model into two stages: (a) learning a context latent vector that captures the local dynamics, then (b) predicting the next state conditioned on it. In order to encode dynamics-specific information into the context latent vector, we introduce a novel loss function that encourages the context latent vector to be useful for predicting both forward and backward dynamics. The proposed method achieves superior generalization ability across various simulated robotics and control tasks, compared to existing RL schemes.
arXiv Detail & Related papers (2020-05-14T08:10:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.