Related papers: LacaDM: A Latent Causal Diffusion Model for Multiobjective Reinforcement Learning

URL: http://arxiv.org/abs/2512.19516v1
Date: Mon, 22 Dec 2025 16:08:03 GMT
Title: LacaDM: A Latent Causal Diffusion Model for Multiobjective Reinforcement Learning
Authors: Xueming Yan, Bo Yin, Yaochu Jin
Abstract summary: Multiobjective reinforcement learning (MORL) poses significant challenges due to the inherent conflicts between objectives and the difficulty of adapting to dynamic environments. Traditional methods often struggle to generalize effectively, particularly in large and complex state-action spaces. We introduce the Latent Causal Diffusion Model (LacaDM), a novel approach designed to enhance the adaptability of MORL in discrete and continuous environments.
Score: 26.68981028489201
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multiobjective reinforcement learning (MORL) poses significant challenges due to the inherent conflicts between objectives and the difficulty of adapting to dynamic environments. Traditional methods often struggle to generalize effectively, particularly in large and complex state-action spaces. To address these limitations, we introduce the Latent Causal Diffusion Model (LacaDM), a novel approach designed to enhance the adaptability of MORL in discrete and continuous environments. Unlike existing methods that primarily address conflicts between objectives, LacaDM learns latent temporal causal relationships between environmental states and policies, enabling efficient knowledge transfer across diverse MORL scenarios. By embedding these causal structures within a diffusion model-based framework, LacaDM achieves a balance between conflicting objectives while maintaining strong generalization capabilities in previously unseen environments. Empirical evaluations on various tasks from the MOGymnasium framework demonstrate that LacaDM consistently outperforms the state-of-the-art baselines in terms of hypervolume, sparsity, and expected utility maximization, showcasing its effectiveness in complex multiobjective tasks.

Related papers

Flow Matching for Offline Reinforcement Learning with Discrete Actions [18.806918500759704]
We extend flow matching to a general framework that supports discrete action spaces with multiple objectives. Specifically, we replace continuous flows with continuous-time Markov chains, trained using a Q-weighted flow matching objective. We then extend our design to multi-agent settings, mitigating the exponential growth of joint action spaces via a factorized conditional path.
arXiv Detail & Related papers (2026-02-05T19:13:44Z)
From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation [59.27094165576015]
We propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces. By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process. We introduce specialized optimization strategies to decouple task-specific parameters and rebalance training dynamics, effectively resolving interference between diverse objectives in multi-task learning.
arXiv Detail & Related papers (2026-01-28T09:29:40Z)
Flexible Multitask Learning with Factorized Diffusion Policy [59.526246520933135]
Multitask learning poses significant challenges due to the highly multimodal and diverse nature of robot action distributions. Existing monolithic models often underfit the action distribution and lack the flexibility required for efficient adaptation. We introduce a novel modular diffusion policy framework that factorizes complex action distributions into a composition of specialized diffusion models.
arXiv Detail & Related papers (2025-12-26T07:11:47Z)
Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction [53.745458605360675]
We explore world-model internalization through efficient interaction and active reasoning (WMAct). WMAct liberates the model from structured reasoning, allowing the model to shape thinking directly through its doing. Our experiments on Sokoban, Maze, and Taxi show that WMAct yields effective world model reasoning capable of resolving tasks in a single turn.
arXiv Detail & Related papers (2025-11-28T18:59:47Z)
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning [69.44871115752055]
We propose an advanced multimodal reasoning model trained via a novel Progressive Curriculum Reinforcement Learning (PCuRL) framework. PCuRL systematically guides the model through tasks of gradually increasing difficulty, substantially improving its reasoning abilities across diverse multimodal contexts. The framework introduces two key innovations: (1) an online difficulty soft weighting mechanism, dynamically adjusting training difficulty across successive RL training stages; and (2) a dynamic length reward mechanism, which encourages the model to adaptively regulate its reasoning path length according to task complexity.
arXiv Detail & Related papers (2025-07-30T12:23:21Z)
Learning Time-Aware Causal Representation for Model Generalization in Evolving Domains [50.66049136093248]
We develop a time-aware structural causal model (SCM) that incorporates dynamic causal factors and causal mechanism drifts. We show that our method can yield the optimal causal predictor for each time domain. Results on both synthetic and real-world datasets show that SYNC can achieve superior temporal generalization performance.
arXiv Detail & Related papers (2025-06-21T14:05:37Z)
On Generalization Across Environments In Multi-Objective Reinforcement Learning [6.686583184622338]
We formalize the concept of generalization in Multi-Objective Reinforcement Learning (MORL) and how it can be evaluated. We contribute a novel benchmark featuring diverse multi-objective domains with parameterized environment configurations. Our baseline evaluations of state-of-the-art MORL algorithms on this benchmark reveal limited generalization capabilities, suggesting significant room for improvement.
arXiv Detail & Related papers (2025-03-02T08:50:14Z)
FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL [19.236153474365747]
Existing MARL approaches often rely on the restrictive assumption that the number of entities remains constant between training and inference. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization. We propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods.
arXiv Detail & Related papers (2024-10-21T10:57:45Z)
A Multiobjective Reinforcement Learning Framework for Microgrid Energy Management [0.0]
Microgrids (MGs) provide a promising solution for decarbonizing and decentralizing the power grid. However, MG operations often involve considering multiple objectives that represent the interests of different stakeholders. We propose a novel multi-objective reinforcement learning framework that explores the high-dimensional objective space and uncovers the tradeoffs between conflicting objectives.
arXiv Detail & Related papers (2023-07-17T17:52:57Z)
Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.