Causal Model-Based Reinforcement Learning for Sample-Efficient IoT Channel Access
- URL: http://arxiv.org/abs/2511.10291v1
- Date: Fri, 14 Nov 2025 01:44:01 GMT
- Title: Causal Model-Based Reinforcement Learning for Sample-Efficient IoT Channel Access
- Authors: Aswin Arun, Christo Kurisummoottil Thomas, Rimalpudi Sarvendranath, Walid Saad,
- Abstract summary: Multi-agent reinforcement learning (MARL) for wireless use case such as medium access control (MAC) is hindered by their sample inefficiency.<n>In this paper, a novel causal model-based MARL framework is developed by leveraging tools from causal learn- ing.<n>The proposed approach can reduce environment interactions by 58%, and yield faster convergence compared to model-free baselines.
- Score: 39.76683291751265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the advantages of multi-agent reinforcement learning (MARL) for wireless use case such as medium access control (MAC), their real-world deployment in Internet of Things (IoT) is hindered by their sample inefficiency. To alleviate this challenge, one can leverage model-based reinforcement learning (MBRL) solutions, however, conventional MBRL approaches rely on black-box models that are not interpretable and cannot reason. In contrast, in this paper, a novel causal model-based MARL framework is developed by leveraging tools from causal learn- ing. In particular, the proposed model can explicitly represent causal dependencies between network variables using structural causal models (SCMs) and attention-based inference networks. Interpretable causal models are then developed to capture how MAC control messages influence observations, how transmission actions determine outcomes, and how channel observations affect rewards. Data augmentation techniques are then used to generate synthetic rollouts using the learned causal model for policy optimization via proximal policy optimization (PPO). Analytical results demonstrate exponential sample complexity gains of causal MBRL over black-box approaches. Extensive simulations demonstrate that, on average, the proposed approach can reduce environment interactions by 58%, and yield faster convergence compared to model-free baselines. The proposed approach inherently is also shown to provide interpretable scheduling decisions via attention-based causal attribution, revealing which network conditions drive the policy. The resulting combination of sample efficiency and interpretability establishes causal MBRL as a practical approach for resource-constrained wireless systems.
Related papers
- Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision [11.159231524113764]
Reinforcement Learning (RL) has emerged as a pivotal mechanism for enhancing the complex reasoning capabilities of Multimodal Large Language Models (MLLMs)<n>In this paper, we propose the textbfGuided Verifier framework to address these structural limitations.<n>We develop a specialized data synthesis pipeline targeting multimodal hallucinations, constructing textbfCoRe dataset of process-level negatives and textbfCorrect-guide textbfReasoning trajectories to train the guided verifier.
arXiv Detail & Related papers (2026-02-04T07:38:42Z) - Efficient Solution and Learning of Robust Factored MDPs [57.2416302384766]
Learning r-MDPs from interactions with an unknown environment enables the synthesis of robust policies with provable guarantees on performance.<n>We propose novel methods for solving and learning r-MDPs based on factored state representations.
arXiv Detail & Related papers (2025-08-01T15:23:15Z) - Towards Causal Model-Based Policy Optimization [0.24578723416255752]
We introduce Causal Model-Based Policy Optimization (C-MBPO)<n>C-MBPO is a novel framework that integrates causal learning into the Model-Based Reinforcement Learning pipeline.<n>We show that C-MBPO can be shown to be robust to a class of distributional shifts that affect spurious, non-causal relationships in the dynamics.
arXiv Detail & Related papers (2025-03-12T18:09:02Z) - BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning [39.090104460303415]
offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies.<n>This paper first identifies the primary source of this mismatch comes from the underlying confounders present in offline data.<n>We introduce textbfBilintextbfEar textbfCAUSal rtextbfEpresentation(BECAUSE), an algorithm to capture causal representation for both states.
arXiv Detail & Related papers (2024-07-15T17:59:23Z) - Model-based Offline Policy Optimization with Adversarial Network [0.36868085124383626]
We propose a novel Model-based Offline policy optimization framework with Adversarial Network (MOAN)
Key idea is to use adversarial learning to build a transition model with better generalization.
Our approach outperforms existing state-of-the-art baselines on widely studied offline RL benchmarks.
arXiv Detail & Related papers (2023-09-05T11:49:33Z) - A Neuromorphic Architecture for Reinforcement Learning from Real-Valued
Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z) - Causal Disentangled Variational Auto-Encoder for Preference
Understanding in Recommendation [50.93536377097659]
This paper introduces the Causal Disentangled Variational Auto-Encoder (CaD-VAE), a novel approach for learning causal disentangled representations from interaction data in recommender systems.
The approach utilizes structural causal models to generate causal representations that describe the causal relationship between latent factors.
arXiv Detail & Related papers (2023-04-17T00:10:56Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL)
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Exploiting Temporal Structures of Cyclostationary Signals for
Data-Driven Single-Channel Source Separation [98.95383921866096]
We study the problem of single-channel source separation (SCSS)
We focus on cyclostationary signals, which are particularly suitable in a variety of application domains.
We propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator.
arXiv Detail & Related papers (2022-08-22T14:04:56Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.