Causally Aligned Curriculum Learning
- URL: http://arxiv.org/abs/2503.16799v1
- Date: Fri, 21 Mar 2025 02:20:38 GMT
- Title: Causally Aligned Curriculum Learning
- Authors: Mingxuan Li, Junzhe Zhang, Elias Bareinboim,
- Abstract summary: This paper studies the problem of curriculum RL through causal lenses.<n>We derive a sufficient graphical condition characterizing causally aligned source tasks.<n>We develop an efficient algorithm to generate a causally aligned curriculum.
- Score: 69.11672390876763
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A pervasive challenge in Reinforcement Learning (RL) is the "curse of dimensionality" which is the exponential growth in the state-action space when optimizing a high-dimensional target task. The framework of curriculum learning trains the agent in a curriculum composed of a sequence of related and more manageable source tasks. The expectation is that when some optimal decision rules are shared across source tasks and the target task, the agent could more quickly pick up the necessary skills to behave optimally in the environment, thus accelerating the learning process. However, this critical assumption of invariant optimal decision rules does not necessarily hold in many practical applications, specifically when the underlying environment contains unobserved confounders. This paper studies the problem of curriculum RL through causal lenses. We derive a sufficient graphical condition characterizing causally aligned source tasks, i.e., the invariance of optimal decision rules holds. We further develop an efficient algorithm to generate a causally aligned curriculum, provided with qualitative causal knowledge of the target task. Finally, we validate our proposed methodology through experiments in discrete and continuous confounded tasks with pixel observations.
Related papers
- Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments [78.15330971155778]
Posterior and Diversity Synergized Task Sampling (PDTS) is an easy-to-implement method to accommodate fast and robust sequential decision-making.
PDTS unlocks the potential of robust active task sampling, significantly improves the zero-shot and few-shot adaptation robustness in challenging tasks, and even accelerates the learning process under certain scenarios.
arXiv Detail & Related papers (2025-04-27T07:27:17Z) - Can Graph Learning Improve Planning in LLM-based Agents? [61.47027387839096]
Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs)
In this paper, we explore graph learning-based methods for task planning, a direction that is to the prevalent focus on prompt design.
Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs.
arXiv Detail & Related papers (2024-05-29T14:26:24Z) - Proximal Curriculum with Task Correlations for Deep Reinforcement Learning [25.10619062353793]
We consider curriculum design in contextual multi-task settings where the agent's final performance is measured w.r.t. a target distribution over complex tasks.
We propose a novel curriculum, ProCuRL-Target, that effectively balances the need for selecting tasks that are not too difficult for the agent while progressing the agent's learning toward the target distribution via leveraging task correlations.
arXiv Detail & Related papers (2024-05-03T21:07:54Z) - Outcome-directed Reinforcement Learning by Uncertainty & Temporal
Distance-Aware Curriculum Goal Generation [29.155620517531656]
Current reinforcement learning (RL) often suffers when solving a challenging exploration problem where the desired outcomes or high rewards are rarely observed.
We propose an uncertainty & temporal distance-aware curriculum goal generation method for the outcome-directed RL via solving a bipartite matching problem.
It could not only provide precisely calibrated guidance of the curriculum to the desired outcome states but also bring much better sample efficiency and geometry-agnostic curriculum goal proposal capability compared to previous curriculum RL methods.
arXiv Detail & Related papers (2023-01-27T14:25:04Z) - Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z) - Curriculum Reinforcement Learning using Optimal Transport via Gradual
Domain Adaptation [46.103426976842336]
Reinforcement Learning (CRL) aims to create a sequence of tasks, starting from easy ones and gradually learning towards difficult tasks.
In this work, we focus on the idea of framing CRL as Curriculums between a source (auxiliary) and a target task distribution.
Inspired by the insights from gradual domain adaptation in semi-supervised learning, we create a natural curriculum by breaking down the potentially large task distributional shift in CRL into smaller shifts.
arXiv Detail & Related papers (2022-10-18T22:33:33Z) - Task-Optimal Exploration in Linear Dynamical Systems [29.552894877883883]
We study task-guided exploration and determine what precisely an agent must learn about their environment in order to complete a task.
We provide instance- and task-dependent lower bounds which explicitly quantify the difficulty of completing a task of interest.
We show that it optimally explores the environment, collecting precisely the information needed to complete the task, and provide finite-time bounds guaranteeing that it achieves the instance- and task-optimal sample complexity.
arXiv Detail & Related papers (2021-02-10T01:42:22Z) - Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as MAML or Dif-MAML.
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.