RGMDT: Return-Gap-Minimizing Decision Tree Extraction in Non-Euclidean Metric Space
- URL: http://arxiv.org/abs/2410.16517v1
- Date: Mon, 21 Oct 2024 21:19:49 GMT
- Title: RGMDT: Return-Gap-Minimizing Decision Tree Extraction in Non-Euclidean Metric Space
- Authors: Jingdi Chen, Hanhan Zhou, Yongsheng Mei, Carlee Joe-Wong, Gina Adam, Nathaniel D. Bastian, Tian Lan
- Abstract summary: We introduce an upper bound on the return gap between the oracle expert policy and an optimal decision tree policy.
This enables us to recast the DT extraction problem as a novel non-Euclidean clustering problem over each agent's local observation and action-value space.
We also propose the Return-Gap-Minimization Decision Tree (RGMDT) algorithm, a surprisingly simple design that integrates with reinforcement learning.
- Score: 28.273737052758907
- Abstract: Deep Reinforcement Learning (DRL) algorithms have achieved great success on many challenging tasks, but their black-box nature hinders interpretability and real-world applicability, making it difficult for human experts to interpret and understand DRL policies. Existing work on interpretable reinforcement learning has shown promise in extracting decision tree (DT) based policies from DRL policies, with most efforts focused on single-agent settings; prior attempts to introduce DT policies in multi-agent scenarios mainly rely on heuristic designs that provide no quantitative guarantees on the expected return. In this paper, we establish an upper bound on the return gap between the oracle expert policy and an optimal decision tree policy. This enables us to recast the DT extraction problem as a novel non-Euclidean clustering problem over each agent's local observation and action-value space, with action values as cluster labels and the upper bound on the return gap as the clustering loss. Both the algorithm and the upper bound are extended to multi-agent decentralized DT extraction via an iteratively-grow-DT procedure guided by an action-value function conditioned on the current DTs of other agents. Further, we propose the Return-Gap-Minimization Decision Tree (RGMDT) algorithm, a surprisingly simple design that is integrated with reinforcement learning through a novel Regularized Information Maximization loss. Evaluations on benchmarks like D4RL show that RGMDT significantly outperforms heuristic DT-based baselines and achieves nearly optimal returns under given DT complexity constraints (e.g., a maximum number of DT nodes).
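The abstract describes the shape of the method but not its details. The hedged Python sketch below only illustrates the idea of using the oracle's action values as cluster labels and fitting a size-limited decision tree under a node-budget constraint; the function name extract_dt_policy, the scikit-learn tree, and the average action-value gap used as a loss proxy are illustrative assumptions, not the paper's actual non-Euclidean clustering procedure or its Regularized Information Maximization loss.

```python
# Hypothetical sketch (not the paper's implementation): illustrates treating
# the oracle's greedy actions as cluster labels and fitting a node-limited DT.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def extract_dt_policy(observations, q_values, max_leaf_nodes=16):
    """Fit a size-limited decision tree that imitates a greedy oracle policy.

    observations   : (N, d) array of local observations.
    q_values       : (N, A) array of the oracle's action values Q(o, a).
    max_leaf_nodes : stands in for the DT complexity constraint.
    """
    # The greedy action under the oracle's action values serves as the cluster
    # label, echoing "action values as cluster labels" in the abstract.
    greedy_actions = np.argmax(q_values, axis=1)

    dt = DecisionTreeClassifier(max_leaf_nodes=max_leaf_nodes, random_state=0)
    dt.fit(observations, greedy_actions)

    # Crude proxy for the clustering loss: the per-step action-value gap
    # incurred whenever the tree's action differs from the oracle's.
    tree_actions = dt.predict(observations)
    idx = np.arange(len(q_values))
    value_gap = q_values[idx, greedy_actions] - q_values[idx, tree_actions]
    return dt, float(value_gap.mean())

# Toy usage on random data.
rng = np.random.default_rng(0)
obs = rng.normal(size=(1000, 8))
q = rng.normal(size=(1000, 4))
tree, avg_gap = extract_dt_policy(obs, q, max_leaf_nodes=8)
print(f"average per-step action-value gap of the DT policy: {avg_gap:.4f}")
```

In a multi-agent version of this sketch, one would iteratively refit each agent's tree while conditioning the action-value estimates on the other agents' current trees, mirroring the iteratively-grow-DT procedure the abstract describes.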
Related papers
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured experimentally.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z) - In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought [13.034968416139826]
We propose an In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner.
IDT is inspired by the efficient hierarchical structure of human decision-making.
IDT achieves state-of-the-art performance on long-horizon tasks, surpassing current in-context RL methods.
arXiv Detail & Related papers (2024-05-31T08:38:25Z) - Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual learning and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z) - Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm leveraging recent advances in the transformer architecture for reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL.
We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
arXiv Detail & Related papers (2023-11-01T03:32:13Z) - Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in
IBMDPs [9.587070290189507]
Interpretability of AI models allows users to run safety checks and build trust in such AIs.
Decision Trees (DTs) provide a global look at the learned model and transparently reveal which features of the input are critical for making a decision.
A Reinforcement Learning framework has recently been proposed to explore the space of DTs using deep RL.
arXiv Detail & Related papers (2023-09-23T13:06:20Z) - Optimal Interpretability-Performance Trade-off of Classification Trees
with Black-Box Reinforcement Learning [0.0]
Interpretability of AI models allows users to run safety checks and build trust in these models.
Decision trees (DTs) provide a global view of the learned model and clearly outline which features are critical for classifying given data.
To learn compact trees, a Reinforcement Learning framework has recently been proposed to explore the space of DTs.
arXiv Detail & Related papers (2023-04-11T09:43:23Z) - Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless
Cellular Networks [82.02891936174221]
Collaborative deep reinforcement learning (CDRL), in which multiple agents coordinate over a wireless network, is a promising approach.
In this paper, a novel semantic-aware CDRL method is proposed to enable a group of untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network.
arXiv Detail & Related papers (2021-11-23T18:24:47Z) - Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement
Learning with Actor Rectification [74.10976684469435]
Whether offline reinforcement learning (RL) algorithms can be transferred to multi-agent settings directly remains an open question.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z) - Generalized Decision Transformer for Offline Hindsight Information
Matching [16.7594941269479]
We present Generalized Decision Transformer (GDT) for solving any hindsight information matching (HIM) problem.
We show how different choices for the feature function and the anti-causal aggregator lead to novel Categorical DT (CDT) and Bi-directional DT (BDT) for matching different statistics of the future.
arXiv Detail & Related papers (2021-11-19T18:56:13Z) - Robust Deep Reinforcement Learning against Adversarial Perturbations on
State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its state through observations, which may contain natural measurement errors or adversarial noise.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)