Modularity in Reinforcement Learning via Algorithmic Independence in
Credit Assignment
- URL: http://arxiv.org/abs/2106.14993v1
- Date: Mon, 28 Jun 2021 21:29:13 GMT
- Title: Modularity in Reinforcement Learning via Algorithmic Independence in
Credit Assignment
- Authors: Michael Chang, Sidhant Kaushik, Sergey Levine, Thomas L. Griffiths
- Abstract summary: We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
- Score: 79.5678820246642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many transfer problems require re-using previously optimal decisions for
solving new tasks, which suggests the need for learning algorithms that can
modify the mechanisms for choosing certain actions independently of those for
choosing others. However, there is currently no formalism or theory for how to
achieve this kind of modular credit assignment. To address this gap, we
define modular credit assignment as a constraint on minimizing the algorithmic
mutual information among feedback signals for different decisions. We introduce
what we call the modularity criterion for testing whether a learning algorithm
satisfies this constraint by performing causal analysis on the algorithm
itself. We generalize the recently proposed societal decision-making framework
as a more granular formalism than the Markov decision process to prove that for
decision sequences that do not contain cycles, certain single-step temporal
difference action-value methods meet this criterion while all policy-gradient
methods do not. Empirical evidence suggests that such action-value methods are
more sample efficient than policy-gradient methods on transfer problems that
require only sparse changes to a sequence of previously optimal decisions.
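To make the contrast concrete, here is a minimal tabular sketch (my illustration, not the paper's implementation): in a single-step TD action-value update, the feedback for a decision (s, a) depends only on the local reward and the bootstrapped value of the next state, whereas in a REINFORCE-style policy-gradient update the same trajectory return multiplies the gradient at every visited decision, coupling their feedback signals. All names and constants below are illustrative.

```python
import numpy as np

gamma = 0.9

def q_learning_update(Q, s, a, r, s_next, alpha=0.1):
    # Single-step TD: feedback for decision (s, a) is local -- it depends
    # only on r and the bootstrapped value of s_next, not on the rest of
    # the trajectory, so updates to different decisions stay decoupled.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def reinforce_update(theta, trajectory, alpha=0.1):
    # Policy gradient: the single trajectory return G scales the gradient at
    # EVERY visited decision, algorithmically coupling their feedback signals.
    G = sum(r for (_, _, r) in trajectory)
    for s, a, _ in trajectory:
        probs = np.exp(theta[s] - theta[s].max())
        probs /= probs.sum()
        grad_logp = -probs          # gradient of log softmax policy
        grad_logp[a] += 1.0
        theta[s] += alpha * G * grad_logp

Q = np.zeros((5, 2))
theta = np.zeros((5, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
reinforce_update(theta, [(0, 1, 1.0), (2, 0, 0.0)])
```

In the second function, changing any single reward in the trajectory perturbs the update at every decision, which is exactly the kind of algorithmic dependence the modularity criterion rules out.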
Related papers
- Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization [1.1510009152620668]
We present a simple and problem-independent sequence decoding method for self-improved learning.
By modifying the policy to ignore previously sampled sequences, we force it to consider only unseen alternatives.
Our method outperforms previous NCO approaches on the Job Shop Scheduling Problem.
arXiv Detail & Related papers (2024-07-24T12:06:09Z)
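A toy sketch of the decoding idea above, under strong simplifying assumptions: instead of the paper's autoregressive per-step masking, this version enumerates all candidate sequences (feasible only for tiny instances), zeroes out previously sampled ones, and renormalizes before sampling. The scores standing in for policy logits are illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_jobs = 4
candidates = list(itertools.permutations(range(n_jobs)))
scores = rng.normal(size=len(candidates))          # stand-in policy logits
probs = np.exp(scores) / np.exp(scores).sum()

seen = set()
for _ in range(5):
    # ignore previously sampled sequences: mask them out and renormalize
    masked = np.array([0.0 if c in seen else p for c, p in zip(candidates, probs)])
    masked /= masked.sum()
    seq = candidates[rng.choice(len(candidates), p=masked)]
    seen.add(seq)
    print(seq)
```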
- Interpretable Decision Tree Search as a Markov Decision Process [8.530182510074983]
Finding an optimal decision tree for a supervised learning task is a challenging problem to solve at scale.
It was recently proposed to frame the problem as a Markov Decision Problem (MDP) and use deep reinforcement learning to tackle scaling.
We propose instead to scale the resolution of such MDPs using an information-theoretic test-generating function that, for every state, proposes a restricted set of candidate feature tests.
arXiv Detail & Related papers (2023-09-22T08:18:08Z)
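A hypothetical sketch of the MDP framing described above, under my own simplifying assumptions: a state is the subset of training rows reaching a node, an action is an axis-aligned feature test proposed by a small test-generating function (here, one median split per feature), and the reward is the information gain of the split.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 3)), rng.integers(0, 2, size=200)

def candidate_tests(rows):
    # "test-generating function": one median split per feature
    return [(j, float(np.median(X[rows, j]))) for j in range(X.shape[1])]

def entropy(rows):
    p = y[rows].mean() if len(rows) else 0.0
    return 0.0 if p in (0.0, 1.0) else -(p*np.log2(p) + (1-p)*np.log2(1-p))

def step(rows, action):
    j, t = action                       # MDP transition: split the subset
    left, right = rows[X[rows, j] <= t], rows[X[rows, j] > t]
    gain = entropy(rows) - (len(left)*entropy(left) + len(right)*entropy(right)) / len(rows)
    return left, right, gain            # reward = information gain

root = np.arange(len(X))
best = max(candidate_tests(root), key=lambda a: step(root, a)[2])
print("greedy first split:", best)
```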
- Domain Generalization via Rationale Invariance [70.32415695574555]
This paper offers a new perspective to ease the challenge of domain generalization, which involves maintaining robust results even in unseen environments.
We propose treating the element-wise contributions to the final results as the rationale for making a decision and representing the rationale for each sample as a matrix.
Our experiments demonstrate that the proposed approach achieves competitive results across various datasets, despite its simplicity.
arXiv Detail & Related papers (2023-08-22T03:31:40Z)
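A loose sketch of the rationale idea above, assuming a linear classification head (my reading, not the authors' code): the element-wise contribution of feature i to class c is W[c, i] * z[i], giving a rationale matrix per sample, and an invariance penalty pulls it toward a running per-class mean.

```python
import torch
import torch.nn.functional as F

n_feat, n_cls = 16, 3
W = torch.randn(n_cls, n_feat, requires_grad=True)
# Running per-label mean rationale matrices (updated elsewhere during training).
mean_R = torch.zeros(n_cls, n_cls, n_feat)

z, label = torch.randn(n_feat), 1                 # features from a backbone
R = W * z                                         # (n_cls, n_feat) rationale matrix
inv_loss = ((R - mean_R[label]) ** 2).mean()      # pull toward the class mean
ce_loss = F.cross_entropy((W @ z).unsqueeze(0), torch.tensor([label]))
(ce_loss + 0.1 * inv_loss).backward()
```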
- Sequential Knockoffs for Variable Selection in Reinforcement Learning [19.925653053430395]
We introduce the notion of a minimal sufficient state in a Markov decision process (MDP).
We propose a novel SEquEntial Knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics.
arXiv Detail & Related papers (2023-03-24T21:39:06Z)
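A generic knockoff-filter sketch, not the SEEK algorithm itself: knockoff copies are built here by crude column permutation, importances come from a joint least-squares fit, and a variable is kept only when its importance clearly beats that of its own knockoff. The threshold and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 8
X = rng.normal(size=(n, p))
reward = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

X_knock = np.apply_along_axis(rng.permutation, 0, X)   # crude knockoff copies
XX = np.hstack([X, X_knock])
beta, *_ = np.linalg.lstsq(XX, reward, rcond=None)

W = np.abs(beta[:p]) - np.abs(beta[p:])                # importance gaps
tau = 0.1                                              # illustrative threshold
print("selected variables:", np.where(W > tau)[0])     # expect {0, 3}
```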
- Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
arXiv Detail & Related papers (2023-01-18T20:54:40Z)
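Generalized Policy Improvement itself is standard, so a small sketch is easy to give (the paper's prioritization schemes built on top of it are not reproduced here): for a new preference vector w, act greedily with respect to the maximum over stored policies of w · Q^pi(s, a).

```python
import numpy as np

n_states, n_actions, n_objectives, n_policies = 4, 3, 2, 5
rng = np.random.default_rng(0)
# Q[i, s, a, :] = multi-objective value of action a in state s under policy i
Q = rng.normal(size=(n_policies, n_states, n_actions, n_objectives))

def gpi_action(s, w):
    scalar_q = Q[:, s] @ w              # (n_policies, n_actions)
    return int(scalar_q.max(axis=0).argmax())

w = np.array([0.7, 0.3])                # a new user preference
print("GPI action in state 0:", gpi_action(0, w))
```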
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Counterfactual Explanations in Sequential Decision Making Under Uncertainty [27.763369810430653]
We develop methods to find counterfactual explanations for sequential decision making processes.
In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions differing in at most k actions.
We show that the counterfactual explanations our algorithm finds can provide valuable insights to enhance decision making under uncertainty.
arXiv Detail & Related papers (2021-07-06T17:38:19Z)
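A brute-force sketch of the stated search problem, with a toy deterministic outcome model standing in for the environment: enumerate the action sequences that differ from the observed one in at most k positions and keep the best.

```python
import itertools

observed = (0, 1, 0, 1)
n_actions, k = 2, 2

def outcome(seq):                       # toy stand-in for the environment model
    return sum(a * (i + 1) for i, a in enumerate(seq))

best_seq, best_val = observed, outcome(observed)
for positions in itertools.chain.from_iterable(
        itertools.combinations(range(len(observed)), d) for d in range(1, k + 1)):
    for alts in itertools.product(range(n_actions), repeat=len(positions)):
        cand = list(observed)
        for pos, a in zip(positions, alts):   # change at most k actions
            cand[pos] = a
        val = outcome(tuple(cand))
        if val > best_val:
            best_seq, best_val = tuple(cand), val

print("counterfactual:", best_seq, "improves outcome to", best_val)
```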
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
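For reference, the standard shape of such results (notation mine, not the paper's): regret measures the cumulative gap to the optimal long-run reward, and problem-specific lower bounds scale logarithmically in the horizon with a constant depending on the system parameter.

```latex
% Illustrative notation: \rho^{*} is the optimal gain, \phi the system parameter.
R(T) \;=\; T\,\rho^{*} \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} r_{t}\Big],
\qquad
\liminf_{T\to\infty} \frac{R(T)}{\log T} \;\ge\; c(\phi).
```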
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
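The underlying Stein variational (SVGD) update is standard, so a small sketch is possible; the particles, cost, and kernel bandwidth here are illustrative, not the paper's setup. Particles are candidate control sequences, the target log-density is the negative control cost, and the kernel term keeps the plan set diverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles, horizon = 16, 10
U = rng.normal(size=(n_particles, horizon))            # candidate control sequences

def grad_log_p(u):                                     # target density exp(-cost)
    return -u                                          # with cost = 0.5 * ||u||^2

def svgd_step(U, eps=0.1, h=1.0):
    diffs = U[:, None, :] - U[None, :, :]              # (n, n, horizon)
    K = np.exp(-np.sum(diffs**2, axis=-1) / (2 * h))   # RBF kernel (n, n)
    grads = np.stack([grad_log_p(u) for u in U])       # (n, horizon)
    # Standard SVGD direction: kernel-weighted gradients plus repulsion term.
    phi = (K @ grads + np.einsum('ij,ijd->id', K, diffs) / h) / len(U)
    return U + eps * phi

for _ in range(100):
    U = svgd_step(U)
print("mean |u| after SVGD:", float(np.abs(U).mean()))
```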
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.