Modularity in Reinforcement Learning via Algorithmic Independence in
Credit Assignment
- URL: http://arxiv.org/abs/2106.14993v1
- Date: Mon, 28 Jun 2021 21:29:13 GMT
- Title: Modularity in Reinforcement Learning via Algorithmic Independence in
Credit Assignment
- Authors: Michael Chang, Sidhant Kaushik, Sergey Levine, Thomas L. Griffiths
- Abstract summary: We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
- Score: 79.5678820246642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many transfer problems require re-using previously optimal decisions for
solving new tasks, which suggests the need for learning algorithms that can
modify the mechanisms for choosing certain actions independently of those for
choosing others. However, there is currently no formalism or theory for how to
achieve this kind of modular credit assignment. To address this gap, we
define modular credit assignment as a constraint on minimizing the algorithmic
mutual information among feedback signals for different decisions. We introduce
what we call the modularity criterion for testing whether a learning algorithm
satisfies this constraint by performing causal analysis on the algorithm
itself. We generalize the recently proposed societal decision-making framework
as a more granular formalism than the Markov decision process to prove that for
decision sequences that do not contain cycles, certain single-step temporal
difference action-value methods meet this criterion while all policy-gradient
methods do not. Empirical evidence suggests that such action-value methods are
more sample efficient than policy-gradient methods on transfer problems that
require only sparse changes to a sequence of previously optimal decisions.
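To make the contrast concrete, here is a minimal tabular sketch (my illustration, not the paper's implementation): in a single-step TD action-value update, the feedback for a decision (s, a) depends only on the local reward and the bootstrapped value of the next state, whereas in a REINFORCE-style policy-gradient update the same trajectory return multiplies the gradient at every visited decision, coupling their feedback signals. All names and constants below are illustrative.

```python
import numpy as np

gamma = 0.9

def q_learning_update(Q, s, a, r, s_next, alpha=0.1):
    # Single-step TD: feedback for decision (s, a) is local -- it depends
    # only on r and the bootstrapped value of s_next, not on the rest of
    # the trajectory, so updates to different decisions stay decoupled.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def reinforce_update(theta, trajectory, alpha=0.1):
    # Policy gradient: the single trajectory return G scales the gradient at
    # EVERY visited decision, algorithmically coupling their feedback signals.
    G = sum(r for (_, _, r) in trajectory)
    for s, a, _ in trajectory:
        probs = np.exp(theta[s] - theta[s].max())
        probs /= probs.sum()
        grad_logp = -probs          # gradient of log softmax policy
        grad_logp[a] += 1.0
        theta[s] += alpha * G * grad_logp

Q = np.zeros((5, 2))
theta = np.zeros((5, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
reinforce_update(theta, [(0, 1, 1.0), (2, 0, 0.0)])
```

In the second function, changing any single reward in the trajectory perturbs the update at every decision, which is exactly the kind of algorithmic dependence the modularity criterion rules out.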
Related papers
- Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization [1.1510009152620668]
We present a simple and problem-independent sequence decoding method for self-improved learning.
By modifying the policy to ignore previously sampled sequences, we force it to consider only unseen alternatives.
Our method outperforms previous NCO approaches on the Job Shop Scheduling Problem.
arXiv Detail & Related papers (2024-07-24T12:06:09Z)
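A toy sketch of the decoding idea above, under strong simplifying assumptions: instead of the paper's autoregressive per-step masking, this version enumerates all candidate sequences (feasible only for tiny instances), zeroes out previously sampled ones, and renormalizes before sampling. The scores standing in for policy logits are illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_jobs = 4
candidates = list(itertools.permutations(range(n_jobs)))
scores = rng.normal(size=len(candidates))          # stand-in policy logits
probs = np.exp(scores) / np.exp(scores).sum()

seen = set()
for _ in range(5):
    # ignore previously sampled sequences: mask them out and renormalize
    masked = np.array([0.0 if c in seen else p for c, p in zip(candidates, probs)])
    masked /= masked.sum()
    seq = candidates[rng.choice(len(candidates), p=masked)]
    seen.add(seq)
    print(seq)
```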
- Interpretable Decision Tree Search as a Markov Decision Process [8.530182510074983]
Finding an optimal decision tree for a supervised learning task is a challenging problem to solve at scale.
It was recently proposed to frame the problem as a Markov Decision Problem (MDP) and use deep reinforcement learning to tackle scaling.
We propose instead to scale the resolution of such MDPs using an information-theoretic test-generating function that, for every state, proposes a restricted set of candidate feature tests.
arXiv Detail & Related papers (2023-09-22T08:18:08Z)
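A hypothetical sketch of the MDP framing described above, under my own simplifying assumptions: a state is the subset of training rows reaching a node, an action is an axis-aligned feature test proposed by a small test-generating function (here, one median split per feature), and the reward is the information gain of the split.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 3)), rng.integers(0, 2, size=200)

def candidate_tests(rows):
    # "test-generating function": one median split per feature
    return [(j, float(np.median(X[rows, j]))) for j in range(X.shape[1])]

def entropy(rows):
    p = y[rows].mean() if len(rows) else 0.0
    return 0.0 if p in (0.0, 1.0) else -(p*np.log2(p) + (1-p)*np.log2(1-p))

def step(rows, action):
    j, t = action                       # MDP transition: split the subset
    left, right = rows[X[rows, j] <= t], rows[X[rows, j] > t]
    gain = entropy(rows) - (len(left)*entropy(left) + len(right)*entropy(right)) / len(rows)
    return left, right, gain            # reward = information gain

root = np.arange(len(X))
best = max(candidate_tests(root), key=lambda a: step(root, a)[2])
print("greedy first split:", best)
```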
- Domain Generalization via Rationale Invariance [70.32415695574555]
This paper offers a new perspective to ease the challenge of domain generalization, which involves maintaining robust results even in unseen environments.
We propose treating the element-wise contributions to the final results as the rationale for making a decision and representing the rationale for each sample as a matrix.
Our experiments demonstrate that the proposed approach achieves competitive results across various datasets, despite its simplicity.
arXiv Detail & Related papers (2023-08-22T03:31:40Z)
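A loose sketch of the rationale idea above, assuming a linear classification head (my reading, not the authors' code): the element-wise contribution of feature i to class c is W[c, i] * z[i], giving a rationale matrix per sample, and an invariance penalty pulls it toward a running per-class mean.

```python
import torch
import torch.nn.functional as F

n_feat, n_cls = 16, 3
W = torch.randn(n_cls, n_feat, requires_grad=True)
# Running per-label mean rationale matrices (updated elsewhere during training).
mean_R = torch.zeros(n_cls, n_cls, n_feat)

z, label = torch.randn(n_feat), 1                 # features from a backbone
R = W * z                                         # (n_cls, n_feat) rationale matrix
inv_loss = ((R - mean_R[label]) ** 2).mean()      # pull toward the class mean
ce_loss = F.cross_entropy((W @ z).unsqueeze(0), torch.tensor([label]))
(ce_loss + 0.1 * inv_loss).backward()
```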
- Sequential Knockoffs for Variable Selection in Reinforcement Learning [19.925653053430395]
We introduce the notion of a minimal sufficient state in a Markov decision process (MDP).
We propose a novel SEquEntial Knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics.
arXiv Detail & Related papers (2023-03-24T21:39:06Z)
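A generic knockoff-filter sketch, not the SEEK algorithm itself: knockoff copies are built here by crude column permutation, importances come from a joint least-squares fit, and a variable is kept only when its importance clearly beats that of its own knockoff. The threshold and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 8
X = rng.normal(size=(n, p))
reward = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

X_knock = np.apply_along_axis(rng.permutation, 0, X)   # crude knockoff copies
XX = np.hstack([X, X_knock])
beta, *_ = np.linalg.lstsq(XX, reward, rcond=None)

W = np.abs(beta[:p]) - np.abs(beta[p:])                # importance gaps
tau = 0.1                                              # illustrative threshold
print("selected variables:", np.where(W > tau)[0])     # expect {0, 3}
```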
- Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
arXiv Detail & Related papers (2023-01-18T20:54:40Z)
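Generalized Policy Improvement itself is standard, so a small sketch is easy to give (the paper's prioritization schemes built on top of it are not reproduced here): for a new preference vector w, act greedily with respect to the maximum over stored policies of w · Q^pi(s, a).

```python
import numpy as np

n_states, n_actions, n_objectives, n_policies = 4, 3, 2, 5
rng = np.random.default_rng(0)
# Q[i, s, a, :] = multi-objective value of action a in state s under policy i
Q = rng.normal(size=(n_policies, n_states, n_actions, n_objectives))

def gpi_action(s, w):
    scalar_q = Q[:, s] @ w              # (n_policies, n_actions)
    return int(scalar_q.max(axis=0).argmax())

w = np.array([0.7, 0.3])                # a new user preference
print("GPI action in state 0:", gpi_action(0, w))
```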
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Counterfactual Explanations in Sequential Decision Making Under Uncertainty [27.763369810430653]
We develop methods to find counterfactual explanations for sequential decision making processes.
In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions differing in at most k actions.
We show that the counterfactual explanations our algorithm finds can provide valuable insights to enhance decision making under uncertainty.
arXiv Detail & Related papers (2021-07-06T17:38:19Z)
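A brute-force sketch of the stated search problem, with a toy deterministic outcome model standing in for the environment: enumerate the action sequences that differ from the observed one in at most k positions and keep the best.

```python
import itertools

observed = (0, 1, 0, 1)
n_actions, k = 2, 2

def outcome(seq):                       # toy stand-in for the environment model
    return sum(a * (i + 1) for i, a in enumerate(seq))

best_seq, best_val = observed, outcome(observed)
for positions in itertools.chain.from_iterable(
        itertools.combinations(range(len(observed)), d) for d in range(1, k + 1)):
    for alts in itertools.product(range(n_actions), repeat=len(positions)):
        cand = list(observed)
        for pos, a in zip(positions, alts):   # change at most k actions
            cand[pos] = a
        val = outcome(tuple(cand))
        if val > best_val:
            best_seq, best_val = tuple(cand), val

print("counterfactual:", best_seq, "improves outcome to", best_val)
```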
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
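For reference, the standard shape of such results (notation mine, not the paper's): regret measures the cumulative gap to the optimal long-run reward, and problem-specific lower bounds scale logarithmically in the horizon with a constant depending on the system parameter.

```latex
% Illustrative notation: \rho^{*} is the optimal gain, \phi the system parameter.
R(T) \;=\; T\,\rho^{*} \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} r_{t}\Big],
\qquad
\liminf_{T\to\infty} \frac{R(T)}{\log T} \;\ge\; c(\phi).
```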
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
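The underlying Stein variational (SVGD) update is standard, so a small sketch is possible; the particles, cost, and kernel bandwidth here are illustrative, not the paper's setup. Particles are candidate control sequences, the target log-density is the negative control cost, and the kernel term keeps the plan set diverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles, horizon = 16, 10
U = rng.normal(size=(n_particles, horizon))            # candidate control sequences

def grad_log_p(u):                                     # target density exp(-cost)
    return -u                                          # with cost = 0.5 * ||u||^2

def svgd_step(U, eps=0.1, h=1.0):
    diffs = U[:, None, :] - U[None, :, :]              # (n, n, horizon)
    K = np.exp(-np.sum(diffs**2, axis=-1) / (2 * h))   # RBF kernel (n, n)
    grads = np.stack([grad_log_p(u) for u in U])       # (n, horizon)
    # Standard SVGD direction: kernel-weighted gradients plus repulsion term.
    phi = (K @ grads + np.einsum('ij,ijd->id', K, diffs) / h) / len(U)
    return U + eps * phi

for _ in range(100):
    U = svgd_step(U)
print("mean |u| after SVGD:", float(np.abs(U).mean()))
```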
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.