Program-Based Strategy Induction for Reinforcement Learning
- URL: http://arxiv.org/abs/2402.16668v1
- Date: Mon, 26 Feb 2024 15:40:46 GMT
- Title: Program-Based Strategy Induction for Reinforcement Learning
- Authors: Carlos G. Correa and Thomas L. Griffiths and Nathaniel D. Daw
- Abstract summary: We use Bayesian program induction to discover strategies implemented by programs, letting the simplicity of strategies trade off against their effectiveness.
We find strategies that are difficult or unexpected with classical incremental learning, like asymmetric learning from rewarded and unrewarded trials, adaptive horizon-dependent random exploration, and discrete state switching.
- Score: 5.657991642023959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Typical models of learning assume incremental estimation of
continuously-varying decision variables like expected rewards. However, this
class of models fails to capture more idiosyncratic, discrete heuristics and
strategies that people and animals appear to exhibit. Despite recent advances
in strategy discovery using tools like recurrent networks that generalize the
classic models, the resulting strategies are often onerous to interpret, making
connections to cognition difficult to establish. We use Bayesian program
induction to discover strategies implemented by programs, letting the
simplicity of strategies trade off against their effectiveness. Focusing on
bandit tasks, we find strategies that are difficult or unexpected with
classical incremental learning, like asymmetric learning from rewarded and
unrewarded trials, adaptive horizon-dependent random exploration, and discrete
state switching.
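As a rough illustration of the simplicity-versus-effectiveness trade-off described in the abstract, the sketch below scores two hand-written candidate bandit "programs" (a win-stay/lose-shift rule and an asymmetric incremental learner) by mean reward minus a complexity penalty. This is only a minimal sketch of the idea: the candidate set, the token-count complexity proxy, and the penalty weight are illustrative assumptions, not the paper's DSL, prior, or inference procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(p_arms, policy, horizon):
    """Roll out one policy on a Bernoulli bandit and return its total reward."""
    prev_arm, prev_reward, total = None, 0.0, 0.0
    for _ in range(horizon):
        arm = policy(prev_arm, prev_reward)
        reward = float(rng.random() < p_arms[arm])
        total += reward
        prev_arm, prev_reward = arm, reward
    return total

def win_stay_lose_shift(n_arms):
    # Discrete heuristic: repeat a rewarded arm, otherwise pick at random.
    def act(prev_arm, prev_reward):
        if prev_arm is None or prev_reward == 0.0:
            return int(rng.integers(n_arms))
        return prev_arm
    return act

def asymmetric_learner(n_arms, alpha_pos, alpha_neg, eps):
    # Incremental learner with different learning rates for rewarded vs.
    # unrewarded trials (one of the motifs mentioned in the abstract).
    q = np.zeros(n_arms)
    def act(prev_arm, prev_reward):
        if prev_arm is not None:
            lr = alpha_pos if prev_reward > 0 else alpha_neg
            q[prev_arm] += lr * (prev_reward - q[prev_arm])
        return int(rng.integers(n_arms)) if rng.random() < eps else int(np.argmax(q))
    return act

# Candidate "programs" paired with a crude complexity cost (a stand-in for a
# description-length prior); hypothetical numbers, not the paper's DSL.
candidates = {
    "win_stay_lose_shift": (lambda: win_stay_lose_shift(2), 4),
    "asymmetric_learner":  (lambda: asymmetric_learner(2, 0.6, 0.1, 0.05), 9),
}

lam, p_arms, horizon = 0.02, [0.3, 0.7], 20   # assumed trade-off weight and task
scores = {}
for name, (make, complexity) in candidates.items():
    mean_reward = np.mean([run_bandit(p_arms, make(), horizon) for _ in range(500)])
    scores[name] = mean_reward - lam * complexity   # effectiveness minus simplicity cost

print(scores)
```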
Related papers
- Improving Active Learning with a Bayesian Representation of Epistemic Uncertainty [0.0]
A popular strategy for active learning is to specifically target a reduction in epistemic uncertainty.
We show how this combination leads to new active learning strategies that have desirable properties.
In order to demonstrate the efficiency of these strategies in non-trivial settings, we introduce the notion of a possibilistic Gaussian process (GP).
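For context on what "targeting epistemic uncertainty" can look like, below is a generic acquisition sketch that ranks pool points by disagreement across an ensemble; it is not the paper's possibilistic-GP criterion, and all names and numbers are illustrative.

```python
import numpy as np

def pick_queries(ensemble, pool_x, batch_size=1):
    """Rank unlabeled points by ensemble disagreement (a proxy for epistemic uncertainty)."""
    preds = np.stack([model(pool_x) for model in ensemble])  # shape: [n_models, n_pool]
    epistemic = preds.var(axis=0)                            # spread across models
    return np.argsort(-epistemic)[:batch_size]               # most uncertain first

# Usage sketch: three toy predictors whose outputs diverge as |x| grows.
pool = np.linspace(-3.0, 3.0, 200)
ensemble = [lambda x, w=w: np.sin(w * x) for w in (0.9, 1.0, 1.1)]
print(pick_queries(ensemble, pool, batch_size=5))
```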
arXiv Detail & Related papers (2024-12-11T09:19:20Z)
- Experience-driven discovery of planning strategies [0.9821874476902969]
We show that new planning strategies are discovered through metacognitive reinforcement learning.
When fitted to human data, these models exhibit a slower discovery rate than humans, leaving room for improvement.
arXiv Detail & Related papers (2024-12-04T08:20:03Z)
- Sustainable Self-evolution Adversarial Training [51.25767996364584]
We propose a Sustainable Self-Evolution Adversarial Training (SSEAT) framework for adversarial training defense models.
We introduce a continual adversarial defense pipeline to enable learning from various kinds of adversarial examples.
We also propose an adversarial data replay module to select more diverse and important examples for relearning.
arXiv Detail & Related papers (2024-12-03T08:41:11Z)
- Ensembling Portfolio Strategies for Long-Term Investments: A Distribution-Free Preference Framework for Decision-Making and Algorithms [0.0]
This paper investigates the problem of ensembling multiple strategies for sequential portfolios to outperform individual strategies in terms of long-term wealth.
We introduce a novel framework for decision-making in combining strategies, irrespective of market conditions.
We show results in favor of the proposed strategies, albeit with small tradeoffs in their Sharpe ratios.
arXiv Detail & Related papers (2024-06-05T23:08:57Z)
- Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies [50.10277748405355]
Noise-Reuse Evolution Strategies (NRES) is a general class of unbiased online evolution strategies methods.
We show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of steps across a variety of applications.
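As background for the evolution-strategies estimator discussed above, here is a minimal antithetic ES gradient sketch (plain ES, not NRES itself, which per the summary additionally reuses noise to remain unbiased in the online setting); the step sizes, dimensions, and test function are illustrative.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_pairs=32, rng=None):
    """Antithetic evolution-strategies estimate of grad E[f(theta + sigma * eps)]."""
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for _ in range(n_pairs):
        eps = rng.standard_normal(theta.shape)
        # Reusing +eps and -eps (antithetic sampling) reduces estimator variance.
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return grad / (2.0 * sigma * n_pairs)

# Usage sketch: minimize a quadratic with plain SGD on the ES estimate.
rng = np.random.default_rng(0)
f = lambda x: float(np.sum(x ** 2))
theta = np.ones(5)
for _ in range(200):
    theta -= 0.05 * es_gradient(f, theta, rng=rng)
print(theta)  # should shrink toward zero
```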
arXiv Detail & Related papers (2023-04-21T17:53:05Z)
- Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access [3.441021278275805]
In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents' actions is known in terms of successor states but not the transition probabilities involved.
In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that utilizes interval MDPs as an internal model.
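To make the interval-MDP idea concrete, the sketch below runs a standard robust (pessimistic) value iteration over transition-probability intervals; it is a generic illustration, not the paper's synthesis algorithm, and the array shapes and example numbers are assumptions.

```python
import numpy as np

def worst_case_distribution(p_lo, p_hi, values):
    """Distribution inside [p_lo, p_hi] minimizing expected value.

    Greedy rule: start all successors at their lower bounds, then push the
    remaining probability mass onto the lowest-valued successors first.
    Assumes sum(p_lo) <= 1 <= sum(p_hi).
    """
    p = p_lo.copy()
    slack = 1.0 - p.sum()
    for s in np.argsort(values):
        add = min(p_hi[s] - p_lo[s], slack)
        p[s] += add
        slack -= add
        if slack <= 1e-12:
            break
    return p

def interval_value_iteration(p_lo, p_hi, rewards, gamma=0.95, iters=200):
    """Lower-bound state values for an interval MDP.

    p_lo, p_hi: [S, A, S] per-transition probability bounds; rewards: [S, A].
    """
    n_states, n_actions, _ = p_lo.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                p = worst_case_distribution(p_lo[s, a], p_hi[s, a], v)
                q[s, a] = rewards[s, a] + gamma * (p @ v)
        v = q.max(axis=1)   # best action against worst-case transitions
    return v

# Tiny usage sketch: 2 states, 2 actions, uncertain transitions.
p_lo = np.full((2, 2, 2), 0.3)
p_hi = np.full((2, 2, 2), 0.7)
rewards = np.array([[0.0, 1.0], [0.5, 0.0]])
print(interval_value_iteration(p_lo, p_hi, rewards))
```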
arXiv Detail & Related papers (2023-03-22T16:58:44Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Imitation with Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Knowledge-driven Active Learning [70.37119719069499]
Active learning strategies aim at minimizing the amount of labelled data required to train a Deep Learning model.
Most active learning strategies are based on uncertain sample selection, and are often restricted to samples lying close to the decision boundary.
Here we propose to take into consideration common domain-knowledge and enable non-expert users to train a model with fewer samples.
arXiv Detail & Related papers (2021-10-15T06:11:53Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state-of-the-art on locomotion tasks in terms of both sample-efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)