Program-Based Strategy Induction for Reinforcement Learning
- URL: http://arxiv.org/abs/2402.16668v1
- Date: Mon, 26 Feb 2024 15:40:46 GMT
- Title: Program-Based Strategy Induction for Reinforcement Learning
- Authors: Carlos G. Correa and Thomas L. Griffiths and Nathaniel D. Daw
- Abstract summary: We use Bayesian program induction to discover strategies implemented by programs, letting the simplicity of strategies trade off against their effectiveness.
We find strategies that are difficult or unexpected with classical incremental learning, like asymmetric learning from rewarded and unrewarded trials, adaptive horizon-dependent random exploration, and discrete state switching.
- Score: 5.657991642023959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Typical models of learning assume incremental estimation of
continuously-varying decision variables like expected rewards. However, this
class of models fails to capture more idiosyncratic, discrete heuristics and
strategies that people and animals appear to exhibit. Despite recent advances
in strategy discovery using tools like recurrent networks that generalize the
classic models, the resulting strategies are often onerous to interpret, making
connections to cognition difficult to establish. We use Bayesian program
induction to discover strategies implemented by programs, letting the
simplicity of strategies trade off against their effectiveness. Focusing on
bandit tasks, we find strategies that are difficult or unexpected with
classical incremental learning, like asymmetric learning from rewarded and
unrewarded trials, adaptive horizon-dependent random exploration, and discrete
state switching.
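To make the simplicity-effectiveness trade-off concrete, here is a minimal sketch in which a few hand-coded candidate strategy programs are scored by a description-length penalty (standing in for a grammar-based prior) plus the reward they earn on a simulated two-armed Bernoulli bandit. The candidate programs, lengths, and weighting are illustrative assumptions, not the authors' grammar or inference procedure.
```python
import random

random.seed(0)

# Three hand-coded candidate "programs"; in the paper these come from a program grammar.
def always_arm_0(last_action, last_reward, t, horizon):
    return 0

def win_stay_lose_shift(last_action, last_reward, t, horizon):
    if last_action is None:
        return random.randint(0, 1)
    return last_action if last_reward == 1 else 1 - last_action

def horizon_dependent_random(last_action, last_reward, t, horizon):
    # Explore uniformly for the first half of the horizon, then act like win-stay-lose-shift.
    if t < horizon // 2:
        return random.randint(0, 1)
    return win_stay_lose_shift(last_action, last_reward, t, horizon)

# Illustrative "description lengths" standing in for a grammar-based simplicity prior.
CANDIDATES = {always_arm_0: 1.0, win_stay_lose_shift: 3.0, horizon_dependent_random: 5.0}

def average_reward(strategy, probs=(0.3, 0.7), horizon=20, episodes=200):
    """Monte Carlo estimate of per-trial reward on a two-armed Bernoulli bandit."""
    total = 0.0
    for _ in range(episodes):
        last_action, last_reward = None, None
        for t in range(horizon):
            a = strategy(last_action, last_reward, t, horizon)
            r = 1 if random.random() < probs[a] else 0
            total += r
            last_action, last_reward = a, r
    return total / (episodes * horizon)

def score(strategy, length, beta=20.0):
    # Simplicity (negative length, a stand-in for a log-prior) trades off against effectiveness.
    return -length + beta * average_reward(strategy)

best = max(CANDIDATES, key=lambda s: score(s, CANDIDATES[s]))
print("selected strategy:", best.__name__)
```
In the paper, the program space and prior come from Bayesian program induction rather than a fixed candidate list.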
Related papers
- Ensembling Portfolio Strategies for Long-Term Investments: A Distribution-Free Preference Framework for Decision-Making and Algorithms [0.0]
This paper investigates the problem of ensembling multiple strategies for sequential portfolios to outperform individual strategies in terms of long-term wealth.
We introduce a novel framework for decision-making in combining strategies, irrespective of market conditions.
We show results in favor of the proposed strategies, albeit with small tradeoffs in their Sharpe ratios.
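The summary does not spell out the preference framework, so the sketch below only illustrates the generic idea of combining base strategies, here by weighting each one in proportion to the wealth it has accumulated so far; the data, base portfolios, and weighting rule are stand-ins rather than the paper's method.
```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_weights(strategy_wealth):
    """Weight each base strategy in proportion to its cumulative wealth so far."""
    w = np.asarray(strategy_wealth, dtype=float)
    return w / w.sum()

# Toy example: relative price changes for 3 assets over 5 periods (hypothetical data).
price_relatives = rng.uniform(0.95, 1.06, size=(5, 3))

# Two fixed base strategies (portfolio weight vectors); the paper's strategies are richer.
base_portfolios = np.array([[1.0, 0.0, 0.0],
                            [1/3, 1/3, 1/3]])

wealth = np.ones(len(base_portfolios))   # wealth accumulated by each base strategy
ensemble_wealth = 1.0

for x in price_relatives:
    weights = ensemble_weights(wealth)        # preference over strategies
    portfolio = weights @ base_portfolios     # combined allocation for this period
    ensemble_wealth *= portfolio @ x          # update ensemble wealth
    wealth *= base_portfolios @ x             # update each base strategy's wealth

print("final ensemble wealth:", round(ensemble_wealth, 4))
```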
arXiv Detail & Related papers (2024-06-05T23:08:57Z) - Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels [65.92994348757743]
We demonstrate that a simple baseline using cross-entropy loss, combined with widely used regularization strategies, can outperform state-of-the-art methods.
Our findings suggest that employing a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels.
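As a hedged illustration of such a baseline, the sketch below combines plain cross-entropy with two widely used regularizers, mixup and label smoothing, plus weight decay; the model, data, and hyperparameters are placeholders rather than the paper's exact recipe.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def mixup(x, y, num_classes, alpha=0.2):
    """Mix pairs of examples and their one-hot labels (a common regularizer)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    y_onehot = F.one_hot(y, num_classes).float()
    return lam * x + (1 - lam) * x[perm], lam * y_onehot + (1 - lam) * y_onehot[perm]

# Placeholder data: 64 samples, 10 features, 3 classes with (possibly noisy) labels.
x = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))

model = nn.Linear(10, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)  # weight decay as an extra regularizer

for _ in range(100):
    xm, ym = mixup(x, y, num_classes=3)
    logits = model(xm)
    # Cross-entropy with label smoothing against the mixed soft targets.
    log_probs = F.log_softmax(logits, dim=-1)
    smoothed = 0.9 * ym + 0.1 / 3
    loss = -(smoothed * log_probs).sum(dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```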
arXiv Detail & Related papers (2023-07-11T05:58:20Z) - Transfer and Active Learning for Dissonance Detection: Addressing the Rare-Class Challenge [7.61140479230184]
We propose and investigate transfer- and active learning solutions to the rare class problem of dissonance detection.
We perform experiments for a specific rare class problem: collecting language samples of cognitive dissonance from social media.
We find that probability-of-rare-class (PRC) is a simple and effective strategy to guide annotations and ultimately improve model accuracy.
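The PRC rule is simple enough to sketch directly: rank unlabeled examples by the model's predicted probability of the rare class and send the top-ranked ones to annotators. The classifier and data below are placeholders.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder pools: a small labeled seed set (class 1 is the rare class) and an unlabeled pool.
X_labeled = rng.normal(size=(40, 5))
y_labeled = np.zeros(40, dtype=int)
y_labeled[:4] = 1
X_unlabeled = rng.normal(size=(1000, 5))

def prc_select(model, X_pool, rare_class=1, batch_size=20):
    """Probability-of-rare-class selection: query the pool items the model
    considers most likely to belong to the rare class."""
    col = list(model.classes_).index(rare_class)
    proba = model.predict_proba(X_pool)[:, col]
    return np.argsort(-proba)[:batch_size]

model = LogisticRegression().fit(X_labeled, y_labeled)
to_annotate = prc_select(model, X_unlabeled)
print("indices to send to annotators:", to_annotate[:5])
```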
arXiv Detail & Related papers (2023-05-03T23:29:05Z) - Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies [50.10277748405355]
Noise-Reuse Evolution Strategies (NRES) is a general class of unbiased online evolution strategies (ES) methods.
We show that NRES converges faster than existing automatic-differentiation (AD) and ES methods, in terms of both wall-clock time and number of steps, across a variety of applications.
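For context, a plain antithetic ES gradient estimator looks like the sketch below; as summarized, NRES's idea is to reuse the same perturbations across the truncated unrolls of an online problem, which the toy code only gestures at by letting the caller pass a fixed noise array. This is not the paper's algorithm, just the surrounding ES machinery.
```python
import numpy as np

rng = np.random.default_rng(0)

def es_grad(f, theta, sigma=0.1, num_pairs=64, noise=None):
    """Antithetic evolution-strategies gradient estimate of f at theta.
    Passing a fixed `noise` array lets the caller reuse the same perturbations
    across successive (truncated) evaluations, in the spirit of noise reuse."""
    if noise is None:
        noise = rng.normal(size=(num_pairs, theta.size))
    fp = np.array([f(theta + sigma * eps) for eps in noise])
    fm = np.array([f(theta - sigma * eps) for eps in noise])
    return ((fp - fm)[:, None] * noise).mean(axis=0) / (2 * sigma)

# Toy objective; in online ES this would be one truncation window of an unrolled computation.
f = lambda th: -np.sum((th - 1.0) ** 2)

theta = np.zeros(3)
shared_noise = rng.normal(size=(64, 3))   # reused across steps (illustrative only)
for _ in range(200):
    theta += 0.05 * es_grad(f, theta, noise=shared_noise)
print("theta after ES ascent:", np.round(theta, 3))
```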
arXiv Detail & Related papers (2023-04-21T17:53:05Z) - Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access [3.441021278275805]
In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents' actions is known in terms of successor states but not the transition probabilities involved.
In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that uses interval MDPs as its internal model.
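A rough sketch of the internal-model idea, under assumed details: transition counts are turned into confidence intervals on the unknown probabilities, giving an interval MDP on which pessimistic (lower-bound) value iteration can run. The Hoeffding-style intervals and the pessimistic backup below are generic choices, not necessarily the paper's.
```python
import numpy as np

def pessimistic_dist(lo, hi, values):
    """Worst-case distribution inside the interval [lo, hi] for the given state values:
    start from the lower bounds, then push the remaining mass onto the lowest-value successors."""
    p = lo.copy()
    remaining = 1.0 - p.sum()
    for s in np.argsort(values):                      # lowest-value successors first
        add = min(hi[s] - p[s], remaining)
        p[s] += add
        remaining -= add
    return p

def hoeffding_interval(counts, delta=0.05):
    """Crude confidence interval around the empirical transition frequencies."""
    n = counts.sum()
    p_hat = counts / n
    eps = np.sqrt(np.log(2 / delta) / (2 * n))
    return np.clip(p_hat - eps, 0, 1), np.clip(p_hat + eps, 0, 1)

# Hypothetical gray-box MDP: 3 states, 2 actions, successor sets known, probabilities sampled.
# counts[s][a] = observed transition counts to each successor state.
counts = np.array([[[8, 2, 0], [5, 5, 0]],
                   [[0, 6, 4], [0, 2, 8]],
                   [[3, 0, 7], [1, 0, 9]]], dtype=float)
reward = np.array([0.0, 0.2, 1.0])
gamma = 0.9

V = np.zeros(3)
for _ in range(100):                                  # pessimistic value iteration
    Q = np.zeros((3, 2))
    for s in range(3):
        for a in range(2):
            lo, hi = hoeffding_interval(counts[s, a])
            p = pessimistic_dist(lo, hi, V)
            Q[s, a] = reward[s] + gamma * p @ V
    V = Q.max(axis=1)
print("pessimistic values:", np.round(V, 3))
```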
arXiv Detail & Related papers (2023-03-22T16:58:44Z) - Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
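The sampling verifier can be illustrated on a toy case: sample points on the boundary of a candidate box in the joint strategy space and check that the learning dynamics point inward everywhere; a single outward-pointing sample refutes the candidate. The two-player dynamics below are made up for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)

def dynamics(z):
    """Toy joint learning dynamics for two 1-D strategies (illustrative, contracting to 0.5)."""
    x, y = z
    return np.array([0.5 - x + 0.1 * (y - 0.5),
                     0.5 - y - 0.1 * (x - 0.5)])

def is_trapping_box(lo, hi, dyn, samples=2000):
    """Sampled check that the vector field points into the box [lo, hi]^2 on its boundary."""
    for _ in range(samples):
        z = rng.uniform(lo, hi, size=2)
        side = rng.integers(0, 4)                 # pick one of the four boundary faces
        z[side % 2] = lo if side < 2 else hi
        v = dyn(z)
        coord, inward = side % 2, (1.0 if side < 2 else -1.0)
        if inward * v[coord] <= 0:                # flow not pointing into the box: refuted
            return False
    return True

print(is_trapping_box(0.0, 1.0, dynamics))        # True for this toy field
```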
arXiv Detail & Related papers (2023-02-27T14:47:52Z) - Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Imitation Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
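The summary gives no algorithmic detail, so the sketch below shows decision-time planning only in the generic form the title suggests: at each test step, candidate actions proposed by an imitation policy are rolled out in a model and scored with a learned value function. Every component here is a placeholder.
```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder components; in IMPLANT these would come from imitation learning.
def imitation_policy(state):
    return rng.normal(state * 0.1, 0.2, size=4)       # 4 candidate actions (1-D)

def model_step(state, action):
    return 0.9 * state + action                        # assumed dynamics model

def learned_value(state):
    return -abs(state)                                 # prefers states near 0

def plan_action(state, horizon=3):
    """Decision-time planning: roll each candidate action out under the model,
    score the resulting trajectory with the learned value, act greedily."""
    best_action, best_score = None, -np.inf
    for a in imitation_policy(state):
        s, score = state, 0.0
        for _ in range(horizon):
            s = model_step(s, a)
            score += learned_value(s)
        if score > best_score:
            best_action, best_score = a, score
    return best_action

print("chosen action:", round(plan_action(2.0), 3))
```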
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Knowledge-driven Active Learning [70.37119719069499]
Active learning strategies aim at minimizing the amount of labelled data required to train a Deep Learning model.
Most active learning strategies are based on uncertainty-driven sample selection, and are often restricted to samples lying close to the decision boundary.
Here we propose to take common domain knowledge into consideration, enabling non-expert users to train a model with fewer samples.
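One hedged reading of taking domain knowledge into consideration is sketched below: a hypothetical logical rule over the model's predictions ("cat implies animal") yields a violation score, and the most rule-violating unlabeled samples are queried. The rule and scoring are illustrative, not the paper's formulation.
```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder multi-label predictions over an unlabeled pool: columns = (cat, animal).
pred = rng.random((500, 2))

def rule_violation(p):
    """Fuzzy violation of the domain rule 'cat -> animal': high p(cat) with low p(animal)."""
    return np.maximum(p[:, 0] - p[:, 1], 0.0)

def knowledge_driven_select(p, batch_size=10):
    """Query the samples whose predictions most violate the domain knowledge."""
    return np.argsort(-rule_violation(p))[:batch_size]

print("samples to annotate:", knowledge_driven_select(pred))
```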
arXiv Detail & Related papers (2021-10-15T06:11:53Z) - Strategic Classification Made Practical [8.778578967271866]
We present a learning framework for strategic classification that is practical.
Our approach directly minimizes the "strategic" empirical risk, achieved by differentiating through the strategic response of users.
A series of experiments demonstrates the effectiveness of our approach on various learning settings.
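The phrase "differentiating through the strategic response" can be made concrete in a toy linear-classifier, quadratic-cost setting where the response has a closed form; the sketch below trains on post-response features while keeping the response a differentiable function of the weights. It is a sketch of the general idea, not necessarily the paper's training procedure.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Placeholder data: user features and true labels.
X = torch.randn(256, 5)
y = (X[:, 0] > 0).float()

model = nn.Linear(5, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
cost = 2.0   # assumed quadratic manipulation-cost coefficient

def strategic_response(x):
    """Users' best response to a linear scorer with quadratic cost:
    argmax_x' [w.x' + b - cost * ||x' - x||^2] = x + w / (2 * cost).
    Keeping this a differentiable function of the weights lets gradients flow
    through the response when minimizing the strategic risk."""
    return x + model.weight.squeeze(0) / (2 * cost)

for _ in range(200):
    logits = model(strategic_response(X)).squeeze(-1)     # classify post-response features
    loss = F.binary_cross_entropy_with_logits(logits, y)  # strategic empirical risk (sketch)
    opt.zero_grad()
    loss.backward()
    opt.step()
```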
arXiv Detail & Related papers (2021-03-02T16:03:26Z) - Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with state-of-the-art methods on locomotion tasks in terms of both sample-efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.