mlOSP: Towards a Unified Implementation of Regression Monte Carlo Algorithms
- URL: http://arxiv.org/abs/2012.00729v1
- Date: Tue, 1 Dec 2020 18:41:02 GMT
- Title: mlOSP: Towards a Unified Implementation of Regression Monte Carlo Algorithms
- Authors: Mike Ludkovski
- Abstract summary: We introduce mlOSP, a computational template for Machine Learning for Optimal Stopping Problems.
The template is implemented in the R statistical environment and publicly available via a GitHub repository.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce mlOSP, a computational template for Machine Learning for Optimal
Stopping Problems. The template is implemented in the R statistical environment
and publicly available via a GitHub repository. mlOSP presents a unified
numerical implementation of Regression Monte Carlo (RMC) approaches to optimal
stopping, providing a state-of-the-art, open-source, reproducible and
transparent platform. Highlighting its modular nature, we present multiple
novel variants of RMC algorithms, especially in terms of constructing
simulation designs for training the regressors, as well as in terms of machine
learning regression modules. At the same time, mlOSP nests most of the existing
RMC schemes, allowing for a consistent and verifiable benchmarking of extant
algorithms. The article contains extensive R code snippets and figures, and
serves the dual role of presenting new RMC features and as a vignette to the
underlying software package.
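To fix ideas, the sketch below illustrates the backward-induction regression loop that RMC schemes (and hence mlOSP) are built around, pricing a Bermudan put with a bare-bones Longstaff-Schwartz scheme in base R. This is a minimal sketch, not the mlOSP API; the parameter values (S0, K, r, sigma, Tmat, M, N) and the cubic-polynomial regression are illustrative assumptions.

```r
# Minimal sketch of one RMC scheme (Longstaff-Schwartz) for a Bermudan put.
# NOT the mlOSP API; all parameter values are illustrative assumptions.
set.seed(42)
S0 <- 40; K <- 40               # spot and strike
r <- 0.06; sigma <- 0.2         # risk-free rate and volatility
Tmat <- 1; M <- 50; N <- 10000  # maturity, exercise dates, paths
dt <- Tmat / M

# Forward pass: simulate N geometric Brownian motion paths on the exercise grid
Z <- matrix(rnorm(N * M), N, M)
S <- S0 * exp(t(apply((r - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z, 1, cumsum)))

payoff <- function(s) pmax(K - s, 0)

# Backward pass: at each date, regress the discounted pathwise continuation
# value on the current asset price (cubic polynomial, in-the-money paths only)
V <- payoff(S[, M])                        # value at maturity
for (m in (M - 1):1) {
  itm <- payoff(S[, m]) > 0
  cv  <- rep(0, N)                         # continuation value estimate
  if (sum(itm) > 3) {
    fit     <- lm(exp(-r * dt) * V[itm] ~ poly(S[itm, m], 3))
    cv[itm] <- predict(fit)
  }
  exercise <- itm & (payoff(S[, m]) > cv)  # stop where exercising beats waiting
  V <- ifelse(exercise, payoff(S[, m]), exp(-r * dt) * V)
}
cat("Estimated Bermudan put value:", round(exp(-r * dt) * mean(V), 3), "\n")
```

In mlOSP's terms, the regression step above is the swappable machine learning module (here plain lm; any regression emulator could stand in), while the path-generation step corresponds to the simulation design, the two axes along which the abstract highlights the template's modularity.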
Related papers
- Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo [90.78001821963008]
A wide range of LM applications require generating text that conforms to syntactic or semantic constraints.
We develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC).
Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language.
arXiv Detail & Related papers (2025-04-17T17:49:40Z)
- Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks.
By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation, with performance comparable to or stronger than PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time, with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
- Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the Plackett-Luce (PL) model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
- A Wasserstein Minimax Framework for Mixed Linear Regression [69.40394595795544]
Multi-modal distributions are commonly used to model clustered data in learning tasks.
We propose an optimal transport-based framework for Mixed Linear Regression problems.
arXiv Detail & Related papers (2021-06-14T16:03:51Z)
- Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Performance-Weighed Policy Sampling for Meta-Reinforcement Learning [1.77898701462905]
Enhanced Model-Agnostic Meta-Learning (E-MAML) achieves fast convergence of the policy function from a small number of training examples.
E-MAML maintains a set of policy parameters learned in the environment for previous tasks.
We apply E-MAML to developing reinforcement learning (RL)-based online fault-tolerant control schemes.
arXiv Detail & Related papers (2020-12-10T23:08:38Z)
- On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning [25.163423936635787]
We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems.
We propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL).
We derive the iteration and sample complexity of SG-MRL to find an $\epsilon$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms.
arXiv Detail & Related papers (2020-02-12T18:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.