mlOSP: Towards a Unified Implementation of Regression Monte Carlo
Algorithms
- URL: http://arxiv.org/abs/2012.00729v1
- Date: Tue, 1 Dec 2020 18:41:02 GMT
- Title: mlOSP: Towards a Unified Implementation of Regression Monte Carlo
Algorithms
- Authors: Mike Ludkovski
- Abstract summary: We introduce mlOSP, a computational template for Machine Learning for Optimal Stopping Problems.
The template is implemented in the R statistical environment and publicly available via a GitHub repository.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce mlOSP, a computational template for Machine Learning for Optimal
Stopping Problems. The template is implemented in the R statistical environment
and publicly available via a GitHub repository. mlOSP presents a unified
numerical implementation of Regression Monte Carlo (RMC) approaches to optimal
stopping, providing a state-of-the-art, open-source, reproducible and
transparent platform. Highlighting its modular nature, we present multiple
novel variants of RMC algorithms, especially in terms of constructing
simulation designs for training the regressors, as well as in terms of machine
learning regression modules. At the same time, mlOSP nests most of the existing
RMC schemes, allowing for a consistent and verifiable benchmarking of extant
algorithms. The article contains extensive R code snippets and figures, and
serves the dual role of presenting new RMC features and as a vignette to the
underlying software package.
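The Regression Monte Carlo idea at the heart of mlOSP can be illustrated with a minimal sketch of the classical Longstaff-Schwartz scheme for a Bermudan put. This is a hypothetical standalone illustration in Python/NumPy with a simple polynomial regression module, not mlOSP's own R API; all function and parameter names below are illustrative.

```python
import numpy as np

def bermudan_put_lsm(S0=40.0, K=40.0, r=0.06, sigma=0.2, T=1.0,
                     n_steps=50, n_paths=20000, seed=0):
    """Longstaff-Schwartz (Regression Monte Carlo) price of a Bermudan put."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Simulate geometric Brownian motion paths (exercise dates dt, 2*dt, ..., T)
    z = rng.standard_normal((n_paths, n_steps))
    log_paths = np.cumsum((r - 0.5 * sigma**2) * dt
                          + sigma * np.sqrt(dt) * z, axis=1)
    S = S0 * np.exp(log_paths)              # shape (n_paths, n_steps)
    cash = np.maximum(K - S[:, -1], 0.0)    # payoff if held to maturity
    # Backward induction: regress continuation value on a polynomial basis
    for t in range(n_steps - 2, -1, -1):
        cash *= np.exp(-r * dt)             # discount one step back
        itm = (K - S[:, t]) > 0.0           # regress on in-the-money paths only
        if itm.sum() < 4:
            continue
        x = S[itm, t]
        coeffs = np.polyfit(x, cash[itm], 2)      # quadratic basis regression
        continuation = np.polyval(coeffs, x)      # estimated value of waiting
        exercise = K - x
        stop = exercise > continuation            # exercise where immediate payoff wins
        idx = np.where(itm)[0][stop]
        cash[idx] = exercise[stop]
    return np.exp(-r * dt) * cash.mean()          # discount first exercise date to t=0
```

With these default parameters (the classic S0 = K = 40, r = 0.06, sigma = 0.2, T = 1 test case), the estimate should land near the well-known American-put benchmark of roughly 2.3. mlOSP generalizes exactly this template: the simulation design (here, plain forward paths) and the regression module (here, a quadratic polynomial) become interchangeable components.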
Related papers
- Chart2Code-MoLA: Efficient Multi-Modal Code Generation via Adaptive Expert Routing [20.521717930460692]
C2C-MoLA is a framework that synergizes Mixture of Experts (MoE) with Low-Rank Adaptation (LoRA).
LoRA enables parameter-efficient updates for resource-conscious tuning.
Experiments on Chart2Code-160k show that the proposed model improves generation accuracy by up to 17%.
arXiv Detail & Related papers (2025-11-28T16:23:04Z)
- Fast Riemannian-manifold Hamiltonian Monte Carlo for hierarchical Gaussian-process models [0.0]
We show that, compared with the slow inference achieved with existing program libraries, the performance can be drastically improved.
We demonstrate that RMHMC effectively samples from the posterior, allowing the calculation of model evidence.
We highlight the need to develop a customisable library set that allows users to incorporate dynamically programmed objects.
arXiv Detail & Related papers (2025-11-09T14:44:13Z)
- AI for Distributed Systems Design: Scalable Cloud Optimization Through Repeated LLMs Sampling And Simulators [3.1594665317979698]
We explore AI-driven distributed-systems policy design by combining code generation from large language models with deterministic verification in a domain-specific simulator.
We report preliminary results on throughput improvements across multiple models.
We conjecture that AI will be crucial for scaling this methodology by helping to bootstrap new simulators.
arXiv Detail & Related papers (2025-10-20T16:10:24Z)
- FBMS: An R Package for Flexible Bayesian Model Selection and Model Averaging [14.487258585834374]
The FBMS package implements an efficient Mode Jumping Markov Chain Monte Carlo (MJMCMC) algorithm.
Within this framework, the algorithm maintains and updates populations of transformed features, computes their posterior probabilities, and evaluates the posteriors of models constructed from them.
We demonstrate the effective use of FBMS for both inferential and predictive modeling in Gaussian regression, focusing on different instances of the BGNLM class of models.
arXiv Detail & Related papers (2025-08-31T09:04:01Z)
- Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo [90.78001821963008]
A wide range of LM applications require generating text that conforms to syntactic or semantic constraints.
We develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC).
Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language.
arXiv Detail & Related papers (2025-04-17T17:49:40Z)
- Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks.
By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
- Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
- A Wasserstein Minimax Framework for Mixed Linear Regression [69.40394595795544]
Multi-modal distributions are commonly used to model clustered data in learning tasks.
We propose an optimal transport-based framework for Mixed Linear Regression problems.
arXiv Detail & Related papers (2021-06-14T16:03:51Z)
- Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Performance-Weighed Policy Sampling for Meta-Reinforcement Learning [1.77898701462905]
Enhanced Model-Agnostic Meta-Learning (E-MAML) generates fast convergence of the policy function from a small number of training examples.
E-MAML maintains a set of policy parameters learned in the environment for previous tasks.
We apply E-MAML to developing reinforcement learning (RL)-based online fault tolerant control schemes.
arXiv Detail & Related papers (2020-12-10T23:08:38Z)
- On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning [25.163423936635787]
We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems.
We propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL).
We derive the iteration and sample complexity of SG-MRL to find an $\epsilon$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms.
arXiv Detail & Related papers (2020-02-12T18:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.