Related papers: Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol

Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol

URL: http://arxiv.org/abs/2502.08021v1
Date: Tue, 11 Feb 2025 23:40:55 GMT
Title: Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol
Authors: Pai Liu, Lingfeng Zhao, Shivangi Agarwal, Jinghan Liu, Audrey Huang, Philip Amortila, Nan Jiang,
Abstract summary: We develop a new protocol for evaluating model-free and model-based selectors.<n>We find that our new model-free selector, LSTD-Tournament, demonstrates promising empirical performance.
Score: 12.46794226683161
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Holdout validation and hyperparameter tuning from data is a long-standing problem in offline reinforcement learning (RL). A standard framework is to use off-policy evaluation (OPE) methods to evaluate and select the policies, but OPE either incurs exponential variance (e.g., importance sampling) or has hyperparameters on their own (e.g., FQE and model-based). In this work we focus on hyperparameter tuning for OPE itself, which is even more under-investigated. Concretely, we select among candidate value functions ("model-free") or dynamics ("model-based") to best assess the performance of a target policy. Our contributions are two fold. We develop: (1) new model-free and model-based selectors with theoretical guarantees, and (2) a new experimental protocol for empirically evaluating them. Compared to the model-free protocol in prior works, our new protocol allows for more stable generation of candidate value functions, better control of misspecification, and evaluation of model-free and model-based methods alike. We exemplify the protocol on a Gym environment, and find that our new model-free selector, LSTD-Tournament, demonstrates promising empirical performance.

Related papers

A new approach for combined model class selection and parameters learning for auto-regressive neural models [0.4779196219827507]
This work focuses on a specific Recurrent Neural Networks (RNNs) family, i.e. Auto-Regressive with eXogenous inputs Echo State Networks (XENARSNs)<n>The method allows to simultaneously select the optimal model class and learn model parameters from data.<n>Results show the effectiveness of the approach in identifying parsimonious yet accurate models suitable for control applications.
arXiv Detail & Related papers (2026-01-24T12:26:25Z)
Model Correlation Detection via Random Selection Probing [62.093777777813756]
Existing similarity-based methods require access to model parameters or produce scores without thresholds.<n>We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test.<n>RSP produces rigorous p-values that quantify evidence of correlation.
arXiv Detail & Related papers (2025-09-29T01:40:26Z)
Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach [65.6966065843227]
Iterative Reweight-then-IRO is a framework that performs RL-style alignment of a frozen base model without touching its parameters.<n>At test time, the value functions are used to guide the base model generation via a search-based optimization process.<n> Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI's reinforcement fine-tuning (RFT)
arXiv Detail & Related papers (2025-06-21T21:49:02Z)
Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models [0.8602553195689513]
Reinforcement learning (RL) is a powerful framework for decision-making in uncertain environments.<n>We address this challenge by incorporating prior model knowledge to guide exploration and accelerate the learning process.<n>We demonstrate the effectiveness of the proposed exploration strategy, which we call BUMEX, in a simulation study.
arXiv Detail & Related papers (2025-04-08T12:33:38Z)
Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection [1.4530711901349282]
Test-Time Adaptation (TTA) has emerged as a promising strategy for tackling the problem of machine learning model robustness under distribution shifts. We evaluate existing TTA methods using surrogate-based hp-selection strategies to obtain a more realistic evaluation of their performance.
arXiv Detail & Related papers (2024-07-19T11:58:30Z)
Aligning Large Language Models by On-Policy Self-Judgment [49.31895979525054]
Existing approaches for aligning large language models with human preferences face a trade-off that requires a separate reward model (RM) for on-policy learning. We present a novel alignment framework, SELF-JUDGE, that does on-policy learning and is parameter efficient. We show that the rejecting sampling by itself can improve performance further without an additional evaluator.
arXiv Detail & Related papers (2024-02-17T11:25:26Z)
Evaluating Representations with Readout Model Switching [19.907607374144167]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric. We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions. The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
arXiv Detail & Related papers (2023-02-19T14:08:01Z)
Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning [32.24146877835396]
We propose emphPlan To Predict (P2P), a framework that treats the model rollout process as a sequential decision making problem. We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z)
Out-of-sample scoring and automatic selection of causal estimators [0.0]
We propose novel scoring approaches for both the CATE case and an important subset of instrumental variable problems. We implement that in an open source package that relies on DoWhy and EconML libraries.
arXiv Detail & Related papers (2022-12-20T08:29:18Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL) Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method by integrating the knockoff features, the error-based feature importance statistics, and the stepdown procedure together. The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
arXiv Detail & Related papers (2022-03-09T01:55:59Z)
Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts. We propose a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models. We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z)
Improving Hyperparameter Optimization by Planning Ahead [3.8673630752805432]
We propose a novel transfer learning approach, defined within the context of model-based reinforcement learning. We propose a new variant of model predictive control which employs a simple look-ahead strategy as a policy. Our experiments on three meta-datasets comparing to state-of-the-art HPO algorithms show that the proposed method can outperform all baselines.
arXiv Detail & Related papers (2021-10-15T11:46:14Z)
Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings [13.376364233897528]
Reinforcement learning can be used to learn treatment policies and aid decision making in healthcare. A standard validation pipeline for model selection requires running a learned policy in the actual environment. Our work serves as a practical guide for offline RL model selection and can help RL practitioners select policies using real-world datasets.
arXiv Detail & Related papers (2021-07-23T02:41:51Z)
COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable. We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions. We find that COMBO consistently performs as well or better as compared to prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.