Optimization of Epsilon-Greedy Exploration
- URL: http://arxiv.org/abs/2506.03324v1
- Date: Tue, 03 Jun 2025 19:14:53 GMT
- Title: Optimization of Epsilon-Greedy Exploration
- Authors: Ethan Che, Hakan Ceylan, James McInerney, Nathan Kallus,
- Abstract summary: We show that variations in the batch size across periods significantly influence the optimal exploration strategy.<n>Our methods automatically calibrate exploration to the specific problem setting, consistently matching or outperforming the best for each setting.
- Score: 35.9674180611893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern recommendation systems rely on exploration to learn user preferences for new items, typically implementing uniform exploration policies (e.g., epsilon-greedy) due to their simplicity and compatibility with machine learning (ML) personalization models. Within these systems, a crucial consideration is the rate of exploration - what fraction of user traffic should receive random item recommendations and how this should evolve over time. While various heuristics exist for navigating the resulting exploration-exploitation tradeoff, selecting optimal exploration rates is complicated by practical constraints including batched updates, time-varying user traffic, short time horizons, and minimum exploration requirements. In this work, we propose a principled framework for determining the exploration schedule based on directly minimizing Bayesian regret through stochastic gradient descent (SGD), allowing for dynamic exploration rate adjustment via Model-Predictive Control (MPC). Through extensive experiments with recommendation datasets, we demonstrate that variations in the batch size across periods significantly influence the optimal exploration strategy. Our optimization methods automatically calibrate exploration to the specific problem setting, consistently matching or outperforming the best heuristic for each setting.
Related papers
- Where to Explore: A Reach and Cost-Aware Approach for Unbiased Data Collection in Recommender Systems [4.013613514540094]
This paper introduces an approach for delivering content-level exploration safely and efficiently by optimizing its placement based on reach and opportunity cost.<n> Deployed on a large-scale streaming platform with over 100 million monthly active users, our approach identifies scroll-depth regions with lower engagement.<n>Our method complements existing intra-row diversification and bandit-based exploration techniques by introducing a deployable, behaviorally informed mechanism for surfacing exploratory content at scale.
arXiv Detail & Related papers (2025-12-11T04:03:44Z) - Direct Regret Optimization in Bayesian Optimization [10.705151736050967]
We propose a novel direct regret optimization approach that jointly learns the optimal model and non-myopic acquisition.<n>We show that our method consistently outperforms BO baselines, achieving lower simple regret and demonstrating more robust exploration.
arXiv Detail & Related papers (2025-07-09T04:09:58Z) - Exploiting Prior Knowledge in Preferential Learning of Individualized Autonomous Vehicle Driving Styles [41.94295877935867]
Trajectory planning for automated vehicles commonly employs optimization over a moving horizon - Model Predictive Control.<n>Finding a suitable cost function that results in a driving style preferred by passengers remains an ongoing challenge.<n>We employ preferential Bayesian optimization to learn the cost function by iteratively querying a passenger's preference.
arXiv Detail & Related papers (2025-03-19T16:47:56Z) - Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation [72.24964965882783]
Reinforcement learning (RL) is a promising approach for robotic navigation, allowing robots to learn through trial and error.<n>Real-world robotic tasks often suffer from sparse rewards, leading to inefficient exploration and suboptimal policies.<n>We introduce Confidence-Controlled Exploration (CCE), a novel method that improves sample efficiency in RL-based robotic navigation without modifying the reward function.
arXiv Detail & Related papers (2023-06-09T18:45:15Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning,
and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called textttMEX.
textttMEX integrates estimation and planning components while balancing exploration exploitation automatically.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information.
Our approach involves randomly sampling a Gaussian distribution to fine-tune the elements of the prompt trajectory and using preference ranking function to find the optimization direction.
Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
arXiv Detail & Related papers (2023-05-16T17:49:04Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring the novelty based on learned reward.
Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via textitmaximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z) - Autonomous UAV Exploration of Dynamic Environments via Incremental
Sampling and Probabilistic Roadmap [0.3867363075280543]
We propose a novel dynamic exploration planner (DEP) for exploring unknown environments using incremental sampling and Probabilistic Roadmap (PRM)
Our method safely explores dynamic environments and outperforms the benchmark planners in terms of exploration time, path length, and computational time.
arXiv Detail & Related papers (2020-10-14T22:52:37Z) - Localized active learning of Gaussian process state space models [63.97366815968177]
A globally accurate model is not required to achieve good performance in many common control applications.
We propose an active learning strategy for Gaussian process state space models that aims to obtain an accurate model on a bounded subset of the state-action space.
By employing model predictive control, the proposed technique integrates information collected during exploration and adaptively improves its exploration strategy.
arXiv Detail & Related papers (2020-05-04T05:35:02Z) - Dynamic Subgoal-based Exploration via Bayesian Optimization [7.297146495243708]
Reinforcement learning in sparse-reward navigation environments is challenging and poses a need for effective exploration.
We propose a cost-aware Bayesian optimization approach that efficiently searches over a class of dynamic subgoal-based exploration strategies.
An experimental evaluation demonstrates that the new approach outperforms existing baselines across a number of problem domains.
arXiv Detail & Related papers (2019-10-21T04:24:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.