Bandit Theory and Thompson Sampling-Guided Directed Evolution for
Sequence Optimization
- URL: http://arxiv.org/abs/2206.02092v1
- Date: Sun, 5 Jun 2022 03:48:42 GMT
- Title: Bandit Theory and Thompson Sampling-Guided Directed Evolution for
Sequence Optimization
- Authors: Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba
Szepesvári, Mengdi Wang
- Abstract summary: We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization.
We show that TS-DE enjoys a Bayesian regret of order $\tilde O(d^{2}\sqrt{MT})$, where $d$ is feature dimension, $M$ is population size and $T$ is number of rounds.
It may have implications for more general sequence optimization and evolutionary algorithms.
- Score: 38.547378870770956
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Directed Evolution (DE), a landmark wet-lab method that originated in the 1960s,
enables discovery of novel protein designs by evolving a population of
candidate sequences. Recent advances in biotechnology have made it possible to
collect high-throughput data, allowing the use of machine learning to map out a
protein's sequence-to-function relation. There is a growing interest in machine
learning-assisted DE for accelerating protein optimization. Yet the theoretical
understanding of DE, as well as the use of machine learning in DE, remains
limited. In this paper, we connect DE with bandit learning theory and make
a first attempt to study regret minimization in DE. We propose a Thompson
Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization,
where the sequence-to-function mapping is unknown and querying a single value
is subject to costly and noisy measurements. TS-DE updates a posterior of the
function based on collected measurements. It uses a posterior-sampled function
estimate to guide the crossover recombination and mutation steps in DE. In the
case of a linear model, we show that TS-DE enjoys a Bayesian regret of order
$\tilde O(d^{2}\sqrt{MT})$, where $d$ is feature dimension, $M$ is population
size and $T$ is number of rounds. This regret bound is nearly optimal,
confirming that bandit learning can provably accelerate DE. It may have
implications for more general sequence optimization and evolutionary
algorithms.
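The abstract specifies the TS-DE loop only in prose. Below is a minimal sketch of one plausible reading, assuming binary sequences, a linear utility $f_\theta(x)=\langle\theta, x\rangle$, and a conjugate Gaussian posterior maintained by Bayesian linear regression; every name, hyperparameter, and design choice here is illustrative, not the authors' reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, T = 20, 32, 50             # feature dim, population size, rounds
theta_star = rng.normal(size=d)  # unknown sequence-to-function weights
sigma = 0.1                      # measurement noise std

# Gaussian prior / posterior for Bayesian linear regression
prec = np.eye(d)                 # posterior precision
b = np.zeros(d)                  # precision-weighted mean accumulator

pop = rng.integers(0, 2, size=(M, d)).astype(float)  # initial population

for t in range(T):
    # Thompson sampling: draw one utility estimate from the posterior
    cov = np.linalg.inv(prec)
    mu = cov @ b
    theta_t = rng.multivariate_normal(mu, cov)

    # Selection: keep the half of the population the sampled model prefers
    scores = pop @ theta_t
    parents = pop[np.argsort(scores)[-M // 2:]]

    # Crossover recombination: each child mixes two random parents sitewise
    i = rng.integers(0, len(parents), size=(M, 2))
    mask = rng.integers(0, 2, size=(M, d)).astype(bool)
    pop = np.where(mask, parents[i[:, 0]], parents[i[:, 1]])

    # Mutation: flip each site independently with small probability
    flip = rng.random((M, d)) < 0.02
    pop = np.abs(pop - flip)

    # Costly, noisy measurements of the new population
    y = pop @ theta_star + sigma * rng.normal(size=M)

    # Posterior update (conjugate Bayesian linear regression)
    prec += pop.T @ pop / sigma**2
    b += pop.T @ y / sigma**2
```

Each round performs the steps the abstract names: posterior sampling, selection under the sampled utility, crossover recombination, and mutation, followed by noisy measurement and a posterior update.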
Related papers
- Is Stochastic Gradient Descent Effective? A PDE Perspective on Machine Learning processes [0.0]
We analyze stochastic gradient descent (SGD), a widely used method in supervised learning.
We exploit two different approaches: duality methods and entropy dynamics.
arXiv Detail & Related papers (2025-01-14T20:33:30Z)
- Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization [38.67914746910537]
We prove an $\mathcal{O}(t^{-1})$ lower bound rate for the OT map, using the similarity between Laguerre cell estimation and density support estimation.
To nearly achieve the desired fast rate, we design an entropic regularization scheme decreasing with the number of samples.
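As a rough illustration of the scheme sketched in the entry above (not the paper's exact estimator), the following runs stochastic gradient ascent on the entropic semi-dual of a semi-discrete OT problem, with a regularization level that decreases as samples arrive; the schedule `eps = n**-0.5`, the cost function, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
J = 10
y = rng.normal(size=(J, 2))               # discrete target support points
b = np.full(J, 1.0 / J)                   # target weights

v = np.zeros(J)                           # dual potentials on the target points

for n in range(1, 20001):
    eps = n ** -0.5                       # assumed decreasing regularization schedule
    x = rng.normal(size=2)                # one fresh sample from the continuous source

    cost = 0.5 * np.sum((y - x) ** 2, axis=1)    # squared Euclidean cost to each y_j
    logits = (v - cost) / eps + np.log(b)
    logits -= logits.max()                       # numerical stability
    pi = np.exp(logits)
    pi /= pi.sum()                               # soft assignment of x to Laguerre cells

    v += n ** -0.5 * (b - pi)             # stochastic ascent on the semi-dual objective

# As eps shrinks, argmax(pi) approaches the Laguerre cell containing x.
```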
arXiv Detail & Related papers (2024-05-23T11:46:03Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
Rather than relying on a Gaussian approximation of the posterior, we directly sample the Q function from its posterior distribution using Langevin Monte Carlo.
Our approach achieves results better than or comparable to state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
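A toy sketch of the posterior-sampling idea this entry describes, transplanted to a linear bandit instead of deep RL so it stays self-contained: unadjusted Langevin dynamics on the negative log-posterior of the reward parameters stands in for exact Thompson sampling. Step sizes, iteration counts, and the problem setup are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 20, 500
arms = rng.normal(size=(K, d))        # fixed arm feature vectors
theta_star = rng.normal(size=d)       # unknown reward parameters

X, Y = [], []                         # observed features and rewards
theta = np.zeros(d)                   # current Langevin iterate

def grad_neg_log_post(th):
    # Gaussian prior N(0, I) and unit-variance Gaussian reward likelihood
    g = th.copy()
    for x_i, y_i in zip(X, Y):
        g += (x_i @ th - y_i) * x_i
    return g

for t in range(T):
    # Approximate posterior sampling: unadjusted Langevin dynamics
    step = 1e-3
    for _ in range(50):
        noise = np.sqrt(2 * step) * rng.normal(size=d)
        theta = theta - step * grad_neg_log_post(theta) + noise

    # Thompson step: act greedily under the sampled parameters
    a = int(np.argmax(arms @ theta))
    reward = arms[a] @ theta_star + rng.normal()
    X.append(arms[a]); Y.append(reward)
```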
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^{3}K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
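The entry names the mechanism but not its details. The schematic tabular update below shows only the reference-advantage decomposition at the heart of such variance-reduced Q-learning: $r+\gamma V(s')$ is split into a reference part, averaged over all visits, and a small advantage part sampled fresh. All bonus terms, the exact settling rule, and the paper's episodic bookkeeping are omitted; this is a simplification, not the authors' algorithm.

```python
import numpy as np

H, gamma = 20, 0.9                    # schematic horizon and discount
nS, nA = 10, 4
Q = np.zeros((nS, nA))
V = np.zeros(nS)
V_ref = np.zeros(nS)                  # reference value, frozen once "settled"
mu_ref = np.zeros((nS, nA))           # running mean of r + gamma * V_ref(s')
counts = np.zeros((nS, nA))

def update(s, a, r, s_next, settled):
    # Reference part: averaged over every visit, so its noise keeps shrinking
    counts[s, a] += 1
    n = counts[s, a]
    mu_ref[s, a] += (r + gamma * V_ref[s_next] - mu_ref[s, a]) / n

    # Advantage part: uses the latest V but is small once V is near V_ref,
    # so one fresh sample of it adds little variance
    advantage = gamma * (V[s_next] - V_ref[s_next])

    lr = (H + 1) / (H + n)            # step-size schedule common in this literature
    Q[s, a] = (1 - lr) * Q[s, a] + lr * (mu_ref[s, a] + advantage)
    V[s] = Q[s].max()
    if not settled:
        V_ref[s] = V[s]               # keep refreshing the reference until it settles

update(0, 1, 1.0, 2, settled=False)   # one illustrative transition
```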
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping [99.59319332864129]
In this paper, we study reinforcement learning for discounted Markov Decision Processes (MDPs).
We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde O(d\sqrt{T}/(1-\gamma)^{2})$ regret.
Our upper and lower bound results together suggest that the proposed reinforcement learning algorithm is near-optimal up to a $(1-\gamma)^{-0.5}$ factor.
arXiv Detail & Related papers (2020-06-23T17:08:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.