Fugu-MT Paper Translation (Summary): Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

Paper summary: Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

arxiv url: http://arxiv.org/abs/2206.02092v1
Date: Sun, 5 Jun 2022 03:48:42 GMT
Status: Translation complete
Last updated in system: 2022-06-11 12:24:50.648615
Title: Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization
Authors: Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang
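Abstract summary: This paper proposes a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization. We show that TS-DE enjoys a Bayesian regret of order $\tilde O(d^{2}\sqrt{MT})$, where $d$ is the feature dimension, $M$ is the population size, and $T$ is the number of rounds. This may have implications for more general sequence optimization and evolutionary algorithms.
Reference score (attention score computed by this site): 38.547378870770956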
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Directed Evolution (DE), a landmark wet-lab method originating in the 1960s, enables the discovery of novel protein designs by evolving a population of candidate sequences. Recent advances in biotechnology have made it possible to collect high-throughput data, allowing machine learning to be used to map out a protein's sequence-to-function relation. There is growing interest in machine learning-assisted DE for accelerating protein optimization. Yet the theoretical understanding of DE, as well as of the use of machine learning in DE, remains limited. In this paper, we connect DE with bandit learning theory and make a first attempt to study regret minimization in DE. We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements. TS-DE updates a posterior of the function based on collected measurements. It uses a posterior-sampled function estimate to guide the crossover recombination and mutation steps in DE. In the case of a linear model, we show that TS-DE enjoys a Bayesian regret of order $\tilde O(d^{2}\sqrt{MT})$, where $d$ is the feature dimension, $M$ is the population size, and $T$ is the number of rounds. This regret bound is nearly optimal, confirming that bandit learning can provably accelerate DE. It may have implications for more general sequence optimization and evolutionary algorithms.
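The loop described in the abstract (measure the population, update a posterior, draw one function estimate, and let that estimate guide recombination and mutation) can be sketched for the linear case. This is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes binary sequences used directly as features, a Gaussian prior N(0, I) on θ with known noise variance, truncation selection of the top half under the sampled θ, uniform crossover, and independent bit-flip mutation. The function names `posterior` and `ts_de` and all parameter values are hypothetical.

```python
import numpy as np


def posterior(X, y, noise_var=1.0, prior_var=1.0):
    """Gaussian posterior over theta for y = X @ theta + noise (Bayesian linear regression)."""
    d = X.shape[1]
    precision = np.eye(d) / prior_var + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2  # symmetrize to guard against round-off
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov


def ts_de(theta_true, pop_size=20, rounds=30, mut_rate=0.05, noise_sd=0.1, seed=0):
    """Hypothetical TS-DE sketch; the paper's selection/crossover/mutation details may differ."""
    rng = np.random.default_rng(seed)
    d = theta_true.size
    pop = rng.integers(0, 2, size=(pop_size, d))  # random initial binary sequences
    X_hist, y_hist = np.empty((0, d)), np.empty(0)
    best = []  # best noiseless fitness in each round's population
    for _ in range(rounds):
        # 1. Costly, noisy measurement of every sequence in the current population.
        y = pop @ theta_true + noise_sd * rng.standard_normal(pop_size)
        X_hist = np.vstack([X_hist, pop])
        y_hist = np.concatenate([y_hist, y])
        best.append(float((pop @ theta_true).max()))
        # 2. Posterior update, then draw a single function estimate (Thompson sampling).
        mean, cov = posterior(X_hist, y_hist, noise_var=noise_sd**2)
        theta_s = rng.multivariate_normal(mean, cov)
        # 3. Selection guided by the sampled estimate: keep the top half as parents.
        parents = pop[np.argsort(pop @ theta_s)[-(pop_size // 2):]]
        # 4. Uniform crossover: each site comes from one of two randomly chosen parents.
        i = rng.integers(0, len(parents), size=pop_size)
        j = rng.integers(0, len(parents), size=pop_size)
        mask = rng.random((pop_size, d)) < 0.5
        children = np.where(mask, parents[i], parents[j])
        # 5. Mutation: flip each site independently with probability mut_rate.
        flips = rng.random((pop_size, d)) < mut_rate
        pop = np.where(flips, 1 - children, children)
    return best
```

For example, with `theta_true = np.array([1.0, -1.0] * 5)` the best achievable fitness is 5, and `ts_de(theta_true)` returns the best true fitness found in each round. Drawing one posterior sample per round, instead of plugging in the posterior mean, is what makes this Thompson sampling: randomness in the sampled θ supplies the exploration.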

Related papers

Efficiently Solving Discounted MDPs with Predictions on Transition Matrices [6.199300239433395]
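We study discounted Markov decision processes (DMDPs) under a generative model. We propose a novel framework to investigate how predictions on the transition matrix can improve sample efficiency in solving DMDPs.
Reference translation of the paper (metadata) (2025-02-21T09:59:46Z)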
Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization [38.67914746910537]
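Using the similarity between Laguerre cell estimation and density support estimation, we prove a lower bound rate of $\mathcal{O}(t^{-1})$ for the OT map. To nearly achieve this desired fast rate, we design an entropic regularization scheme that decreases with the number of samples.
Reference translation of the paper (metadata) (2024-05-23T11:46:03Z)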
Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
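We develop a suite of non-asymptotic theory for understanding the data generation process of diffusion models. In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
Reference translation of the paper (metadata) (2023-06-15T16:30:08Z)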
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
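We propose a scalable and effective exploration strategy for reinforcement learning based on Thompson sampling. Rather than relying on an approximation of the posterior, we sample the Q function directly from its posterior distribution using Langevin Monte Carlo. Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
Reference translation of the paper (metadata) (2023-05-29T17:11:28Z)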
Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
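We propose a computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^{3}K})$. Our work provides a complete answer to optimal RL with linear MDPs.
Reference translation of the paper (metadata) (2022-12-12T18:58:59Z)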
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
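We propose a new model-free algorithm to minimize regret in episodic reinforcement learning. The proposed algorithm employs an early-settled reference update rule, with the help of two Q-learning sequences. The design principle of our early-settled variance reduction method may be of independent interest to other RL settings.
Reference translation of the paper (metadata) (2021-10-09T21:13:48Z)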
Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials [5.905364646955811]
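In artificial intelligence (AI) and machine learning (ML), approximating an unknown target function $y=f(\mathbf{x})$ is a common objective. Given a training set $S$, we aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
Reference translation of the paper (metadata) (2020-11-27T04:57:40Z)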
Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping [99.59319332864129]
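We study reinforcement learning for discounted Markov decision processes (MDPs). We propose a novel algorithm that makes use of a feature mapping and obtains a $\tilde O(d\sqrt{T}/(1-\gamma)^{2})$ regret. Our results suggest that the proposed reinforcement learning algorithm is near-optimal up to a $(1-\gamma)^{-0.5}$ factor.
Reference translation of the paper (metadata) (2020-06-23T17:08:54Z)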

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

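The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).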