Fugu-MT Paper Translation (Abstract): Expanding functional protein sequence space using high entropy generative models

Paper summary: Expanding functional protein sequence space using high entropy generative models

arxiv url: http://arxiv.org/abs/2605.03578v1
Date: Tue, 05 May 2026 09:45:46 GMT
Status: Translation complete
Last updated in system: 2026-05-06 19:35:43.882781
Title: Expanding functional protein sequence space using high entropy generative models
Title (reference translation): Expanding the functional protein sequence space using high-entropy generative models
Authors: Roberto Netti, Emily Hinds, Francesco Calvanese, Rama Ranganathan, Martin Weigt, Francesco Zamponi
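Abstract summary: Boltzmann Machines trained on evolutionary sequence data have emerged as a powerful paradigm for the data-driven design of artificial proteins. This paper investigates the relationship between model architecture, specifically parameter density, and experimental performance.
Reference score (independently computed attention score): 0.23090185577016445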
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Boltzmann Machines trained on evolutionary sequence data have emerged as a powerful paradigm for the data-driven design of artificial proteins. However, the relationship between model architecture, specifically parameter density, and experimental performance remains poorly understood. Here, we investigate this relationship using the Chorismate Mutase enzyme family as a model system. We compare standard fully connected Boltzmann Machines for Direct Coupling Analysis (bmDCA) with sparse models generated via progressive edge activation (eaDCA) and edge decimation (edDCA). We identify a maximum-entropy model (meDCA) along the decimation trajectory that represents an optimal balance between constraint satisfaction and the flexibility of the probability distribution. We synthesized and tested artificial sequences from all models using an in vivo complementation assay, finding that all architectures, regardless of sparsity, generate functional enzymes with high success rates, even at significant divergence from natural sequences. Despite this functional equivalence, we demonstrate that the meDCA model samples a viable sequence space that is more than fifteen orders of magnitude larger than its low-entropy counterparts. Furthermore, comparative analyses reveal that high-entropy models systematically minimize overfitting and better capture the local neutral spaces surrounding natural proteins. These findings suggest that while various models satisfying coevolutionary statistics can generate functional sequences, high-entropy Boltzmann Machines provide a superior representation of the underlying evolutionary fitness landscape.
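Abstract (reference translation): Boltzmann Machines trained on evolutionary sequence data have emerged as a powerful paradigm for the data-driven design of artificial proteins. However, the relationship between model architecture, in particular parameter density, and experimental performance is not well understood. In this study, we examine this relationship using the Chorismate Mutase enzyme family as a model system. We compare fully connected Boltzmann Machines for Direct Coupling Analysis (bmDCA) with sparse models obtained by progressive edge activation (eaDCA) and edge decimation (edDCA). We identify a maximum-entropy model (meDCA) along the decimation trajectory that represents an optimal balance between constraint satisfaction and the flexibility of the probability distribution. We synthesized artificial sequences from all models and tested them with an in vivo complementation assay, and found that every architecture, regardless of sparsity, generates functional enzymes at high success rates, even at large divergence from natural sequences. Despite this functional equivalence, we show that the meDCA model samples a viable sequence space more than fifteen orders of magnitude larger than that of its low-entropy counterparts. In addition, comparative analyses reveal that high-entropy models systematically minimize overfitting and better capture the local neutral spaces surrounding natural proteins. These results indicate that, while various models satisfying coevolutionary statistics can generate functional sequences, high-entropy Boltzmann Machines provide a superior representation of the underlying evolutionary fitness landscape.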
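The "fifteen orders of magnitude" figure in the abstract can be read as an entropy difference. The block below is a rough sketch under stated assumptions, not the authors' derivation: it assumes the standard Potts-model form used in Direct Coupling Analysis, and it assumes that the size of the sampled sequence space is measured as the exponential of the model entropy. The symbols $h_i$, $J_{ij}$, $\mathcal{E}$, $S$, and $\Omega$ are notation introduced here for illustration and do not appear in the abstract.

```latex
% Assumed model form: a Potts model over sequences a = (a_1, ..., a_L),
% with couplings J_ij present only on the active edge set E
% (all pairs for bmDCA, a sparse subset for eaDCA / edDCA / meDCA).
P(a_1,\dots,a_L) = \frac{1}{Z}\exp\!\Big(\sum_{i=1}^{L} h_i(a_i)
    + \sum_{(i,j)\in\mathcal{E}} J_{ij}(a_i,a_j)\Big)

% Entropy of the model and the effective number of sequences it samples
% (assumption: "size of sequence space" is taken to mean \Omega = e^{S}).
S = -\sum_{a} P(a)\,\ln P(a), \qquad \Omega = e^{S}

% A ratio of more than 10^{15} between two models then corresponds to
\frac{\Omega_{\mathrm{meDCA}}}{\Omega_{\mathrm{low}}} > 10^{15}
\;\Longleftrightarrow\;
S_{\mathrm{meDCA}} - S_{\mathrm{low}} > 15\,\ln 10 \approx 34.5\ \text{nats}
\approx 49.8\ \text{bits}
```

Under this reading, a gap of roughly 35 nats spread over the length of the protein is a modest per-site entropy gain, which is consistent with the abstract's statement that all models still produce functional enzymes.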


Related papers