Fugu-MT Paper Translation (Summary): How Post-Training Shapes Biological Reasoning Models

Paper summary: How Post-Training Shapes Biological Reasoning Models

arXiv URL: http://arxiv.org/abs/2606.16517v1
Date: Mon, 15 Jun 2026 10:19:49 GMT
Status: Translation complete
Last updated in system: 2026-06-16 16:21:34.449113
Title: How Post-Training Shapes Biological Reasoning Models
Authors: Lukas Fesser, Hanlin Zhang, Michelle M. Li, Eric Wang, Bryan Perozzi, Shekoofeh Azizi, Sham M. Kakade, Marinka Zitnik
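Abstract summary: We train and evaluate more than 100 biological reasoning models under controlled variation in backbone, continued pre-training, supervised fine-tuning, and reinforcement learning. Each post-training stage reshapes generalization in a distinct way rather than contributing uniform gains.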
Reference score (site-computed attention score): 50.53183971442794
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scientific reasoning models for biology combine language models with foundation models trained on multimodal biological data, including DNA, RNA, and proteins. These models are built through post-training, yet how each stage shapes reasoning and generalization remains poorly understood. We study when post-training improves performance and when it induces over-specialization. Across genomics, transcriptomics, and proteins, we train and evaluate more than 100 biological reasoning models under controlled variation in backbone, continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), measuring both in-domain (ID) and out-of-domain (OOD) performance. We find that each post-training stage reshapes generalization in a distinct way rather than contributing uniform gains. CPT improves downstream performance by aligning models with biological language. SFT consistently increases ID performance but causes OOD performance to peak early and decline as models fit the training distribution. RL, when applied to strong SFT checkpoints with aligned rewards, improves OOD performance and partially recovers generalization. These results show that biological reasoning does not improve monotonically with additional supervision or compute. Instead, performance depends on how training stages are composed. Under fixed post-training budgets, the strongest ID-OOD trade-off comes from brief SFT, larger RL allocations, and asymmetric adaptation capacity across stages.
