Fugu-MT Paper Translation (Abstract): Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction

Paper Overview: Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction

arxiv url: http://arxiv.org/abs/2606.08147v1
Date: Sat, 06 Jun 2026 12:56:08 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:05.871607
Title: Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction
Title (reference translation): Interpreting DNA Activity Prediction with Biological Reasoning-Informed Regression
Authors: Yi Duan, Zhao Yang, Jiwei Zhu, Ying Ba, Chuan Cao, Bing Su
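Abstract summary: Existing methods typically regress activity scores from sequences in a black-box manner. R3LM is a framework that teaches LLMs reasoning-informed regression on regulatory DNA. R3LM achieves state-of-the-art performance on enhancer prediction across three cell types.
Reference score (independently computed attention measure): 15.79385231366071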
License: http://creativecommons.org/licenses/by/4.0/
Abstract: DNA cis-regulatory elements (CREs) such as enhancers control gene expression levels. Accurately predicting regulatory activity from DNA sequences is valuable but challenging, as it requires understanding complex biological regulatory processes. Existing methods typically regress activity scores from sequences in a black-box manner, limiting both interpretability and regression performance. Meanwhile, large language models (LLMs) benefit from explicit reasoning processes, yet directly applying LLMs to raw DNA sequences performs poorly. In this paper, we bridge this gap by introducing R3LM, a framework that teaches LLMs reasoning-informed regression on regulatory DNA through structured biological knowledge. Specifically, we design a biologically grounded data format that structures DNA's regulatory information for improved LLM understanding, and construct CRE-ReasonBench, the first dataset that associates DNA sequences and activity scores with mechanistic reasoning traces. Through two-stage training that first teaches LLMs to reason over structured biological information and then performs regression, R3LM achieves state-of-the-art performance on enhancer prediction across three cell types, outperforming both LLMs with raw sequence input and specialized DNA models while providing interpretable mechanistic explanations. We expect R3LM to serve as an interpretable reward model that can effectively assist biologists in CRE design. Code is available at https://github.com/DuanYi516/R3LM.
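Abstract (reference translation): DNA cis-regulatory elements (CREs), such as enhancers, control the level of gene expression. Predicting regulatory activity accurately from DNA sequences is valuable but difficult, because it requires understanding complex biological regulatory processes. Existing methods typically regress activity scores from sequences in a black-box manner, which limits both interpretability and regression performance. Large language models (LLMs), meanwhile, benefit from explicit reasoning processes, but applying LLMs directly to raw DNA sequences performs poorly. In this paper, we bridge this gap by introducing R3LM, a framework that teaches LLMs reasoning-informed regression on regulatory DNA through structured biological knowledge. Specifically, we design a biologically grounded data format that structures DNA's regulatory information to improve LLM understanding, and construct CRE-ReasonBench, the first dataset that associates DNA sequences and activity scores with mechanistic reasoning traces. Through two-stage training that first teaches LLMs to reason over structured biological information and then performs regression, R3LM achieves state-of-the-art performance on enhancer prediction across three cell types. R3LM is expected to serve as an interpretable reward model that can effectively assist biologists in CRE design. The code is available at https://github.com/DuanYi516/R3LM.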

Related Papers