Fugu-MT Paper Translation (Abstract): Plausibility Is Not Prediction: Contrastive Evidence for LLM-Based Cellular Perturbation Reasoning

Paper overview: Plausibility Is Not Prediction: Contrastive Evidence for LLM-Based Cellular Perturbation Reasoning

arxiv url: http://arxiv.org/abs/2606.01042v1
Date: Sun, 31 May 2026 06:13:26 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:29.16174
Title: Plausibility Is Not Prediction: Contrastive Evidence for LLM-Based Cellular Perturbation Reasoning
Title (reference translation): Plausibility is not prediction: contrastive evidence for LLM-based cellular perturbation reasoning
Authors: Xinyu Yuan, Xixian Liu, Jianan Zhao, Yashi Zhang, Hongyu Guo, Jian Tang
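Abstract summary: Perturbation experiments are central to understanding cellular mechanisms. They remain costly and sparse, motivating prediction of gene expression responses for unobserved conditions. The paper introduces CORE, which reframes prediction as a comparison task by organizing evidence into positive and negative outcomes from related perturbations.
Reference score (independently computed attention score): 27.902219493068824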
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Perturbation experiments are central to understanding cellular mechanisms, but remain costly and sparse, motivating prediction of gene expression responses for unobserved conditions. A promising recent direction leverages large language models (LLMs) as "virtual cell" simulators, using stepwise, knowledge-grounded mechanistic reasoning to infer differential expression, and points toward an interpretable, knowledge-driven paradigm that transcends purely data-driven approaches. However, we find that plausibility is not prediction: despite producing biologically plausible explanations, these methods fail to capture perturbation-specific effects. They systematically overestimate differential expression, often underperform a simple gene-frequency baseline in aggregate evaluations, and collapse to chance-level performance at the per-gene level. This reveals a reliance on intrinsic gene response tendencies rather than true perturbation reasoning. We trace this failure to how evidence is presented: existing methods evaluate perturbation-gene pairs in isolation, without exposing how related perturbations differ in their effects on the same gene. To address this limitation, we introduce CORE (Contrastive Organization of Relational Evidence), which reframes prediction as a comparison task by organizing evidence into positive and negative outcomes from related perturbations. Using a biomedical knowledge graph for evidence retrieval, CORE improves calibration and substantially boosts perturbation-specific prediction in both LLM-based and non-LLM settings: for example, on drug-perturbation data, CORE-Reasoning improves Qwen3.5-9B aggregate metrics by up to 28.6%, while on generic perturbation data, CORE-Voting raises macro-per-gene AUROC from chance to 0.703 on average across four cell lines. This highlights contrastive evidence organization as essential to reliable LLM-based perturbation reasoning.
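Abstract (reference translation): Perturbation experiments are central to understanding cellular mechanisms, but they remain costly and sparse, which motivates predicting gene expression responses under unobserved conditions. A promising recent direction uses large language models (LLMs) as "virtual cell" simulators, applying stepwise, knowledge-grounded mechanistic reasoning to infer differential expression and pointing toward an interpretable, knowledge-driven paradigm that goes beyond purely data-driven approaches. However, even when they produce biologically plausible explanations, these methods fail to capture perturbation-specific effects: they systematically overestimate differential expression, often underperform a simple gene-frequency baseline in aggregate evaluations, and fall to chance-level performance at the per-gene level. This reveals a reliance on intrinsic gene response tendencies rather than true perturbation reasoning. Existing methods evaluate perturbation-gene pairs in isolation, without exposing how related perturbations differ in their effects on the same gene. To address this limitation, we introduce CORE (Contrastive Organization of Relational Evidence), which reframes prediction as a comparison task by organizing evidence into positive and negative outcomes from related perturbations. For example, on drug-perturbation data, CORE-Reasoning improves the aggregate metrics of Qwen3.5-9B by up to 28.6%, while on generic perturbation data, CORE-Voting raises macro-per-gene AUROC to an average of 0.703 across four cell lines. This highlights contrastive evidence organization as essential to LLM-based perturbation reasoning.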
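
Illustrative note: the abstract contrasts aggregate evaluation with per-gene evaluation. The sketch below is one way to see why a predictor that relies only on intrinsic gene response tendencies can look reasonable in aggregate yet sit at chance per gene. Everything in it is an assumption made for illustration: the toy labels, the function names, and the exact metric definitions (pooled AUROC over all perturbation-gene pairs versus the mean of per-gene AUROCs) are not taken from the paper, and the gene frequencies are computed on the same toy data they are scored on.

```python
def auroc(labels, scores):
    """AUROC as P(pos score > neg score) + 0.5 * P(tie); None if a class is missing."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: labels[perturbation][gene] = 1 if differentially expressed.
GENES = ["g1", "g2", "g3", "g4"]
LABELS = {
    "pertA": {"g1": 1, "g2": 1, "g3": 0, "g4": 0},
    "pertB": {"g1": 1, "g2": 0, "g3": 0, "g4": 0},
    "pertC": {"g1": 1, "g2": 1, "g3": 1, "g4": 0},
    "pertD": {"g1": 0, "g2": 1, "g3": 0, "g4": 1},
}

def gene_frequency_scores(labels, genes):
    """Score each gene by how often it responds, ignoring the perturbation entirely."""
    freq = {g: sum(row[g] for row in labels.values()) / len(labels) for g in genes}
    return {p: {g: freq[g] for g in genes} for p in labels}

def aggregate_auroc(labels, scores, genes):
    """Pool every (perturbation, gene) pair into a single AUROC."""
    pairs = [(labels[p][g], scores[p][g]) for p in labels for g in genes]
    return auroc([y for y, _ in pairs], [s for _, s in pairs])

def macro_per_gene_auroc(labels, scores, genes):
    """Average AUROC over genes, each computed across perturbations only."""
    vals = []
    for g in genes:
        a = auroc([labels[p][g] for p in labels], [scores[p][g] for p in labels])
        if a is not None:  # skip genes with only one class
            vals.append(a)
    return sum(vals) / len(vals)

baseline = gene_frequency_scores(LABELS, GENES)
print(aggregate_auroc(LABELS, baseline, GENES))       # 0.75
print(macro_per_gene_auroc(LABELS, baseline, GENES))  # 0.5
```

Because the baseline assigns one score per gene, every within-gene comparison is a tie, so its per-gene AUROC is 0.5 by construction. Only a score that varies across perturbations for the same gene can move that number.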


Related papers