Fugu-MT Paper Translation (Abstract): GIANTS: Generative Insight Anticipation from Scientific Literature

Paper summary: GIANTS: Generative Insight Anticipation from Scientific Literature

arxiv url: http://arxiv.org/abs/2604.09793v1
Date: Fri, 10 Apr 2026 18:13:55 GMT
Status: Translation complete
System update date: 2026-04-14 20:13:15.676619
Title: GIANTS: Generative Insight Anticipation from Scientific Literature
Title (reference translation): GIANTS: Emergent Insights from Scientific Literature
Authors: Joy He-Yueya, Anikait Singh, Ge Gao, Michael Y. Li, Sherry Yang, Chelsea Finn, Emma Brunskill, Noah D. Goodman
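Abstract summary: This paper introduces insight anticipation, a generation task in which a model predicts a downstream paper's core insight from its foundational parent papers. Models are evaluated using an LM judge that scores similarity between generated and ground-truth insights, and these similarity scores are shown to correlate with expert human ratings. GIANTS-4B is an LM trained via reinforcement learning (RL) to optimize insight anticipation using these similarity scores as a proxy reward.
Reference score (independently computed attention level): 84.95947892931142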
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scientific breakthroughs often emerge from synthesizing prior ideas into novel contributions. While language models (LMs) show promise in scientific discovery, their ability to perform this targeted, literature-grounded synthesis remains underexplored. We introduce insight anticipation, a generation task in which a model predicts a downstream paper's core insight from its foundational parent papers. To evaluate this capability, we develop GiantsBench, a benchmark of 17k examples across eight scientific domains, where each example consists of a set of parent papers paired with the core insight of a downstream paper. We evaluate models using an LM judge that scores similarity between generated and ground-truth insights, and show that these similarity scores correlate with expert human ratings. Finally, we present GIANTS-4B, an LM trained via reinforcement learning (RL) to optimize insight anticipation using these similarity scores as a proxy reward. Despite its smaller open-source architecture, GIANTS-4B outperforms proprietary baselines and generalizes to unseen domains, achieving a 34% relative improvement in similarity score over gemini-3-pro. Human evaluations further show that GIANTS-4B produces insights that are more conceptually clear than those of the base model. In addition, SciJudge-30B, a third-party model trained to compare research abstracts by likely citation impact, predicts that insights generated by GIANTS-4B are more likely to lead to higher citations, preferring them over the base model in 68% of pairwise comparisons. We release our code, benchmark, and model to support future research in automated scientific discovery.
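Abstract (reference translation): Scientific breakthroughs often arise from synthesizing prior ideas into new contributions. Language models (LMs) show promise in scientific discovery, but their ability to perform this targeted synthesis remains underexplored. This paper introduces insight anticipation, a generation task in which a model predicts a downstream paper's core insight from its foundational parent papers. To evaluate this capability, the authors develop GiantsBench, a benchmark of 17k examples across eight scientific domains, where each example consists of a set of parent papers paired with the core insight of a downstream paper. Models are evaluated using an LM judge that scores similarity between generated and ground-truth insights, and these similarity scores are shown to correlate with expert human ratings. Finally, the paper presents GIANTS-4B, an LM trained via reinforcement learning (RL) to optimize insight anticipation using these similarity scores as a proxy reward. Despite its smaller open-source architecture, GIANTS-4B outperforms proprietary baselines and generalizes to unseen domains, achieving a 34% relative improvement in similarity score over gemini-3-pro. Human evaluations show that GIANTS-4B produces insights that are more conceptually clear than those of the base model. In addition, SciJudge-30B, a third-party model, predicts that insights generated by GIANTS-4B are more likely to lead to higher citations, preferring them over the base model in 68% of pairwise comparisons. The code, benchmark, and model are released to support future research in automated scientific discovery.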
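
The abstract reports two kinds of headline numbers: a relative improvement in similarity score and a pairwise preference rate. The following is a minimal Python sketch of how such figures are conventionally computed, together with a hypothetical record type for one benchmark-style example (a set of parent papers paired with a downstream insight). The class name, field names, function names, and all numeric values are illustrative assumptions and are not taken from the GIANTS code or data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InsightExample:
    """Hypothetical schema: parent papers paired with a downstream paper's core insight."""
    parent_papers: List[str]
    target_insight: str


def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Gain of new_score over baseline_score, expressed as a fraction of the baseline."""
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return (new_score - baseline_score) / baseline_score


def pairwise_preference_rate(preferred: List[bool]) -> float:
    """Fraction of pairwise comparisons in which the first system was preferred."""
    if not preferred:
        raise ValueError("need at least one comparison")
    return sum(preferred) / len(preferred)


# Placeholder record; the strings are invented and stand in for real paper text.
example = InsightExample(
    parent_papers=["<parent paper A>", "<parent paper B>"],
    target_insight="<core insight of the downstream paper>",
)

# Made-up scores chosen only to illustrate the arithmetic, not reported values.
gain = relative_improvement(new_score=6.7, baseline_score=5.0)
rate = pairwise_preference_rate([True] * 68 + [False] * 32)
```

With these invented inputs, a judge score rising from 5.0 to 6.7 corresponds to a relative improvement of about 0.34, and 68 wins out of 100 comparisons gives a preference rate of 0.68. The actual score scale and comparison counts used in the paper are not stated in the abstract.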
