Fugu-MT Paper Translation (Summary): SciGPT: A Large Language Model for Scientific Literature Understanding and Knowledge Discovery

Paper summary: SciGPT: A Large Language Model for Scientific Literature Understanding and Knowledge Discovery

arxiv url: http://arxiv.org/abs/2509.08032v1
Date: Tue, 09 Sep 2025 16:09:19 GMT
Status: Translation complete
System update date: 2025-09-11 15:16:52.214312
Title: SciGPT: A Large Language Model for Scientific Literature Understanding and Knowledge Discovery
Title (reference translation): SciGPT: A Large Language Model for Scientific Literature Understanding and Knowledge Discovery
Authors: Fengyu She, Nan Wang, Hongfei Wu, Ziyi Wan, Jingmian Wang, Chang Wang,
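Abstract summary: This paper presents SciGPT, a domain-adapted model for scientific literature understanding, and ScienceBench, an open-source benchmark for evaluating scientific LLMs. Built on the Qwen3 architecture, SciGPT incorporates three innovations: (1) low-cost domain distillation via a two-stage pipeline to balance performance and efficiency; (2) a Sparse Mixture-of-Experts (SMoE) attention mechanism that cuts memory consumption by 55% for 32,000-token long-document reasoning; and (3) knowledge-aware adaptation integrating domain ontologies. Experimental results on ScienceBench show that SciGPT outperforms GPT-4o on core scientific tasks, including sequence labeling.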
Reference score (independently computed attention metric): 3.779883844533933
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scientific literature is growing exponentially, creating a critical bottleneck for researchers to efficiently synthesize knowledge. While general-purpose Large Language Models (LLMs) show potential in text processing, they often fail to capture scientific domain-specific nuances (e.g., technical jargon, methodological rigor) and struggle with complex scientific tasks, limiting their utility for interdisciplinary research. To address these gaps, this paper presents SciGPT, a domain-adapted foundation model for scientific literature understanding, and ScienceBench, an open-source benchmark tailored to evaluate scientific LLMs. Built on the Qwen3 architecture, SciGPT incorporates three key innovations: (1) low-cost domain distillation via a two-stage pipeline to balance performance and efficiency; (2) a Sparse Mixture-of-Experts (SMoE) attention mechanism that cuts memory consumption by 55% for 32,000-token long-document reasoning; and (3) knowledge-aware adaptation integrating domain ontologies to bridge interdisciplinary knowledge gaps. Experimental results on ScienceBench show that SciGPT outperforms GPT-4o in core scientific tasks including sequence labeling, generation, and inference. It also exhibits strong robustness in unseen scientific tasks, validating its potential to facilitate AI-augmented scientific discovery.
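Abstract (reference translation): Scientific literature is growing exponentially, creating a critical bottleneck for researchers trying to synthesize knowledge efficiently. General-purpose Large Language Models (LLMs) show potential in text processing, but they often fail to capture nuances specific to scientific domains (such as technical jargon and methodological rigor) and struggle with complex scientific tasks, which limits their usefulness for interdisciplinary research. This paper describes SciGPT, a domain-adapted foundation model for scientific literature understanding, and ScienceBench, an open-source benchmark for evaluating scientific LLMs. Built on the Qwen3 architecture, SciGPT includes three key innovations: (1) low-cost domain distillation via a two-stage pipeline that balances performance and efficiency; (2) a Sparse Mixture-of-Experts (SMoE) attention mechanism that reduces memory consumption by 55% for 32,000-token long-document reasoning; and (3) knowledge-aware adaptation that integrates domain ontologies to bridge interdisciplinary knowledge gaps. Experimental results on ScienceBench show that SciGPT outperforms GPT-4o on core scientific tasks including sequence labeling, generation, and inference. It also shows strong robustness on unseen scientific tasks, validating its potential to facilitate AI-augmented scientific discovery.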

Related papers