Fugu-MT Paper Translation (Abstract): SimulRAG: Simulator-based RAG for Grounding LLMs in Long-form Scientific QA

Paper overview: SimulRAG: Simulator-based RAG for Grounding LLMs in Long-form Scientific QA

arxiv url: http://arxiv.org/abs/2509.25459v1
Date: Mon, 29 Sep 2025 20:07:00 GMT
Status: Translation complete
Last updated in system: 2025-10-01 17:09:04.306987
Title: SimulRAG: Simulator-based RAG for Grounding LLMs in Long-form Scientific QA
Authors: Haozhou Xu, Dongxia Wu, Matteo Chinazzi, Ruijia Niu, Rose Yu, Yi-An Ma
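Abstract summary: Large language models (LLMs) show promise in solving scientific problems. They can help generate long-form answers to scientific questions. However, LLMs often suffer from hallucination, especially in the challenging task of long-form scientific question answering.
Reference score (attention score computed by this site): 35.02813727925432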
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) show promise in solving scientific problems. They can help generate long-form answers for scientific questions, which are crucial for comprehensive understanding of complex phenomena that require detailed explanations spanning multiple interconnected concepts and evidence. However, LLMs often suffer from hallucination, especially in the challenging task of long-form scientific question answering. Retrieval-Augmented Generation (RAG) approaches can ground LLMs by incorporating external knowledge sources to improve trustworthiness. In this context, scientific simulators, which play a vital role in validating hypotheses, offer a particularly promising retrieval source to mitigate hallucination and enhance answer factuality. However, existing RAG approaches cannot be directly applied for scientific simulation-based retrieval due to two fundamental challenges: how to retrieve from scientific simulators, and how to efficiently verify and update long-form answers. To overcome these challenges, we propose the simulator-based RAG framework (SimulRAG) and provide a long-form scientific QA benchmark covering climate science and epidemiology with ground truth verified by both simulations and human annotators. In this framework, we propose a generalized simulator retrieval interface to transform between textual and numerical modalities. We further design a claim-level generation method that utilizes uncertainty estimation scores and simulator boundary assessment (UE+SBA) to efficiently verify and update claims. Extensive experiments demonstrate SimulRAG outperforms traditional RAG baselines by 30.4% in informativeness and 16.3% in factuality. UE+SBA further improves efficiency and quality for claim-level generation.
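
The abstract describes claim-level generation in which uncertainty estimation scores and a simulator boundary assessment (UE+SBA) decide which claims to verify and update. The Python sketch below is only an illustration of that idea under stated assumptions, not the authors' implementation: the `Claim` fields, the `ue_threshold` parameter, the selection rule (verify a claim only if it is within the simulator's boundary and its uncertainty is at or above a threshold), and the `verify` callback are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Claim:
    text: str
    uncertainty: float     # hypothetical UE score in [0, 1]; higher means less certain
    in_sim_boundary: bool  # hypothetical SBA result: can the simulator check this claim?


def select_claims_to_verify(claims: List[Claim], ue_threshold: float = 0.5) -> List[int]:
    """Return indices of claims that are uncertain AND checkable by the simulator.

    Assumed rule for illustration only; the paper may combine UE and SBA differently.
    """
    return [
        i
        for i, claim in enumerate(claims)
        if claim.in_sim_boundary and claim.uncertainty >= ue_threshold
    ]


def update_answer(
    claims: List[Claim],
    verify: Callable[[str], str],
    ue_threshold: float = 0.5,
) -> List[str]:
    """Pass only the selected claims through `verify`; keep the rest unchanged.

    `verify` stands in for a simulator-backed check that returns revised claim text.
    """
    selected = set(select_claims_to_verify(claims, ue_threshold))
    return [
        verify(claim.text) if i in selected else claim.text
        for i, claim in enumerate(claims)
    ]


if __name__ == "__main__":
    demo = [
        Claim("Claim A", uncertainty=0.9, in_sim_boundary=True),
        Claim("Claim B", uncertainty=0.2, in_sim_boundary=True),
        Claim("Claim C", uncertainty=0.8, in_sim_boundary=False),
    ]
    # Toy verifier: a real one would query a simulator and rewrite the claim.
    print(update_answer(demo, verify=lambda text: text + " [checked]"))
```

Filtering before verification is what would make such a scheme cheaper than checking every claim: in the toy input, only the first claim reaches the (presumably expensive) simulator call.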

