Fugu-MT Paper Translation (Summary): Inverse Knowledge Search over Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long Chains-of-Thought Knowledge Base

Paper Summary: Inverse Knowledge Search over Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long Chains-of-Thought Knowledge Base

arxiv url: http://arxiv.org/abs/2510.26854v1
Date: Thu, 30 Oct 2025 15:38:50 GMT
Status: Translation complete
Last updated in system: 2025-11-03 17:52:15.869284
Title: Inverse Knowledge Search over Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long Chains-of-Thought Knowledge Base
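Title (reference translation): Inverse Knowledge Search Regarding Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long-Chain Knowledge Base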
Authors: Yu Li, Yuan Huang, Tao Wang, Caiyu Fan, Xiansheng Cai, Sihan Hu, Xinzijian Liu, Cheng Shi, Mingjun Xu, Zhen Wang, Yan Wang, Xiangqi Jin, Tianhan Zhang, Linfeng Zhang, Lei Wang, Youjin Deng, Pan Zhang, Weijie Sun, Xingyu Li, Weinan E, Linfeng Zhang, Zhiyuan Yao, Kun Chen
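Abstract summary: Most scientific materials compress reasoning, presenting conclusions while omitting the derivational chains that justify them. This compression hinders verification by lacking explicit, step-wise justifications, and it inhibits cross-domain links. This paper proposes a scalable framework that decompresses scientific reasoning by constructing a Long Chain-of-Thought (LCoT) knowledge base.
Reference score (independently calculated attention score): 42.96788956767613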
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Most scientific materials compress reasoning, presenting conclusions while omitting the derivational chains that justify them. This compression hinders verification by lacking explicit, step-wise justifications and inhibits cross-domain links by collapsing the very pathways that establish the logical and causal connections between concepts. We introduce a scalable framework that decompresses scientific reasoning, constructing a verifiable Long Chain-of-Thought (LCoT) knowledge base and projecting it into an emergent encyclopedia, SciencePedia. Our pipeline operationalizes an endpoint-driven, reductionist strategy: a Socratic agent, guided by a curriculum of around 200 courses, generates approximately 3 million first-principles questions. To ensure high fidelity, multiple independent solver models generate LCoTs, which are then rigorously filtered by prompt sanitization and cross-model answer consensus, retaining only those with verifiable endpoints. This verified corpus powers the Brainstorm Search Engine, which performs inverse knowledge search -- retrieving diverse, first-principles derivations that culminate in a target concept. This engine, in turn, feeds the Plato synthesizer, which narrates these verified chains into coherent articles. The initial SciencePedia comprises approximately 200,000 fine-grained entries spanning mathematics, physics, chemistry, biology, engineering, and computation. In evaluations across six disciplines, Plato-synthesized articles (conditioned on retrieved LCoTs) exhibit substantially higher knowledge-point density and significantly lower factual error rates than an equally-prompted baseline without retrieval (as judged by an external LLM). Built on this verifiable LCoT knowledge base, this reasoning-centric approach enables trustworthy, cross-domain scientific synthesis at scale and establishes the foundation for an ever-expanding encyclopedia.
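The abstract states that LCoTs from several independent solver models are retained only when their final answers agree across models. The sketch below illustrates one way such a cross-model answer-consensus filter could work; it is not the authors' code. The `LCoT` record, the `normalize` rule, the `min_agree` threshold, and the strict-majority requirement are all hypothetical choices, and the sketch assumes each solver contributes at most one chain per question. Prompt sanitization is not modeled.

```python
from dataclasses import dataclass


@dataclass
class LCoT:
    """One long chain-of-thought from one solver model (hypothetical schema)."""
    question_id: str
    solver: str
    reasoning: str
    final_answer: str


def normalize(answer: str) -> str:
    """Hypothetical canonical form: case-fold and drop all whitespace so that
    trivially different renderings of the same endpoint compare equal."""
    return "".join(answer.lower().split())


def consensus_filter(chains: list[LCoT], min_agree: int = 2) -> list[LCoT]:
    """Keep chains whose normalized final answer is backed by a strict majority
    of the distinct solvers for that question, and by at least `min_agree` of
    them. Questions without such an answer are dropped entirely."""
    by_question: dict[str, list[LCoT]] = {}
    for chain in chains:
        by_question.setdefault(chain.question_id, []).append(chain)

    kept: list[LCoT] = []
    for group in by_question.values():
        # Map each normalized answer to the set of solvers that produced it.
        solvers_per_answer: dict[str, set[str]] = {}
        for chain in group:
            key = normalize(chain.final_answer)
            solvers_per_answer.setdefault(key, set()).add(chain.solver)

        n_solvers = len({chain.solver for chain in group})
        answer, backers = max(solvers_per_answer.items(), key=lambda kv: len(kv[1]))

        # No verifiable endpoint: too few solvers agree, or no strict majority.
        if len(backers) < min_agree or 2 * len(backers) <= n_solvers:
            continue
        kept.extend(c for c in group if normalize(c.final_answer) == answer)
    return kept
```

Raising `min_agree` trades recall for stricter verification: a question whose solvers all disagree contributes no chains at all, which mirrors the abstract's rule of keeping only LCoTs with verifiable endpoints.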
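Abstract (reference translation): Most scientific materials compress reasoning, presenting conclusions while omitting the derivational chains that justify them. This compression hinders verification through the lack of explicit, step-wise justification, and inhibits cross-domain links by collapsing the pathways that establish logical and causal relationships between concepts. This paper proposes a scalable framework that decompresses scientific reasoning, constructs a verifiable Long Chain-of-Thought (LCoT) knowledge base, and projects it into an emergent encyclopedia, SciencePedia. A Socratic agent guided by a curriculum of about 200 courses generates about 3 million first-principles questions. To ensure high fidelity, multiple independent solver models generate LCoTs, which are then rigorously filtered by prompt sanitization and cross-model answer consensus, retaining only those with verifiable endpoints. This verified corpus drives the Brainstorm Search Engine, which performs inverse knowledge search. This engine in turn feeds the Plato synthesizer, which narrates these verified chains into coherent articles. The initial SciencePedia consists of approximately 200,000 fine-grained entries spanning mathematics, physics, chemistry, biology, engineering, and computation. In evaluations across six disciplines, Plato-synthesized articles (conditioned on retrieved LCoTs) show substantially higher knowledge-point density and substantially lower factual error rates than an equally-prompted baseline without retrieval (as judged by an external LLM). Built on this verifiable LCoT knowledge base, this reasoning-centric approach achieves trustworthy, cross-domain scientific synthesis at scale and establishes the foundation for an ever-expanding encyclopedia.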
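Inverse knowledge search, as the English abstract describes it, starts from a target concept and retrieves derivations that culminate in it. A minimal sketch of that idea, assuming each verified chain has already been tagged with a `domain` and a set of `endpoint_concepts` (both hypothetical fields), is an inverted index from endpoint concept to chains. Round-robin interleaving across domains stands in for the "diverse" retrieval the paper mentions. This is an illustration under those assumptions, not the Brainstorm Search Engine itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedChain:
    chain_id: str
    domain: str                        # hypothetical field, e.g. "physics"
    endpoint_concepts: frozenset[str]  # hypothetical: concepts the derivation ends in


class InverseIndex:
    """Toy inverted index from an endpoint concept to the verified chains that
    culminate in it (an illustration, not the Brainstorm Search Engine)."""

    def __init__(self) -> None:
        self._by_concept: defaultdict[str, list[VerifiedChain]] = defaultdict(list)

    def add(self, chain: VerifiedChain) -> None:
        # Index case-insensitively on every concept the chain ends in.
        for concept in chain.endpoint_concepts:
            self._by_concept[concept.lower()].append(chain)

    def search(self, target: str, k: int = 5) -> list[VerifiedChain]:
        """Return up to k chains ending in `target`, taking one chain per domain
        per round so that derivations from different fields surface first."""
        by_domain: dict[str, list[VerifiedChain]] = {}
        for chain in self._by_concept.get(target.lower(), []):
            by_domain.setdefault(chain.domain, []).append(chain)

        queues = list(by_domain.values())
        results: list[VerifiedChain] = []
        while queues and len(results) < k:
            remaining = []
            for queue in queues:
                if len(results) == k:
                    break
                results.append(queue.pop(0))
                if queue:
                    remaining.append(queue)
            queues = remaining
        return results
```

Round-robin over domains is only one possible diversity heuristic; the abstract does not say how the actual engine ranks or diversifies its results.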


List of related papers