Fugu-MT Paper Translation (Abstract): Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving

Paper overview: Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving

arxiv url: http://arxiv.org/abs/2510.00919v2
Date: Thu, 02 Oct 2025 09:55:14 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:20.597919
Title: Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving
Title (reference translation): Benchmarking Foundation Models with Retrieval-Augmented Generation in Solving Olympic-Level Physics Problems
Authors: Shunfeng Zheng, Yudi Zhang, Meng Fang, Zihan Zhang, Zhitan Wu, Mykola Pechenizkiy, Ling Chen
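Abstract summary: Retrieval-augmented generation (RAG) with foundation models has achieved strong performance across diverse tasks. However, its capacity for expert-level reasoning, such as solving Olympiad-level physics problems, remains largely unexplored. We introduce PhoPile, a high-quality multimodal dataset specifically designed for Olympiad-level physics. Using PhoPile, we benchmark RAG-augmented foundation models, covering both large language models (LLMs) and large multimodal models (LMMs) with multiple retrievers.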
Reference score (independently computed attention score): 56.119382216818195
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-augmented generation (RAG) with foundation models has achieved strong performance across diverse tasks, but their capacity for expert-level reasoning, such as solving Olympiad-level physics problems, remains largely unexplored. Inspired by the way students prepare for competitions by reviewing past problems, we investigate the potential of RAG to enhance physics reasoning in foundation models. We introduce PhoPile, a high-quality multimodal dataset specifically designed for Olympiad-level physics, enabling systematic study of retrieval-based reasoning. PhoPile includes diagrams, graphs, and equations, capturing the inherently multimodal nature of physics problem solving. Using PhoPile, we benchmark RAG-augmented foundation models, covering both large language models (LLMs) and large multimodal models (LMMs) with multiple retrievers. Our results demonstrate that integrating retrieval with physics corpora can improve model performance, while also highlighting challenges that motivate further research in retrieval-augmented physics reasoning.
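Abstract (reference translation): Retrieval-augmented generation (RAG) with foundation models has achieved high performance on a variety of tasks, but its capacity for expert-level reasoning, such as solving Olympiad-level physics problems, has hardly been explored. Inspired by how students prepare for competitions by reviewing past problems, we examine the potential of RAG to strengthen physics reasoning in foundation models. We introduce PhoPile, a high-quality multimodal dataset designed specifically for Olympiad-level physics, which enables systematic study of retrieval-based reasoning. PhoPile contains diagrams, graphs, and equations, capturing the inherently multimodal nature of physics problem solving. Using PhoPile, we benchmark RAG-augmented foundation models, covering both large language models (LLMs) and large multimodal models (LMMs) with multiple retrievers. The results show that integrating retrieval with physics corpora can improve model performance, and they also highlight challenges that motivate further research on retrieval-augmented physics reasoning.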

Related papers