Fugu-MT Paper Translation (Summary): Can Large Language Models Reinvent Foundational Algorithms?

Paper Summary: Can Large Language Models Reinvent Foundational Algorithms?

arxiv url: http://arxiv.org/abs/2604.05716v1
Date: Tue, 07 Apr 2026 11:15:22 GMT
Status: Translation complete
Last updated in system: 2026-04-08 17:42:09.780502
Title: Can Large Language Models Reinvent Foundational Algorithms?
Authors: Jian Zhao, Haoren Luo, Yu Wang, Yuhan Cao, Pingyue Sheng, Tianxing He,
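Abstract summary: Can LLMs reinvent foundational algorithms in computer science? The Unlearn-and-Reinvent pipeline applies LLM unlearning to remove a specific foundational algorithm and then tests whether the model can reinvent it in a controlled environment. Across 10 target algorithms, 3 strong open-weight models, and 3 hint levels, the strongest model, Qwen3-4B-Thinking-2507, reinvented 50% of the algorithms with no hint, 70% at hint level 1, and 90% at hint level 2.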
Reference score (site-computed attention score): 14.986588554815567
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLMs have shown strong potential to advance scientific discovery. Whether they possess the capacity for foundational innovation, however, remains an open question. In this work, we focus on a prerequisite for foundational innovation: can LLMs reinvent foundational algorithms in computer science? Our Unlearn-and-Reinvent pipeline applies LLM unlearning to remove a specific foundational algorithm, such as Dijkstra's or Euclid's algorithm, from an LLM's pretrained knowledge, and then tests whether the model can reinvent it in a controlled environment. To enable effective unlearning, we adopt a GRPO-based, on-policy unlearning method. Across 10 target algorithms, 3 strong open-weight models, and 3 hint levels, our experiments demonstrate that (1) the strongest model, Qwen3-4B-Thinking-2507, successfully reinvents 50% of the algorithms with no hint, 70% at hint level 1, and 90% at hint level 2; (2) a few high-level hints can raise the reinvention success rate, but even step-by-step hints fail for the more complicated algorithms; and (3) test-time reinforcement learning enables successful reinvention of the Strassen algorithm at hint level 2. Through analyses of output trajectories and ablation studies, we find that a generative verifier in the reinvention phase plays a critical role in sustaining the models' reasoning strength, helping them avoid the "thought collapse" phenomenon. These findings offer insights into both the potential and the current limits of LLMs' innovative thinking.
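The abstract names a GRPO-based, on-policy unlearning method but does not give its reward or loss. The sketch below illustrates only the generic GRPO ingredient, under stated assumptions: for each prompt, a group of completions is sampled from the current policy, each completion is scored, and every reward is normalized against its own group's mean and standard deviation to form the advantage, which then feeds a PPO-style clipped objective. The keyword-based `forget_reward` (reward 1.0 when a completion avoids terms tied to the target algorithm) is a hypothetical stand-in, not the authors' reward; the KL penalty toward a reference model and token-level log-probabilities are omitted for brevity.

```python
import math
from statistics import mean, pstdev


def forget_reward(completion: str, forbidden_terms: list[str]) -> float:
    """HYPOTHETICAL unlearning reward (not the paper's): 1.0 if the sampled
    completion avoids every term tied to the target algorithm, else 0.0."""
    text = completion.lower()
    return 0.0 if any(term.lower() in text for term in forbidden_terms) else 1.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward against its own sampled group.
    Population std is one common choice; implementations differ."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new: list[float], logp_old: list[float],
                      advantages: list[float], clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective averaged over the group, using one
    sequence-level log-probability per completion (a simplification)."""
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # take the pessimistic (smaller) of the unclipped and clipped terms
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

For example, if the first and last of four sampled answers to a shortest-path prompt mention Dijkstra, `forget_reward(c, ["dijkstra"])` yields rewards [0.0, 1.0, 1.0, 0.0] and the advantages come out close to [-1, 1, 1, -1], so maximizing the objective shifts probability away from completions that still reveal the algorithm being unlearned. Because the advantages are computed relative to the group, no separate value network is needed.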


Related Papers