Fugu-MT Paper Translation (Abstract): Math Takes Two: A test for emergent mathematical reasoning in communication

Paper abstract: Math Takes Two: A test for emergent mathematical reasoning in communication

arxiv url: http://arxiv.org/abs/2604.21935v1
Date: Mon, 30 Mar 2026 08:28:48 GMT
Status: Translation complete
Last updated in system: 2026-05-04 02:32:14.14423
Title: Math Takes Two: A test for emergent mathematical reasoning in communication
Title (reference translation): Math Takes Two: A test of emergent mathematical reasoning in communication
Authors: Michael Cooper, Samuel Cooper
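Abstract summary: Math Takes Two is a new benchmark designed to assess the emergence of mathematical reasoning through communication. Motivated by the hypothesis that human mathematical cognition co-evolved with the need for precise communication, the benchmark tests whether two agents without prior mathematical knowledge can develop a shared symbolic protocol.
Reference score (attention score computed independently by the site): 1.2891210250935148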
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Although language models demonstrate remarkable proficiency on mathematical benchmarks, it remains unclear whether this reflects true mathematical reasoning or statistical pattern matching over learning formal syntax. Most existing evaluations rely on symbolic problems grounded in established mathematical conventions, limiting insight into the models' ability to construct abstract concepts from first principles. In this work, we propose Math Takes Two, a new benchmark designed to assess the emergence of mathematical reasoning through communication. Motivated by the hypothesis that mathematical cognition in humans co-evolved with the need for precise communication, our benchmark tests whether two agents, without prior mathematical knowledge, can develop a shared symbolic protocol to solve a visually grounded task where the use of a numerical system facilitates extrapolation. Unlike many current datasets, our benchmark eschews predefined mathematical language, instead requiring agents to discover latent structure and representations from scratch. Math Takes Two thus provides a novel lens through which to develop and evaluate models with emergent numerical reasoning capabilities.
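Abstract (reference translation): Language models show remarkable proficiency on mathematical benchmarks, but it is unclear whether this reflects genuine mathematical reasoning or statistical pattern matching over learned formal syntax. Most existing evaluations rely on symbolic problems rooted in established mathematical conventions, which limits insight into a model's ability to construct abstract concepts from first principles. This work proposes Math Takes Two, a new benchmark for assessing the emergence of mathematical reasoning through communication. Motivated by the hypothesis that human mathematical cognition co-evolved with the need for precise communication, the benchmark tests whether two agents with no prior mathematical knowledge can develop a shared symbolic protocol to solve a visually grounded task in which using a numerical system makes extrapolation easier. Unlike many current datasets, the benchmark avoids predefined mathematical language and instead requires agents to discover latent structure and representations from scratch. Math Takes Two thus offers a new lens for developing and evaluating models with emergent numerical reasoning capabilities.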


Related papers