Fugu-MT Paper Translation (Abstract): StreetMath: Study of LLMs' Approximation Behaviors

Paper overview: StreetMath: Study of LLMs' Approximation Behaviors

arxiv url: http://arxiv.org/abs/2510.25776v1
Date: Mon, 27 Oct 2025 05:16:00 GMT
Status: Translation complete
Last updated in system: 2025-10-31 16:05:09.473285
Title: StreetMath: Study of LLMs' Approximation Behaviors
Authors: Chiung-Yi Tseng, Somshubhra Roy, Maisha Thasin, Danyang Zhang, Blessing Effiong,
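Abstract summary: We introduce StreetMath, a benchmark designed to evaluate models' approximation abilities under real-world approximation scenarios. Our analysis reveals that LLMs generally attempt to compute exact values or invoke external tools even in tasks that call for approximation. We argue that LLMs do not exhibit cognitive miserliness in the same way humans do in street math settings.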
Reference score (independently computed attention score): 1.4119508208285607
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: There is a substantial body of literature examining the mathematical reasoning capabilities of large language models (LLMs), particularly their performance on precise arithmetic operations in autoregressive architectures. However, their ability to perform approximate reasoning in informal, fast-paced mathematical operations has received far less attention, especially among non-autoregressive decoder models. Our work addresses this gap by introducing StreetMath, a benchmark designed to evaluate models' approximation abilities under real-world approximation scenarios. We conduct extensive evaluations across different LLM architectures: Qwen3-4B-Instruct-2507, Qwen3-4B-Thinking-2507, Dream-v0-Instruct-7B, Falcon-Mamba-7B-Instruct, and Mamba-GPT-3B. Furthermore, we apply mechanistic interpretability techniques to probe their internal computational states. Our analysis reveals that LLMs generally attempt to compute exact values or invoke external tools even in tasks that call for approximation. Moreover, while models sometimes reach the correct answer in early layers or steps, they still consume more tokens when solving approximation tasks. Additional experiments indicate that exact and approximate arithmetic operations rely on largely separate neural components. Drawing on research in cognitive psychology, we argue that LLMs do not exhibit cognitive miserliness in the same way humans do in street math settings. We open-source our work at https://github.com/ctseng777/StreetMath.

Related papers