Fugu-MT paper translation (abstract): Can LLM Coding Agents Reason About Time Series?

Paper summary: Can LLM Coding Agents Reason About Time Series?

arxiv url: http://arxiv.org/abs/2606.16545v1
Date: Mon, 15 Jun 2026 10:49:28 GMT
Status: Translation complete
Last updated in system: 2026-06-16 16:21:34.464039
Title: Can LLM Coding Agents Reason About Time Series?
Authors: Filip Rechtorík, Ondřej Dušek, Zdeněk Kasner
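Abstract summary: We show that agents with code access can outperform models processing raw data by up to 10%. Even the best performing agent still answers about 22-34% of the questions incorrectly. Our analysis reveals that coding agents can select appropriate statistical tests, but often miss important nuances.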
Reference score (independently computed attention score): 0.19116784879310025
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) are increasingly being used for automated decision-making systems in finance, healthcare, or environmental monitoring. Time series data are ubiquitous in these fields, yet hard to process automatically. Can time series be analyzed by LLM agents? We examine three approaches: providing the agent with raw numerical data, using the LLM as a coding agent, or a combination of both. In the coding agent setup, the model iteratively queries the data using Python code. Using two time series understanding benchmarks, we show that agents with code access can outperform models processing raw data by up to 10%. However, even the best performing agent still answers about 22-34% of the questions incorrectly. To get insights into models' strategies and reasoning gaps, we analyze the model outputs with a strong LLM judge. Our analysis reveals that coding agents can select appropriate statistical tests, but often miss important nuances. Meanwhile, models with access to raw data can reach the right conclusions using back-of-the-envelope calculations.
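The contrast the abstract draws between code-based analysis and back-of-the-envelope reasoning can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy series, the question ("is the series trending upward?"), and the helper names `ols_slope` and `half_mean_shift` are hypothetical and are not taken from the paper or its benchmarks.

```python
from statistics import mean


def ols_slope(values):
    """Least-squares slope of the values against their index 0..n-1.

    Stands in for the kind of computation a coding agent might run.
    """
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def half_mean_shift(values):
    """Back-of-the-envelope check: mean of second half minus mean of first half."""
    mid = len(values) // 2
    return mean(values[mid:]) - mean(values[:mid])


# Hypothetical toy series, not drawn from the paper's data.
series = [10.0, 10.5, 9.8, 11.2, 11.0, 12.1, 11.9, 12.8]

slope = ols_slope(series)
shift = half_mean_shift(series)
print(f"slope per step: {slope:.3f}")
print(f"half-mean shift: {shift:.3f}")
print("upward trend" if slope > 0 and shift > 0 else "no clear upward trend")
```

On a clean series like this one, both checks agree. The sketch says nothing about which approach is more accurate in general; that question is what the paper's benchmarks evaluate.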
