Fugu-MT Paper Translation (Abstract): Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain

Paper overview: Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain

arxiv url: http://arxiv.org/abs/2510.07309v2
Date: Thu, 09 Oct 2025 02:27:56 GMT
Status: Translation complete
Last updated in system: 2025-10-10 12:56:53.599469
Title: Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain
Title (reference translation): Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain
Authors: Yue Li, Ran Tao, Derek Hommel, Yusuf Denizay Dönder, Sungyong Chang, David Mimno, Unso Eun Seo Jo
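Abstract summary: We introduce CORGI, a new benchmark designed specifically for real-world business contexts. It provides questions across four categories of business queries: descriptive, explanatory, predictive, and recommendational. We find that LLM performance drops on high-level questions, with models struggling to make accurate predictions and offer actionable plans.
Reference score (independently computed attention measure): 10.89800905114692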
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the business domain, where data-driven decision making is crucial, text-to-SQL is fundamental for easy natural language access to structured data. While recent LLMs have achieved strong performance in code generation, existing text-to-SQL benchmarks remain focused on factual retrieval of past records. We introduce CORGI, a new benchmark specifically designed for real-world business contexts. CORGI is composed of synthetic databases inspired by enterprises such as Doordash, Airbnb, and Lululemon. It provides questions across four increasingly complex categories of business queries: descriptive, explanatory, predictive, and recommendational. This challenge calls for causal reasoning, temporal forecasting, and strategic recommendation, reflecting multi-level and multi-step agentic intelligence. We find that LLM performance drops on high-level questions, struggling to make accurate predictions and offer actionable plans. Based on execution success rate, the CORGI benchmark is about 21% more difficult than the BIRD benchmark. This highlights the gap between popular LLMs and the need for real-world business intelligence. We release a public dataset and evaluation framework, and a website for public submissions.
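Abstract (reference translation): In the business domain, where data-driven decision making is essential, text-to-SQL is indispensable for easy natural-language access to structured data. Although recent LLMs have achieved strong performance in code generation, existing text-to-SQL benchmarks still focus on factual retrieval of past records. We introduce CORGI, a new benchmark designed specifically for real-world business contexts. CORGI consists of synthetic databases inspired by enterprises such as Doordash, Airbnb, and Lululemon. It provides questions across four increasingly complex categories of business queries: descriptive, explanatory, predictive, and recommendational. This challenge requires causal reasoning, temporal forecasting, and strategic recommendation, reflecting multi-level and multi-step agentic intelligence. We find that LLM performance drops on high-level questions, with models struggling to make accurate predictions and offer actionable plans. Based on execution success rate, the CORGI benchmark is about 21% more difficult than the BIRD benchmark. This highlights the gap between popular LLMs and the need for real-world business intelligence. We release a public dataset and evaluation framework, and a website for public submissions.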
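The abstract compares benchmarks by execution success rate but does not define the metric. As a rough illustration only, the sketch below assumes it means the fraction of generated SQL queries that run against the target database without raising an error. The toy `orders` table, the candidate queries, and the helper names `build_toy_db` and `execution_success_rate` are all hypothetical and are not taken from CORGI or its evaluation framework.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Create a small in-memory database (hypothetical schema, not from CORGI)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (city, total) VALUES (?, ?)",
        [("Austin", 25.0), ("Boston", 40.5), ("Austin", 12.0)],
    )
    conn.commit()
    return conn


def execution_success_rate(conn: sqlite3.Connection, queries: list[str]) -> float:
    """Return the fraction of queries that execute without a database error.

    Assumption: "success" means the query runs, not that its result is correct.
    """
    if not queries:
        return 0.0
    succeeded = 0
    for sql in queries:
        try:
            conn.execute(sql).fetchall()
            succeeded += 1
        except sqlite3.Error:
            # Syntax errors, unknown tables or columns, etc. count as failures.
            pass
    return succeeded / len(queries)


# Hypothetical model outputs: two valid queries and two that fail to execute.
candidate_queries = [
    "SELECT city, SUM(total) FROM orders GROUP BY city",
    "SELECT AVG(total) FROM orders",
    "SELECT revenue FROM orders",  # fails: no such column
    "SELEC city FROM orders",      # fails: syntax error
]

if __name__ == "__main__":
    conn = build_toy_db()
    print(execution_success_rate(conn, candidate_queries))
```

A query that runs can still return the wrong answer, so a metric defined this way measures executability only. The paper's actual scoring for explanatory, predictive, and recommendational questions may differ.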

List of related papers