Fugu-MT Paper Translation (Abstract): TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data

Paper summary: TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data

arxiv url: http://arxiv.org/abs/2511.02219v2
Date: Wed, 05 Nov 2025 03:43:25 GMT
Status: Translation complete
Last updated in system: 2025-11-06 13:56:26.184176
Title: TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data
Title (reference translation): TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data
Authors: Changjiang Jiang, Fengchang Yu, Haihua Chen, Wei Lu, Jin Zeng
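Abstract summary: TabDSR is a framework consisting of (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner. The paper introduces CalTab151, a new dataset designed specifically for complex numerical reasoning over tables. Accuracy improves by 8.79%, 6.08%, and 19.87% on TAT-QA, TableBench, and TabDSR, respectively.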
Reference score (independently computed attention score): 10.798423317852288
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Complex reasoning over tabular data is crucial in real-world data analysis, yet large language models (LLMs) often underperform due to complex queries, noisy data, and limited numerical capabilities. To address these issues, we propose TabDSR, a framework consisting of: (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner that generates executable code to derive the final answer from the sanitized table. To ensure unbiased evaluation and mitigate data leakage, we introduce a new dataset, CalTab151, specifically designed for complex numerical reasoning over tables. Experimental results demonstrate that TabDSR consistently outperforms existing methods, achieving state-of-the-art (SOTA) performance with 8.79%, 6.08%, and 19.87% accuracy improvement on TAT-QA, TableBench, and TabDSR, respectively. Moreover, our framework integrates seamlessly with mainstream LLMs, providing a robust solution for complex tabular numerical reasoning. These findings highlight the effectiveness of our framework in enhancing LLM performance for complex tabular numerical reasoning. Data and code are available upon request.
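Abstract (reference translation): Complex reasoning over tabular data is important in real-world data analysis, but large language models (LLMs) often underperform because of complex queries, noisy data, and limited numerical capabilities. To address these issues, we propose TabDSR, a framework consisting of (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner that derives the final answer from the sanitized table. To ensure unbiased evaluation and mitigate data leakage, we introduce CalTab151, a new dataset designed for complex numerical reasoning over tables. Accuracy improves by 8.79%, 6.08%, and 19.87% on TAT-QA, TableBench, and TabDSR, respectively. Furthermore, the framework integrates seamlessly with LLMs, providing a robust solution for complex tabular numerical reasoning. These results indicate the effectiveness of the framework in improving LLM performance on complex tabular numerical reasoning. Data and code are available upon request.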
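
Illustrative sketch (not the authors' code): the abstract describes a table sanitizer followed by a PoT-based reasoner that runs generated code on the sanitized table. The Python below is a minimal sketch of that general idea under stated assumptions. The function names (`sanitize_cell`, `sanitize_table`, `run_program`), the cleaning rules, and the sample table are all hypothetical; the `program` string is a hand-written stand-in for what an LLM might generate, and the query decomposition step is omitted.

```python
import re


def sanitize_cell(cell: str):
    """Turn a noisy numeric string into a float; return other text unchanged.

    Assumed rules (not from the paper): strip currency signs, thousands
    separators, percent signs and whitespace; treat "(300)" as -300.
    """
    text = cell.strip()
    negative = text.startswith("(") and text.endswith(")")
    cleaned = re.sub(r"[,$%()\s]", "", text)
    try:
        value = float(cleaned)
    except ValueError:
        return text  # not numeric, keep the stripped original
    return -value if negative else value


def sanitize_table(rows):
    """Apply sanitize_cell to every cell of a list-of-lists table."""
    return [[sanitize_cell(cell) for cell in row] for row in rows]


def run_program(program: str, table):
    """Execute generated code that must assign its result to `answer`.

    exec() on untrusted model output is unsafe; a real system would sandbox it.
    """
    scope = {"table": table}
    exec(program, scope)
    return scope["answer"]


# Hypothetical noisy table: header row plus two data rows.
raw_table = [
    ["Year", "Revenue", "Cost"],
    ["2022", "$1,200", "(300)"],
    ["2023", " 1,500 ", "(450)"],
]

# Stand-in for LLM-generated code answering
# "What is total revenue plus cost over all years?"
program = """
rows = table[1:]
answer = sum(row[1] + row[2] for row in rows)
"""

table = sanitize_table(raw_table)
result = run_program(program, table)
print(result)
```

The point of the split is that the generated program can do plain arithmetic on `table` because the numeric strings were normalized beforehand; without the sanitizing step, `row[1] + row[2]` would concatenate strings instead of adding numbers.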

Related papers