Fugu-MT Paper Translation (Abstract): Utilizing Training Data to Improve LLM Reasoning for Tabular Understanding

Paper overview: Utilizing Training Data to Improve LLM Reasoning for Tabular Understanding

arxiv url: http://arxiv.org/abs/2508.18676v1
Date: Tue, 26 Aug 2025 04:46:54 GMT
Status: Translation complete
System update date: 2025-08-27 17:42:38.684017
Title: Utilizing Training Data to Improve LLM Reasoning for Tabular Understanding
Authors: Chufan Gao, Jintai Chen, Jimeng Sun
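Abstract summary: This paper proposes Learn then Retrieve (LRTab), a novel prompting-based reasoning approach. First, prompting is used to obtain CoT responses over the training data. For incorrect CoTs, the LLM is prompted to predict Prompt Conditions that avoid the error, learning insights from the data. Finally, at inference time, the most relevant Prompt Conditions are retrieved as additional context for table understanding.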
Reference score (independently computed attention score): 28.832331050993464
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automated tabular understanding and reasoning are essential tasks for data scientists. Recently, large language models (LLMs) have become increasingly prevalent in tabular reasoning tasks. Previous work focuses on (1) finetuning LLMs using labeled data or (2) training-free prompting of LLM agents using chain-of-thought (CoT). Finetuning offers dataset-specific learning at the cost of generalizability. Training-free prompting is highly generalizable but does not take full advantage of training data. In this paper, we propose a novel prompting-based reasoning approach, Learn then Retrieve: LRTab, which integrates the benefits of both by retrieving relevant information learned from training data. We first use prompting to obtain CoT responses over the training data. For incorrect CoTs, we prompt the LLM to predict Prompt Conditions to avoid the error, learning insights from the data. We validate the effectiveness of Prompt Conditions using validation data. Finally, at inference time, we retrieve the most relevant Prompt Conditions for additional context for table understanding. We provide comprehensive experiments on WikiTQ and TabFact, showing that LRTab is interpretable, cost-efficient, and can outperform previous baselines in tabular reasoning.
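The learn, validate, and retrieve steps described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Example`, `Condition`, `learn_conditions`, `retrieve`, and `accuracy` are hypothetical, the LLM calls are replaced by toy stand-in functions, and the retrieval step uses a simple Jaccard token overlap because the abstract does not say how relevance is scored.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    table: str
    question: str
    answer: str


@dataclass
class Condition:
    text: str    # the learned Prompt Condition
    source: str  # table + question it was learned from (retrieval key)


def similarity(a: str, b: str) -> float:
    # Assumption: Jaccard overlap of whitespace tokens stands in for a real retriever.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def learn_conditions(train: List[Example], answer_fn: Callable, condition_fn: Callable) -> List[Condition]:
    # Learn step: answer each training example, keep a condition for each wrong answer.
    learned = []
    for ex in train:
        predicted = answer_fn(ex.table, ex.question, [])
        if predicted != ex.answer:
            learned.append(Condition(condition_fn(ex, predicted), ex.table + " " + ex.question))
    return learned


def retrieve(conditions: List[Condition], table: str, question: str, k: int = 2) -> List[str]:
    # Retrieve step: return the k conditions whose source is most similar to the query.
    query = table + " " + question
    ranked = sorted(conditions, key=lambda c: similarity(c.source, query), reverse=True)
    return [c.text for c in ranked[:k]]


def accuracy(data: List[Example], answer_fn: Callable, conditions: List[Condition], k: int = 2) -> float:
    # Validation step: measure accuracy with the retrieved conditions added as context.
    correct = 0
    for ex in data:
        context = retrieve(conditions, ex.table, ex.question, k) if conditions else []
        correct += answer_fn(ex.table, ex.question, context) == ex.answer
    return correct / len(data) if data else 0.0


def toy_answer(table: str, question: str, context: List[str]) -> str:
    # Stand-in for an LLM call: it counts the header as a row unless a condition warns it.
    rows = table.strip().split("\n")
    if any("header" in c for c in context):
        return str(len(rows) - 1)
    return str(len(rows))


def toy_condition(ex: Example, predicted: str) -> str:
    # Stand-in for prompting the LLM to write a Prompt Condition after an error.
    return "Exclude the header row when counting table rows."


if __name__ == "__main__":
    train = [Example("name|score\nann|3\nbob|5", "how many players are listed?", "2")]
    val = [Example("city|pop\nrome|2\noslo|1\nlima|9", "how many cities are listed?", "3")]
    conds = learn_conditions(train, toy_answer, toy_condition)
    print(accuracy(val, toy_answer, []), accuracy(val, toy_answer, conds))
```

Comparing accuracy with and without the learned conditions on held-out data is one simple way to mirror the validation step; in a real system `toy_answer` and `toy_condition` would be replaced by actual LLM prompts.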


Related papers