TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
- URL: http://arxiv.org/abs/2408.09174v1
- Date: Sat, 17 Aug 2024 11:40:10 GMT
- Title: TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
- Authors: Xianjie Wu, Jian Yang, Linzheng Chai, Ge Zhang, Jiaheng Liu, Xinrun Du, Di Liang, Daixin Shu, Xianfu Cheng, Tianzhen Sun, Guanglin Niu, Tongliang Li, Zhoujun Li,
- Abstract summary: This paper investigates the application of Large Language Models (LLMs) in industrial scenarios.
We propose a comprehensive and complex benchmark TableBench, including 18 fields within four major categories of table question answering (TableQA) capabilities.
Massive experiments conducted on TableBench indicate that both open-source and proprietary LLMs still have significant room for improvement to meet real-world demands.
- Score: 33.64465594140019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities. Despite these achievements, LLMs still encounter significant challenges when applied in industrial scenarios, particularly due to the increased complexity of reasoning required with real-world tabular data, underscoring a notable disparity between academic benchmarks and practical applications. To address this discrepancy, we conduct a detailed investigation into the application of tabular data in industrial scenarios and propose a comprehensive and complex benchmark TableBench, including 18 fields within four major categories of table question answering (TableQA) capabilities. Furthermore, we introduce TableLLM, trained on our meticulously constructed training set TableInstruct, achieving comparable performance with GPT-3.5. Massive experiments conducted on TableBench indicate that both open-source and proprietary LLMs still have significant room for improvement to meet real-world demands, where the most advanced model, GPT-4, achieves only a modest score compared to humans.
Related papers
- Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol [83.90769864167301]
Literature review tables are essential for summarizing and comparing collections of scientific papers.
We explore the task of generating tables that best fulfill a user's informational needs given a collection of scientific papers.
Our contributions focus on three key challenges encountered in real-world use: (i) User prompts are often under-specified; (ii) Retrieved candidate papers frequently contain irrelevant content; and (iii) Task evaluation should move beyond shallow text similarity techniques.
arXiv Detail & Related papers (2025-04-14T14:52:28Z) - GTR: Graph-Table-RAG for Cross-Table Question Answering [53.11230952572134]
We propose the first Graph-Table-RAG framework, namely GTR, which reorganizes table corpora into a heterogeneous graph.
GTR exhibits superior cross-table question-answering performance while maintaining high deployment efficiency, demonstrating its real-world practical applicability.
arXiv Detail & Related papers (2025-04-02T04:24:41Z) - Rethinking Table Instruction Tuning [29.139828718538418]
We evaluate abilities in existing table LLMs and reveal significant declines in both out-of-domain table understanding and general capabilities.
We introduce TAMA, a TAble LLM instruction-tuned from LLaMA 3.1 8B Instruct, which achieves performance on par with, or surpassing GPT-3.5 and GPT-4 on table tasks.
arXiv Detail & Related papers (2025-01-24T18:06:07Z) - Better Think with Tables: Leveraging Tables to Enhance Large Language Model Comprehension [33.32086403802351]
Thinking with tables is a technique that assists Large Langauge Models (LLMs) to leverage tables for intermediate thinking aligning with human cognitive behavior.
We show that our approach achieves a 40.29% average relative performance increase, higher robustness, and show generalizability to different requests, conditions, or scenarios.
arXiv Detail & Related papers (2024-12-22T23:31:03Z) - Benchmarking Table Comprehension In The Wild [9.224698222634789]
TableQuest is a new benchmark designed to evaluate the holistic table comprehension capabilities of Large Language Models (LLMs)
We experiment with 7 state-of-the-art models, and find that despite reasonable accuracy in locating facts, they often falter when required to execute more sophisticated reasoning or multi-step calculations.
arXiv Detail & Related papers (2024-12-13T05:52:37Z) - TableGPT2: A Large Multimodal Model with Tabular Data Integration [22.77225649639725]
TableGPT2 is a model rigorously pre-trained and fine-tuned with over 593.8K tables and 2.36M high-quality query-table-outputs.
One of TableGPT2's key innovations is its novel table encoder, specifically designed to capture schema-level and cell-level information.
arXiv Detail & Related papers (2024-11-04T13:03:13Z) - TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding.
TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.
Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z) - Uncovering Limitations of Large Language Models in Information Seeking from Tables [28.19697259795014]
This paper introduces a more reliable benchmark for Table Information Seeking (TabIS)
To avoid the unreliable evaluation caused by text similarity-based metrics, TabIS adopts a single-choice question format (with two options per question) instead of a text generation format.
arXiv Detail & Related papers (2024-06-06T14:30:59Z) - QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs [63.98556480088152]
Table summarization is a crucial task aimed at condensing information into concise and comprehensible textual summaries.
We propose a novel method to address these limitations by introducing query-focused multi-table summarization.
Our approach, which comprises a table serialization module, a summarization controller, and a large language model, generates query-dependent table summaries tailored to users' information needs.
arXiv Detail & Related papers (2024-05-08T15:05:55Z) - TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the amazing power of language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - Large Language Models are Complex Table Parsers [26.66460264175336]
We propose to incorporate GPT-3.5 to address the challenges posed by Complex Table QA.
Specifically, we encode each cell's hierarchical structure, position information and content as datasets.
By enhancing the prompt template with an explanatory description of the meaning of each task, we effectively improve the hierarchical awareness structure capability.
arXiv Detail & Related papers (2023-12-13T01:34:42Z) - UniTabE: A Universal Pretraining Protocol for Tabular Foundation Model
in Data Science [16.384705926693073]
This study seeks to extend the power of pretraining methodologies to facilitate the prediction over tables in data science.
We introduce UniTabE, a method designed to process tables in a uniform manner, devoid of constraints imposed by specific table structures.
In order to implement the pretraining phase, we curated an expansive dataset comprising approximately 13B samples, meticulously gathered from the Kaggle platform.
arXiv Detail & Related papers (2023-07-18T13:28:31Z) - OmniTab: Pretraining with Natural and Synthetic Data for Few-shot
Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.