T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
- URL: http://arxiv.org/abs/2508.19813v4
- Date: Tue, 23 Sep 2025 07:48:36 GMT
- Title: T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables
- Authors: Jie Zhang, Changzai Pan, Kaiwen Wei, Sishi Xiong, Yu Zhao, Xiangyu Li, Jiaxin Peng, Xiaoyan Gu, Jian Yang, Wenhan Chang, Zhenhe Wu, Jiang Zhong, Shuangyong Song, Yongxiang Li, Xuelong Li,
- Abstract summary: We propose the table-to-report task and construct a bilingual benchmark named T2R-bench. The benchmark comprises 457 industrial tables, all derived from real-world scenarios and encompassing 19 industry domains. Experiments on 25 widely-used LLMs reveal that even state-of-the-art models such as Deepseek-R1 achieve only a 62.71 overall score.
- Score: 65.12524437711737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extensive research has been conducted to explore the capabilities of large language models (LLMs) in table reasoning. However, the essential task of transforming table information into reports remains a significant challenge for industrial applications. This task is plagued by two critical issues: 1) the complexity and diversity of tables lead to suboptimal reasoning outcomes; and 2) existing table benchmarks lack the capacity to adequately assess the practical application of this task. To fill this gap, we propose the table-to-report task and construct a bilingual benchmark named T2R-bench, in which key information must flow from the tables into the reports. The benchmark comprises 457 industrial tables, all derived from real-world scenarios and encompassing 19 industry domains as well as 4 types of industrial tables. Furthermore, we propose evaluation criteria to fairly measure the quality of report generation. Experiments on 25 widely-used LLMs reveal that even state-of-the-art models such as Deepseek-R1 achieve only a 62.71 overall score, indicating that LLMs still have room for improvement on T2R-bench.
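The table-to-report task described in the abstract can be pictured as a simple pipeline: serialize an industrial table into text, then prompt an LLM to write an article-level report surfacing its key information. The sketch below is a hypothetical illustration of that flow, not the benchmark's actual implementation; the function names, prompt wording, and sample data are assumptions.

```python
def serialize_table(rows):
    """Render a list of row dicts as a Markdown table for an LLM prompt."""
    headers = list(rows[0].keys())
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


def build_report_prompt(table_rows, domain):
    """Compose a table-to-report prompt (wording is a hypothetical example);
    the resulting string would be sent to an LLM for report generation."""
    return (
        f"You are an analyst in the {domain} domain.\n"
        "Write an article-level report that surfaces the key information "
        "in the table below.\n\n" + serialize_table(table_rows)
    )


rows = [{"quarter": "Q1", "output_tons": 120},
        {"quarter": "Q2", "output_tons": 135}]
prompt = build_report_prompt(rows, "energy")
```

Under this framing, evaluation then scores the generated report against the table's key information, which is what T2R-bench's overall score aggregates.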
Related papers
- ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios [42.9161992743627]
We present ReasonTabQA, a large-scale bilingual benchmark encompassing 1,932 tables across 30 industry domains such as energy and automotive. We also introduce TabCodeRL, a reinforcement learning method that leverages table-aware verifiable rewards to guide the generation of logical reasoning paths.
arXiv Detail & Related papers (2026-01-12T07:36:06Z) - TReB: A Comprehensive Benchmark for Evaluating Table Reasoning Capabilities of Large Language Models [30.26407735827857]
Reasoning with table-structured data poses significant challenges for large language models (LLMs). We present a comprehensive table reasoning evolution benchmark, TReB, which measures both shallow table understanding abilities and deep table reasoning abilities. We create an evaluation framework to robustly measure table reasoning capabilities with three distinct inference modes: TCoT, PoT and ICoT.
arXiv Detail & Related papers (2025-06-23T09:02:04Z) - Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models [52.94091440130039]
Table reasoning (TR) requires structured reasoning over semi-structured data. Small language models (SLMs) have limited capacity compared to large LMs (LLMs, e.g., GPT-4o). We propose program-based TR (P-TR), which circumvents key limitations of text-based TR (T-TR) by generating executable programs. Experiments on four TR benchmarks demonstrate that Table-r1 outperforms all SLM-based methods.
arXiv Detail & Related papers (2025-06-06T14:52:19Z) - MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark [51.30875219634243]
We introduce MMTU, a large-scale benchmark with over 30K questions across 25 real-world table tasks. MMTU is designed to comprehensively evaluate models' ability to understand, reason, and manipulate real tables at the expert level. We show that MMTU requires a combination of skills -- including table understanding, reasoning, and coding -- that remain challenging for today's frontier models.
arXiv Detail & Related papers (2025-06-05T21:05:03Z) - GTR: Graph-Table-RAG for Cross-Table Question Answering [53.11230952572134]
We propose the first Graph-Table-RAG framework, namely GTR, which reorganizes table corpora into a heterogeneous graph. GTR exhibits superior cross-table question-answering performance while maintaining high deployment efficiency, demonstrating its real-world practical applicability.
arXiv Detail & Related papers (2025-04-02T04:24:41Z) - Benchmarking Table Comprehension In The Wild [9.224698222634789]
TableQuest is a new benchmark designed to evaluate the holistic table comprehension capabilities of Large Language Models (LLMs). We experiment with 7 state-of-the-art models, and find that despite reasonable accuracy in locating facts, they often falter when required to execute more sophisticated reasoning or multi-step calculations.
arXiv Detail & Related papers (2024-12-13T05:52:37Z) - TableGPT2: A Large Multimodal Model with Tabular Data Integration [22.77225649639725]
TableGPT2 is a model rigorously pre-trained and fine-tuned with over 593.8K tables and 2.36M high-quality query-table-output tuples.
One of TableGPT2's key innovations is its novel table encoder, specifically designed to capture schema-level and cell-level information.
arXiv Detail & Related papers (2024-11-04T13:03:13Z) - TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z) - TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning [61.14586098005874]
Current Large Language Models (LLMs) exhibit limited ability to understand table structures and to apply precise numerical reasoning. We introduce our Tool-Augmented Reasoning framework for Tables (TART), which integrates LLMs with specialized tools. TART contains three key components: a table formatter to ensure accurate data representation, a tool maker to develop specific computational tools, and an explanation generator to maintain explainability.
arXiv Detail & Related papers (2024-09-18T06:19:59Z) - TableBench: A Comprehensive and Complex Benchmark for Table Question Answering [33.64465594140019]
This paper investigates the application of Large Language Models (LLMs) in industrial scenarios. We propose a comprehensive and complex benchmark, TableBench, including 18 fields within four major categories of table question answering (TableQA) capabilities. Extensive experiments conducted on TableBench indicate that both open-source and proprietary LLMs still have significant room for improvement to meet real-world demands.
arXiv Detail & Related papers (2024-08-17T11:40:10Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z)
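TAP4LLM's three pre-processing stages (table sampling, table augmentation, and table packing & serialization) lend themselves to a toy illustration. The sketch below is a hypothetical stand-in, not the paper's method: the keyword-matching sampler, the note-based augmentation, and the "header: value" packing format are all illustrative assumptions.

```python
def sample_rows(rows, query, limit=3):
    """Crude stand-in for query-semantic table sampling: keep rows whose
    cell values mention any query term, falling back to the full table."""
    terms = query.lower().split()
    hits = [r for r in rows
            if any(t in str(v).lower() for v in r.values() for t in terms)]
    return (hits or rows)[:limit]


def augment_rows(rows, notes):
    """Toy table augmentation: attach external knowledge (a dict of notes
    keyed by the 'plant' column) as an extra column on each row."""
    return [dict(r, note=notes.get(r.get("plant"), "")) for r in rows]


def pack_rows(rows):
    """Serialize the sub-table into a compact 'header: value' text format
    sized to fit an LLM prompt."""
    return "\n".join("; ".join(f"{k}: {v}" for k, v in r.items())
                     for r in rows)


table = [
    {"plant": "north", "output": 120},
    {"plant": "south", "output": 95},
    {"plant": "east", "output": 110},
    {"plant": "west", "output": 80},
]
sub = sample_rows(table, "north east")
packed = pack_rows(augment_rows(sub, {"north": "new line installed"}))
```

The design point the paper makes is that each stage trades raw table size for prompt-relevant signal before the LLM ever sees the data; the heuristics used here merely stand in for that idea.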