TABLET: A Large-Scale Dataset for Robust Visual Table Understanding
- URL: http://arxiv.org/abs/2509.21205v2
- Date: Wed, 05 Nov 2025 16:33:45 GMT
- Title: TABLET: A Large-Scale Dataset for Robust Visual Table Understanding
- Authors: Iñigo Alonso, Imanol Miranda, Eneko Agirre, Mirella Lapata
- Abstract summary: Existing visual table understanding (VTU) datasets offer fixed examples with single visualizations and pre-defined instructions. We introduce TABLET, a large-scale VTU dataset with 4 million examples across 20 tasks, grounded in 2 million unique tables where 88% preserve original visualizations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While table understanding increasingly relies on pixel-only settings where tables are processed as visual representations, current benchmarks predominantly use synthetic renderings that lack the complexity and visual diversity of real-world tables. Additionally, existing visual table understanding (VTU) datasets offer fixed examples with single visualizations and pre-defined instructions, providing no access to underlying serialized data for reformulation. We introduce TABLET, a large-scale VTU dataset with 4 million examples across 20 tasks, grounded in 2 million unique tables where 88% preserve original visualizations. Each example includes paired image-HTML representations, comprehensive metadata, and provenance information linking back to the source datasets. Fine-tuning vision-language models like Qwen2.5-VL-7B on TABLET improves performance on seen and unseen VTU tasks while increasing robustness on real-world table visualizations. By preserving original visualizations and maintaining example traceability in a unified large-scale collection, TABLET establishes a foundation for robust training and extensible evaluation of future VTU models.
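The paired image-HTML design suggests a natural record layout for fine-tuning. Below is a minimal sketch of what one TABLET-style example and its chat-format conversion might look like; the field names and the `to_chat_messages` helper are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical schema for one VTU example; field names are assumptions,
# not TABLET's actual column names.
@dataclass
class VTUExample:
    image_path: str          # rendered table visualization
    html: str                # paired serialized representation
    task: str                # one of the 20 task types, e.g. "table_qa"
    instruction: str         # natural-language instruction for the model
    answer: str              # gold output
    provenance: dict = field(default_factory=dict)  # link to source dataset

def to_chat_messages(ex: VTUExample) -> list[dict]:
    """Format an example as chat messages for vision-language fine-tuning."""
    return [
        {"role": "user", "content": [
            {"type": "image", "image": ex.image_path},
            {"type": "text", "text": ex.instruction},
        ]},
        {"role": "assistant", "content": ex.answer},
    ]

ex = VTUExample(
    image_path="tables/000001.png",
    html="<table><tr><th>City</th><th>Pop.</th></tr>"
         "<tr><td>Bilbao</td><td>345,821</td></tr></table>",
    task="table_qa",
    instruction="What is the population of Bilbao?",
    answer="345,821",
    provenance={"source_dataset": "WikiTableQuestions"},
)
print(to_chat_messages(ex))
```

Keeping the HTML alongside the image is what allows instructions to be reformulated later, since the serialized data is never discarded.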
Related papers
- ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement
ShowTable is a pipeline that synergizes MLLMs with diffusion models via a progressive self-correcting process. The MLLM acts as the central orchestrator, reasoning over the visual plan and judging visual errors. We introduce TableVisBench, a new benchmark with 800 challenging instances across 5 evaluation dimensions.
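The self-correcting process described above is essentially a plan-render-judge loop. The following is a hedged sketch of that control flow; `plan`, `render`, and `judge` are placeholder stubs standing in for the MLLM and diffusion-model calls, not ShowTable's actual interfaces.

```python
# Sketch of a collaborative reflection-and-refinement loop in the spirit of
# ShowTable. All three stubs are illustrative assumptions.

def plan(request: str, feedback: str | None) -> str:
    """MLLM stand-in: turn the request (plus prior feedback) into a visual plan."""
    return f"plan for: {request}" + (f" | fix: {feedback}" if feedback else "")

def render(visual_plan: str) -> str:
    """Diffusion-model stand-in: 'render' the plan (here, just echo it)."""
    return f"image({visual_plan})"

def judge(image: str) -> str | None:
    """MLLM stand-in: return a description of visual errors, or None if OK."""
    return None  # pretend the first rendering is acceptable

def show_table(request: str, max_rounds: int = 3) -> str:
    feedback = None
    image = ""
    for _ in range(max_rounds):
        image = render(plan(request, feedback))
        feedback = judge(image)
        if feedback is None:          # no visual errors found: stop refining
            break
    return image

print(show_table("bar-chart table of quarterly revenue"))
```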
arXiv Detail & Related papers (2025-12-15T13:21:50Z)
- PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction
Table extraction is a key challenge in visual document understanding. PubTables-v2 is the first large-scale benchmark for multi-page table structure recognition. We use PubTables-v2 to create the Page-Object Table Transformer (POTATR), an image-to-graph extension of the Table Transformer for comprehensive page-level table extraction.
arXiv Detail & Related papers (2025-12-11T18:19:00Z) - TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding [52.59372043981724]
TableDART is a training-efficient framework that integrates multimodal views by reusing pretrained single-modality models. In addition, we propose a novel agent for cross-modal knowledge integration that analyzes outputs from text- and image-based models.
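As a rough illustration of dynamic routing, the sketch below picks a modality per input and reconciles candidate answers with a simple vote; the size-based router and all function names are assumptions for illustration, not TableDART's actual method.

```python
# Illustrative sketch of dynamic multimodal routing for table understanding,
# loosely in the spirit of TableDART. Model stubs and the routing heuristic
# are assumptions, not the paper's method.

def text_model(table_text: str, question: str) -> str:
    return f"text-answer({question})"

def image_model(table_image: str, question: str) -> str:
    return f"image-answer({question})"

def route(table_text: str, table_image: str, question: str) -> str:
    """Pick a modality per input; here, a toy rule based on table size."""
    # Assumption: small serialized tables suit the text model; large or
    # visually complex ones go to the image model.
    if len(table_text) < 2000:
        return text_model(table_text, question)
    return image_model(table_image, question)

def integrate(answers: list[str]) -> str:
    """Agent stand-in: reconcile candidate answers (here, majority vote)."""
    return max(set(answers), key=answers.count)

candidates = [
    route("<table>...</table>", "table.png", "Total revenue in 2023?"),
    text_model("<table>...</table>", "Total revenue in 2023?"),
    image_model("table.png", "Total revenue in 2023?"),
]
print(integrate(candidates))
```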
arXiv Detail & Related papers (2025-09-18T07:00:13Z) - Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images [0.42970700836450476]
Visual-TableQA is a large-scale, open-domain dataset designed to evaluate and enhance visual reasoning over complex tabular data. It comprises 2.5k richly structured, rendered tables and 6k reasoning-intensive QA pairs, all produced at a cost of under USD 100.
arXiv Detail & Related papers (2024-06-03T13:54:05Z)
- TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
We present TabPedia, a novel large vision-language model equipped with a concept synergy mechanism.
This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering.
To better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA.
arXiv Detail & Related papers (2024-06-03T13:54:05Z)
- TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers three distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
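To make the pipeline concrete, here is a toy sketch of the three stages over a list-of-dicts table; the keyword-overlap sampler, glossary-style augmentation, and markdown packer are illustrative assumptions, not TAP4LLM's actual algorithms.

```python
# Toy sketch of TAP4LLM's three stages: sampling, augmentation, packing.

def sample_rows(table: list[dict], query: str, k: int = 3) -> list[dict]:
    """Keep the k rows whose cell values overlap most with the query terms."""
    terms = set(query.lower().replace("?", "").split())
    def overlap(row: dict) -> int:
        return sum(str(v).lower() in terms for v in row.values())
    return sorted(table, key=overlap, reverse=True)[:k]

def augment(table: list[dict], notes: dict) -> list[dict]:
    """Attach external knowledge (e.g., a term glossary) as extra columns."""
    return [{**row, **{f"note:{key}": v for key, v in notes.items()
                       if key in str(list(row.values()))}} for row in table]

def pack_markdown(table: list[dict]) -> str:
    """Serialize the sub-table as a markdown table for the LLM prompt."""
    cols = list(table[0])
    lines = ["| " + " | ".join(cols) + " |",
             "| " + " | ".join("---" for _ in cols) + " |"]
    lines += ["| " + " | ".join(str(r.get(c, "")) for c in cols) + " |"
              for r in table]
    return "\n".join(lines)

table = [{"city": "Bilbao", "pop": 345821},
         {"city": "Donostia", "pop": 188102},
         {"city": "Vitoria", "pop": 253672}]
sub = sample_rows(table, "What is the population of Bilbao?", k=1)
print(pack_markdown(augment(sub, {"Bilbao": "capital of Biscay"})))
```

The key design choice, as described in the abstract, is that sub-table selection is query-driven rather than fixed, so only query-relevant rows enter the prompt budget.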
arXiv Detail & Related papers (2023-12-14T15:37:04Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
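The "omnivorous" idea is simply that both data sources feed one pretraining stream. The sketch below mixes natural table-text pairs with synthetic QA pairs at a sampling ratio; the ratio and record formats are assumptions, not OmniTab's actual recipe.

```python
import random

# Sketch of pretraining data mixing in the spirit of OmniTab: interleave
# naturally occurring table-text pairs with synthetic question-answer pairs.

natural = [{"table": "t1", "text": "Bilbao lies on the Nervion river."}]
synthetic = [{"table": "t1", "question": "Which river runs through Bilbao?",
              "answer": "Nervion"}]

def mixed_examples(rng, natural, synthetic, p_natural=0.5, n=4):
    """Yield n examples, drawing from the natural pool with prob. p_natural."""
    for _ in range(n):
        pool = natural if rng.random() < p_natural else synthetic
        yield rng.choice(pool)

rng = random.Random(0)
for ex in mixed_examples(rng, natural, synthetic):
    print(ex)
```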
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
- Retrieving Complex Tables with Multi-Granular Graph Representation Learning
The task of natural language table retrieval (NLTR) seeks to retrieve semantically relevant tables based on natural language queries.
Existing learning systems treat tables as plain text based on the assumption that tables are structured as dataframes.
We propose Graph-based Table Retrieval (GTR), a generalizable NLTR framework with multi-granular graph representation learning.
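One way to picture multi-granular representation is a graph with cell-, row-, column-, and table-level nodes. The sketch below builds such a graph as a plain adjacency map; the node naming and edge choices are assumptions for illustration, not GTR's actual construction.

```python
# Minimal sketch of a multi-granular table graph in the spirit of GTR:
# cell nodes connect to the row and column nodes that contain them, and a
# table-level node connects to all rows and columns.

def table_to_graph(rows: list[list[str]]) -> dict[str, set[str]]:
    """Build an adjacency map with cell-, row-, column-, and table-level nodes."""
    adj: dict[str, set[str]] = {}
    def link(a: str, b: str) -> None:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for i, row in enumerate(rows):
        link("table", f"row:{i}")
        for j, cell in enumerate(row):
            link("table", f"col:{j}")
            link(f"cell:{i},{j}:{cell}", f"row:{i}")
            link(f"cell:{i},{j}:{cell}", f"col:{j}")
    return adj

g = table_to_graph([["City", "Pop."], ["Bilbao", "345,821"]])
print(sorted(g["row:1"]))  # neighbors of the second row node
```

Representing rows and columns as their own nodes is what lets a retrieval model match a query against table structure at several granularities, not just against flattened cell text.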
arXiv Detail & Related papers (2021-05-04T20:19:03Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
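A synchronous grammar expands each nonterminal into an aligned (question, SQL) template pair, so one sample instantiates both sides consistently. The toy grammar below is an illustrative assumption, orders of magnitude smaller than anything GraPPa would use in practice.

```python
import random

# Toy synchronous context-free grammar in the spirit of GraPPa: each rule
# pairs a natural-language template with a SQL template, and shared slot
# fillers keep the two sides aligned.

COLUMNS = ["population", "area"]
TABLES = ["cities"]
VALUES = ["1000000", "500"]
AGGS = [("maximum", "MAX"), ("minimum", "MIN"), ("average", "AVG")]

RULES = [
    ("what is the {AGG} {COL} in {TAB}?",
     "SELECT {AGG_SQL}({COL}) FROM {TAB}"),
    ("which rows of {TAB} have {COL} above {VAL}?",
     "SELECT * FROM {TAB} WHERE {COL} > {VAL}"),
]

def sample_pair(rng: random.Random) -> tuple[str, str]:
    q_tmpl, s_tmpl = rng.choice(RULES)
    agg_word, agg_sql = rng.choice(AGGS)
    slots = {"AGG": agg_word, "AGG_SQL": agg_sql,
             "COL": rng.choice(COLUMNS), "TAB": rng.choice(TABLES),
             "VAL": rng.choice(VALUES)}
    # The same slot fillers instantiate both templates, keeping them aligned.
    return (q_tmpl.format(**slots), s_tmpl.format(**slots))

rng = random.Random(0)
for _ in range(2):
    print(sample_pair(rng))
```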
arXiv Detail & Related papers (2020-09-29T08:17:58Z)