TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
- URL: http://arxiv.org/abs/2404.19205v1
- Date: Tue, 30 Apr 2024 02:05:18 GMT
- Title: TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
- Authors: Yoonsik Kim, Moonbin Yim, Ka Yeon Song
- Abstract summary: This paper establishes a benchmark for table visual question answering, referred to as TableVQA-Bench.
It is important to note that existing datasets have not incorporated images or QA pairs, which are two crucial components of TableVQA.
- Score: 4.828743805126944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we establish a benchmark for table visual question answering, referred to as TableVQA-Bench, derived from pre-existing table question-answering (QA) and table structure recognition datasets. It is important to note that the existing datasets have not incorporated images or QA pairs, which are two crucial components of TableVQA. As such, the primary objective of this paper is to obtain these necessary components. Specifically, images are sourced either by applying a stylesheet or by employing the proposed table rendering system, and QA pairs are generated with a large language model (LLM) that takes a text-formatted table as input. Ultimately, the completed TableVQA-Bench comprises 1,500 QA pairs. We comprehensively compare the performance of various multi-modal large language models (MLLMs) on TableVQA-Bench. In our experiments, GPT-4V achieves the highest accuracy among commercial and open-sourced MLLMs. Moreover, we discover that the number of vision queries plays a significant role in TableVQA performance. To further analyze the capabilities of MLLMs relative to their LLM backbones, we present image-formatted tables to MLLMs and text-formatted tables to LLMs. Our findings suggest that processing visual inputs is more challenging than processing text inputs, as evidenced by the lower performance of MLLMs despite their generally higher computational costs compared to LLMs. The proposed TableVQA-Bench and evaluation code are available at https://github.com/naver-ai/tablevqabench.
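The abstract describes a two-part construction pipeline: table images are obtained by rendering text-formatted tables (via a stylesheet or a dedicated rendering system), and QA pairs are produced by prompting an LLM with the text-formatted table. The following minimal Python sketch illustrates only the QA-generation step; the prompt wording, the JSON output format, and the `call_llm` placeholder are illustrative assumptions and not the authors' actual implementation.

```python
# Minimal sketch of generating QA pairs from a text-formatted (HTML) table with an LLM.
# `call_llm` is a hypothetical placeholder for any chat-completion API; it returns a
# canned response here so the sketch runs end to end without external dependencies.
import json


def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., a chat-completion endpoint)."""
    return json.dumps([
        {"question": "Which year had the highest revenue?", "answer": "2023"}
    ])


def generate_qa_pairs(table_text: str, num_pairs: int = 3) -> list[dict]:
    """Ask the LLM for QA pairs grounded solely in the given table text."""
    prompt = (
        f"Here is a table in HTML format:\n{table_text}\n\n"
        f"Write {num_pairs} question-answer pairs that can be answered "
        "solely from this table. Respond as a JSON list of objects with "
        "'question' and 'answer' keys."
    )
    return json.loads(call_llm(prompt))


if __name__ == "__main__":
    html_table = (
        "<table><tr><th>Year</th><th>Revenue</th></tr>"
        "<tr><td>2022</td><td>10</td></tr>"
        "<tr><td>2023</td><td>14</td></tr></table>"
    )
    for pair in generate_qa_pairs(html_table):
        print(pair["question"], "->", pair["answer"])
```

In practice, `call_llm` would be replaced by a request to the LLM of choice, and the returned pairs would be attached to the rendered table image to form a TableVQA sample.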
Related papers
- TTQA-RS- A break-down prompting approach for Multi-hop Table-Text Question Answering with Reasoning and Summarization [3.531533402602335]
Multi-hop table-text QA requires multiple hops between the table and text.
We propose TTQA-RS, a model that uses a break-down prompting approach.
Our results are comparable with the training-based state-of-the-art models.
arXiv Detail & Related papers (2024-06-20T20:55:38Z)
- Multimodal Table Understanding [26.652797853893233]
How to directly understand tables using intuitive visual information is a crucial and urgent challenge for developing more practical applications.
We propose a new problem, multimodal table understanding, where the model needs to generate correct responses to various table-related requests.
We develop Table-LLaVA, a generalist multimodal large language model (MLLM), which significantly outperforms recent open-source MLLM baselines on 23 benchmarks.
arXiv Detail & Related papers (2024-06-12T11:27:03Z)
- TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy [51.23025356179886]
We present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism.
This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering.
We establish a new and comprehensive table VQA benchmark, ComTQA, featuring approximately 9,000 QA pairs.
arXiv Detail & Related papers (2024-06-03T13:54:05Z)
- KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Answering each question requires integrating information from both the table and the knowledge sub-graph.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z)
- TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [58.11442663694328]
We propose TAP4LLM as a versatile pre-processing toolbox to generate table prompts.
In each module, we collect and design several common methods for usage in various scenarios.
arXiv Detail & Related papers (2023-12-14T15:37:04Z)
- TableQAKit: A Comprehensive and Practical Toolkit for Table-based Question Answering [23.412691101965414]
TableQAKit is the first comprehensive toolkit designed specifically for TableQA.
TableQAKit is open-source with an interactive interface that includes visual operations, and comprehensive data for ease of use.
arXiv Detail & Related papers (2023-10-23T16:33:23Z)
- MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering [61.48881995121938]
Real-world queries are complex in nature, often spanning multiple tables in a relational database or web page.
Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
- Multi-Row, Multi-Span Distant Supervision For Table+Text Question Answering [33.809732338627136]
Question answering (QA) over tables and linked text, also called TextTableQA, has witnessed significant research in recent years.
We present MITQA, a transformer-based TextTableQA system that is explicitly designed to cope with distant supervision along both these axes.
arXiv Detail & Related papers (2021-12-14T12:48:19Z)
- CLTR: An End-to-End, Transformer-Based System for Cell Level Table Retrieval and Table Question Answering [8.389189333083513]
We present the first end-to-end, transformer-based table question answering (QA) system.
It takes natural language questions and a massive table corpus as inputs, retrieves the most relevant tables, and locates the correct table cells to answer the question.
We introduce two new open-domain benchmarks, E2E_WTQ and E2E_GNQ, consisting of 2,005 natural language questions over 76,242 tables.
arXiv Detail & Related papers (2021-06-08T15:22:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.