Text2Analysis: A Benchmark of Table Question Answering with Advanced
Data Analysis and Unclear Queries
- URL: http://arxiv.org/abs/2312.13671v1
- Date: Thu, 21 Dec 2023 08:50:41 GMT
- Title: Text2Analysis: A Benchmark of Table Question Answering with Advanced
Data Analysis and Unclear Queries
- Authors: Xinyi He, Mengyu Zhou, Xinrun Xu, Xiaojun Ma, Rui Ding, Lun Du, Yan
Gao, Ran Jia, Xu Chen, Shi Han, Zejian Yuan, Dongmei Zhang
- Abstract summary: We develop the Text2Analysis benchmark, incorporating advanced analysis tasks.
We also develop five innovative and effective annotation methods.
We evaluate five state-of-the-art models using three different metrics.
- Score: 67.0083902913112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tabular data analysis is crucial in various fields, and large language models
show promise in this area. However, current research mostly focuses on
rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like
forecasting and chart generation. To address this gap, we develop the
Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond
the SQL-compatible operations and require more in-depth analysis. We also
develop five innovative and effective annotation methods, harnessing the
capabilities of large language models to enhance data quality and quantity.
Additionally, we include unclear queries that resemble real-world user
questions to test how well models can understand and tackle such challenges.
Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five
state-of-the-art models using three different metrics, and the results show
that our benchmark introduces considerable challenges in the field of tabular
data analysis, paving the way for more advanced research opportunities.
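To make the evaluation setup concrete, a minimal scoring harness over query-result pairs might look like the sketch below. The JSONL layout, the field names (`id`, `result`), and the exact-match metric are assumptions for illustration; the benchmark's actual three metrics (e.g., for charts or forecasts) would necessarily be richer.

```python
import json

def exact_match(pred, gold):
    # Naive string comparison; real metrics for chart or forecasting
    # outputs would compare chart specs or numeric series instead.
    return str(pred).strip() == str(gold).strip()

def evaluate(pred_path, gold_path):
    """Score predictions against gold query-result pairs. Hypothetical
    JSONL format: one {"id": ..., "result": ...} object per line."""
    with open(pred_path) as f:
        preds = {r["id"]: r["result"] for r in map(json.loads, f)}
    with open(gold_path) as f:
        golds = {r["id"]: r["result"] for r in map(json.loads, f)}
    correct = sum(
        exact_match(preds.get(qid), gold) for qid, gold in golds.items()
    )
    return correct / len(golds)
```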
Related papers
- BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains.
BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution.
Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z)
- Facts-and-Feelings: Capturing both Objectivity and Subjectivity in Table-to-Text Generation [41.09752906121257]
We introduce the Ta2TS dataset with 3849 data instances.
We fine-tune sequence-to-sequence models on the linearized tables and prompt popular large language models.
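As a rough illustration of what fine-tuning on "linearized tables" involves, the sketch below flattens a table's headers and rows into a single string a sequence-to-sequence model can consume. The `[HEADER]`/`[ROW]` separator scheme is one common convention, not necessarily the one used for Ta2TS.

```python
def linearize_table(headers, rows):
    # Flatten a table into one token stream; the separator tokens are
    # a common convention, not the paper's confirmed format.
    header_str = " | ".join(headers)
    row_str = " ".join(
        "[ROW] " + " | ".join(str(cell) for cell in row) for row in rows
    )
    return f"[HEADER] {header_str} {row_str}"

print(linearize_table(
    ["City", "Population"],
    [["Oslo", 709037], ["Bergen", 291940]],
))
# [HEADER] City | Population [ROW] Oslo | 709037 [ROW] Bergen | 291940
```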
arXiv Detail & Related papers (2024-06-15T08:41:44Z)
- TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
TACT contains challenging instructions that demand stitching information scattered across one or more texts.
We construct this dataset by leveraging an existing dataset of texts and their associated tables.
We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%.
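The failure mode TACT probes is easy to see in code: once scattered values have been extracted into a table, the aggregation itself is trivial for a tool, which is why pairing LLMs with information extraction tools is attractive. The sketch below is illustrative only; the data and column names are invented, and the extraction step itself is not shown.

```python
import pandas as pd

# Hypothetical values an extraction step has pulled out of several
# passages of free text (the extraction itself is not shown here).
extracted = pd.DataFrame({
    "country": ["A", "B", "C"],
    "exports_busd": [12.4, 7.1, 9.8],
})

# Delegating the arithmetic to a tool side-steps the in-context
# aggregation errors that TACT measures in LLMs.
print("total:", extracted["exports_busd"].sum())
print("mean:", round(extracted["exports_busd"].mean(), 2))
```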
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
- Towards Robustness of Text-to-Visualization Translation against Lexical and Phrasal Variability [27.16741353384065]
Text-to-vis models often rely on lexical matching between words in the questions and tokens in data schemas.
In this study, we examine the robustness of current text-to-vis models, an area that has not previously been explored.
We propose GRED, a novel framework based on the Retrieval-Augmented Generation (RAG) technique, specifically designed to address the two variants of input perturbation (lexical and phrasal).
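For context, a retrieval-augmented pipeline in its most generic form looks like the sketch below: embed the (possibly perturbed) question, retrieve the most similar stored examples, and prepend them to the prompt. This is the generic RAG skeleton, not GRED's specific design; every name and field here is illustrative.

```python
import numpy as np

def retrieve(query_vec, example_vecs, examples, k=3):
    # Cosine similarity between the query embedding and each stored
    # example embedding; return the k nearest examples.
    q = query_vec / np.linalg.norm(query_vec)
    e = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    top = np.argsort(-(e @ q))[:k]
    return [examples[i] for i in top]

def build_prompt(question, retrieved):
    # Few-shot prompt from retrieved (question, vis-spec) pairs; the
    # "q"/"vis" keys are hypothetical.
    shots = "\n".join(f"Q: {ex['q']}\nVis: {ex['vis']}" for ex in retrieved)
    return f"{shots}\nQ: {question}\nVis:"
```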
arXiv Detail & Related papers (2024-04-10T16:12:50Z)
- Wiki-TabNER: Advancing Table Interpretation Through Named Entity Recognition [19.423556742293762]
We analyse a widely used benchmark dataset for evaluating table interpretation (TI) tasks and identify its shortcomings.
To overcome this drawback, we construct and annotate a new, more challenging dataset.
We propose a prompting framework for evaluating the newly developed large language models.
arXiv Detail & Related papers (2024-03-07T15:22:07Z)
- Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey [17.19337964440007]
There is currently no comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain.
This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized.
It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field.
arXiv Detail & Related papers (2024-02-27T23:59:01Z)
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data [89.2410799619405]
We introduce the Quantitative Reasoning with Data benchmark to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data.
The benchmark comprises a dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers.
To compare models' quantitative reasoning abilities on data and text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText.
arXiv Detail & Related papers (2024-02-27T16:15:03Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account information in multiple modalities, including text, layout, and visual images, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)