Related papers: HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies

HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies

URL: http://arxiv.org/abs/2406.10803v1
Date: Sun, 16 Jun 2024 04:53:29 GMT
Title: HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies
Authors: William Watson, Nicole Cho, Tucker Balch, Manuela Veloso,
Abstract summary: We propose a cooperative game dubbed "HiddenTables" as a potential resolution to this challenge. "HiddenTables" is played between the code-generating "r" and the "Oracle windows" which evaluates the ability of the agents to solve Table QA tasks. We provide evidential experiments on a diverse set of tables that demonstrate an LLM's collective inability to generalize and perform on complex queries.
Score: 9.09415727445941
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A myriad of different Large Language Models (LLMs) face a common challenge in contextually analyzing table question-answering tasks. These challenges are engendered from (1) finite context windows for large tables, (2) multi-faceted discrepancies amongst tokenization patterns against cell boundaries, and (3) various limitations stemming from data confidentiality in the process of using external models such as gpt-3.5-turbo. We propose a cooperative game dubbed "HiddenTables" as a potential resolution to this challenge. In essence, "HiddenTables" is played between the code-generating LLM "Solver" and the "Oracle" which evaluates the ability of the LLM agents to solve Table QA tasks. This game is based on natural language schemas and importantly, ensures the security of the underlying data. We provide evidential experiments on a diverse set of tables that demonstrate an LLM's collective inability to generalize and perform on complex queries, handle compositional dependencies, and align natural language to programmatic commands when concrete table schemas are provided. Unlike encoder-based models, we have pushed the boundaries of "HiddenTables" to not be limited by the number of rows - therefore we exhibit improved efficiency in prompt and completion tokens. Our infrastructure has spawned a new dataset "PyQTax" that spans across 116,671 question-table-answer triplets and provides additional fine-grained breakdowns & labels for varying question taxonomies. Therefore, in tandem with our academic contributions regarding LLMs' deficiency in TableQA tasks, "HiddenTables" is a tactile manifestation of how LLMs can interact with massive datasets while ensuring data security and minimizing generation costs.

Related papers

TableLoRA: Low-rank Adaptation on Table Structure Understanding for Large Language Models [57.005158277893194]
TableLoRA is a module designed to improve LLMs' understanding of table structure during PEFT. It incorporates special tokens for serializing tables with special token encoder and uses 2D LoRA to encode low-rank information on cell positions.
arXiv Detail & Related papers (2025-03-06T12:50:14Z)
Piece of Table: A Divide-and-Conquer Approach for Selecting Subtables in Table Question Answering [20.926770550682964]
PieTa is a new framework for subtable-based question answering (QA) It operates through an iterative process of dividing tables into smaller windows, using LMs to select relevant cells within each window, and merging these cells into a subtable. It demonstrates improved performance over previous subtable-based QA approaches.
arXiv Detail & Related papers (2024-12-10T16:08:14Z)
TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language Models [54.44272772296578]
Large language models (LLMs) have demonstrated their effectiveness in multivariate time series classification. LLMs directly encode embeddings for time series within the latent space of LLMs from scratch to align with semantic space of LLMs. We propose TableTime, which reformulates MTSC as a table understanding task.
arXiv Detail & Related papers (2024-11-24T07:02:32Z)
Accurate and Regret-aware Numerical Problem Solver for Tabular Question Answering [29.384514074911955]
We propose a model named TabLaP that uses Large Language Models as a planner rather than an answer generator. We show that TabLaP is substantially more accurate than the state-of-the-art models, improving the answer accuracy by 5.7% and 5.8% on the two datasets.
arXiv Detail & Related papers (2024-10-10T05:34:00Z)
TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
Seek and Solve Reasoning for Table Question Answering [49.006950918895306]
This paper improves Table-based Question Answering (TQA) performance by leveraging Large Language Models' reasoning capabilities. Inspired by how humans solve TQA tasks, we propose a Seek-and-seek pipeline that instructs the LLM to first seek relevant information and then answer questions. We present a compact single-stage TQA-solving prompt distilled from the pipeline.
arXiv Detail & Related papers (2024-09-09T02:41:00Z)
RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering [11.214912072391108]
Real-world datasets often feature a vast array of attributes and complex values. Traditional methods cannot fully relay the datasets size and complexity to the Large Language Models. We propose a novel framework that leverages Full-Text Search (FTS) on the input table.
arXiv Detail & Related papers (2024-08-22T13:13:06Z)
Uncovering Limitations of Large Language Models in Information Seeking from Tables [28.19697259795014]
This paper introduces a more reliable benchmark for Table Information Seeking (TabIS) To avoid the unreliable evaluation caused by text similarity-based metrics, TabIS adopts a single-choice question format (with two options per question) instead of a text generation format.
arXiv Detail & Related papers (2024-06-06T14:30:59Z)
TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. TACT contains challenging instructions that demand stitching information scattered across one or more texts. We construct this dataset by leveraging an existing dataset of texts and their associated tables. We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%.
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
CABINET: Content Relevance based Noise Reduction for Table Question Answering [21.899938933558396]
CABINET (Content RelevAnce-Based NoIse ReductioN for TablE QuesTion-Answering) is a framework to enable Large Language Models (LLMs) to focus on relevant data by suppressing extraneous information. It is more robust to derive noise, maintains outperformance on tables of varying sizes, and establishes new SoTA performance on WikiTQ, FeTaQA, and Wiki datasets.
arXiv Detail & Related papers (2024-02-02T05:48:39Z)
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively. It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z)
MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering [61.48881995121938]
Real-world queries are complex in nature, often over multiple tables in a relational database or web page. Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z)
OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort. We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.