CABINET: Content Relevance based Noise Reduction for Table Question
Answering
- URL: http://arxiv.org/abs/2402.01155v3
- Date: Tue, 13 Feb 2024 09:11:01 GMT
- Title: CABINET: Content Relevance based Noise Reduction for Table Question
Answering
- Authors: Sohan Patnaik, Heril Changwal, Milan Aggarwal, Sumit Bhatia, Yaman
Kumar, Balaji Krishnamurthy
- Abstract summary: CABINET (Content RelevAnce-Based NoIse ReductioN for TablE QuesTion-Answering) is a framework to enable Large Language Models (LLMs) to focus on relevant data by suppressing extraneous information.
It is more robust to derive noise, maintains outperformance on tables of varying sizes, and establishes new SoTA performance on WikiTQ, FeTaQA, and Wiki datasets.
- Score: 21.899938933558396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Table understanding capability of Large Language Models (LLMs) has been
extensively studied through the task of question-answering (QA) over tables.
Typically, only a small part of the whole table is relevant to derive the
answer for a given question. The irrelevant parts act as noise and are
distracting information, resulting in sub-optimal performance due to the
vulnerability of LLMs to noise. To mitigate this, we propose CABINET (Content
RelevAnce-Based NoIse ReductioN for TablE QuesTion-Answering) - a framework to
enable LLMs to focus on relevant tabular data by suppressing extraneous
information. CABINET comprises an Unsupervised Relevance Scorer (URS), trained
differentially with the QA LLM, that weighs the table content based on its
relevance to the input question before feeding it to the question-answering LLM
(QA LLM). To further aid the relevance scorer, CABINET employs a weakly
supervised module that generates a parsing statement describing the criteria of
rows and columns relevant to the question and highlights the content of
corresponding table cells. CABINET significantly outperforms various tabular
LLM baselines, as well as GPT3-based in-context learning methods, is more
robust to noise, maintains outperformance on tables of varying sizes, and
establishes new SoTA performance on WikiTQ, FeTaQA, and WikiSQL datasets. We
release our code and datasets at https://github.com/Sohanpatnaik106/CABINET_QA.
Related papers
- TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language Models [54.44272772296578]
Large language models (LLMs) have demonstrated their effectiveness in multivariate time series classification.
LLMs directly encode embeddings for time series within the latent space of LLMs from scratch to align with semantic space of LLMs.
We propose TableTime, which reformulates MTSC as a table understanding task.
arXiv Detail & Related papers (2024-11-24T07:02:32Z) - From Distributional to Overton Pluralism: Investigating Large Language Model Alignment [82.99849359892112]
We re-examine previously reported reductions in response diversity post-alignment.
Our analysis suggests that an apparent drop in the diversity of responses is largely explained by quality control and information aggregation.
Findings indicate that current alignment techniques capture but do not extend the useful subset of assistant-like base LLM behavior.
arXiv Detail & Related papers (2024-06-25T16:32:33Z) - HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies [9.09415727445941]
We propose a cooperative game dubbed "HiddenTables" as a potential resolution to this challenge.
"HiddenTables" is played between the code-generating "r" and the "Oracle windows" which evaluates the ability of the agents to solve Table QA tasks.
We provide evidential experiments on a diverse set of tables that demonstrate an LLM's collective inability to generalize and perform on complex queries.
arXiv Detail & Related papers (2024-06-16T04:53:29Z) - Uncovering Limitations of Large Language Models in Information Seeking from Tables [28.19697259795014]
This paper introduces a more reliable benchmark for Table Information Seeking (TabIS)
To avoid the unreliable evaluation caused by text similarity-based metrics, TabIS adopts a single-choice question format (with two options per question) instead of a text generation format.
arXiv Detail & Related papers (2024-06-06T14:30:59Z) - KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Every question requires the integration of information from both the table and the sub-graph to be answered.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z) - TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition [6.253771639590562]
Table reasoning is a challenging task that requires understanding both natural language questions and structured data.
We propose Tabify, a novel method that leverages text-to-generation to decompose tables into smaller and relevant sub-tables.
Our method performs remarkably well on the WikiTQ benchmark, achieving an accuracy of 64.7%.
arXiv Detail & Related papers (2024-04-15T21:42:20Z) - Chain-of-Table: Evolving Tables in the Reasoning Chain for Table
Understanding [79.9461269253121]
We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts.
Chain-of-Table achieves new state-of-the-art performance on WikiTQ, FeTaQA, and TabFact benchmarks.
arXiv Detail & Related papers (2024-01-09T07:46:26Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study [44.39031420687302]
Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks.
We try to understand this by designing a benchmark to evaluate the structural understanding capabilities of LLMs.
We propose $textitself-augmentation$ for effective structural prompting, such as critical value / range identification.
arXiv Detail & Related papers (2023-05-22T14:23:46Z) - MultiTabQA: Generating Tabular Answers for Multi-Table Question
Answering [61.48881995121938]
Real-world queries are complex in nature, often over multiple tables in a relational database or web page.
Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.