Hybrid Graphs for Table-and-Text based Question Answering using LLMs
- URL: http://arxiv.org/abs/2501.17767v1
- Date: Wed, 29 Jan 2025 16:58:18 GMT
- Title: Hybrid Graphs for Table-and-Text based Question Answering using LLMs
- Authors: Ankush Agarwal, Ganesh S, Chaitanya Devaguptapu
- Abstract summary: We present a novel Hybrid Graph-based approach for Table-Text QA.
We evaluate our approach on the challenging Hybrid-QA and OTT-QA datasets.
Our method achieves the best zero-shot performance on both datasets.
- Score: 2.3759432635713895
- License:
- Abstract: Answering questions that require reasoning and aggregation across both structured (tables) and unstructured (raw text) data sources presents significant challenges. Current methods rely on fine-tuning and high-quality, human-curated data, which is difficult to obtain. Recent advances in Large Language Models (LLMs) have shown promising results for multi-hop question answering (QA) over single-source text data in a zero-shot setting, yet exploration into multi-source Table-Text QA remains limited. In this paper, we present a novel Hybrid Graph-based approach for Table-Text QA that leverages LLMs without fine-tuning. Our method constructs a unified Hybrid Graph from textual and tabular data, pruning information based on the input question to provide the LLM with relevant context concisely. We evaluate our approach on the challenging Hybrid-QA and OTT-QA datasets using state-of-the-art LLMs, including GPT-3.5, GPT-4, and LLaMA-3. Our method achieves the best zero-shot performance on both datasets, improving Exact Match scores by up to 10% on Hybrid-QA and 5.4% on OTT-QA. Moreover, our approach reduces token usage by up to 53% compared to the original context.
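The abstract describes a two-step pipeline: build a unified graph linking table cells to related text, then prune it against the question before prompting the LLM. A minimal sketch of that idea, assuming a toy representation (the function names, the string-overlap linking, and the token-overlap pruning are illustrative stand-ins, not the paper's actual method):

```python
# Hedged sketch of a hybrid table-text graph with question-based pruning.
# Nodes are table cell values and passage sentences; an edge links a cell
# to any sentence that mentions it. Pruning keeps only question-relevant
# nodes, shrinking the context handed to the LLM.

def build_hybrid_graph(table, passages):
    """table: list of row dicts; passages: {entity: [sentences]}."""
    graph = {}
    for row in table:
        for value in row.values():
            graph.setdefault(value, set())
            for sentences in passages.values():
                for sent in sentences:
                    if value.lower() in sent.lower():
                        graph[value].add(sent)
    return graph

def prune(graph, question):
    """Keep nodes sharing a token with the question -- a crude stand-in
    for the paper's question-based pruning."""
    q_tokens = set(question.lower().split())
    return {node: nbrs for node, nbrs in graph.items()
            if set(node.lower().split()) & q_tokens}
```

For a table of athletes plus short bio passages, pruning with "Which country is Alice from?" would keep the "Alice" node and its linked sentences while dropping unrelated rows, which is the token-reduction effect the abstract reports.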
Related papers
- Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering [16.790216473975146]
Complex table question answering (TQA) aims to answer questions that require complex reasoning, such as multi-step or multi-category reasoning.
Previous approaches demonstrated notable performance by leveraging either closed-source large language models (LLMs) or fine-tuned open-weight LLMs.
We propose Multi-Agent Collaboration with Tool use (MACT), a framework that requires neither closed-source models nor fine-tuning.
arXiv Detail & Related papers (2024-12-28T13:13:33Z) - TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
TACT contains challenging instructions that demand stitching information scattered across one or more texts.
We construct this dataset by leveraging an existing dataset of texts and their associated tables.
We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%.
arXiv Detail & Related papers (2024-06-05T20:32:56Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA [9.659820850719413]
We leverage Large Language Models (LLMs), which have been shown to have strong reasoning ability, as automatic data annotators.
The key innovation in our method lies in the Synthesize Step-by-Step strategy.
We significantly enhance the chart VQA models, achieving the state-of-the-art accuracy on the ChartQA and PlotQA datasets.
arXiv Detail & Related papers (2024-03-25T03:02:27Z) - Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data [29.07028542633284]
Table-to-Text Generation is a promising solution, as it transforms hybrid data into a uniformly text-formatted corpus.
There is currently no comparative analysis on how corpora generated by different table-to-text methods affect the performance of QA systems.
In this paper, we innovatively integrate table-to-text generation into the framework of enhancing LLM-based QA systems with domain hybrid data.
arXiv Detail & Related papers (2024-02-20T10:00:58Z) - TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the power of large language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z) - MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering [64.6741991162092]
We present MinPrompt, a minimal data augmentation framework for open-domain question answering.
We transform the raw text into a graph structure to build connections between different factual sentences.
We then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text.
We generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model.
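The MinPrompt summary above outlines selecting the minimal set of sentences that covers the most information in the raw text. Minimal set cover is NP-hard, so a greedy approximation is the standard approach; a sketch under that assumption (representing each sentence by the set of entities it mentions, which is an illustrative simplification, not MinPrompt's actual code):

```python
# Hedged sketch: greedy set cover over sentences. Each sentence maps to
# the set of entities (facts) it mentions; we repeatedly pick the sentence
# that covers the most still-uncovered entities.

def minimal_cover(sentence_entities):
    """sentence_entities: {sentence: set of entity strings}.
    Returns a small list of sentences covering the union of all entities."""
    universe = set().union(*sentence_entities.values())
    covered, chosen = set(), []
    while covered != universe:
        # Sentence adding the most uncovered entities.
        best = max(sentence_entities,
                   key=lambda s: len(sentence_entities[s] - covered))
        gain = sentence_entities[best] - covered
        if not gain:
            break  # remaining entities are unreachable
        chosen.append(best)
        covered |= gain
    return chosen
```

The greedy rule gives the classic ln(n)-factor approximation to the optimal cover, which is typically good enough for pruning redundant sentences before QA-pair generation.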
arXiv Detail & Related papers (2023-10-08T04:44:36Z) - MMHQA-ICL: Multimodal In-context Learning for Hybrid Question Answering over Text, Tables and Images [24.17147521556083]
In-context learning has become the most popular way to solve QA problems.
We propose the MMHQA-ICL framework to address these problems.
We are the first to use an end-to-end prompting method for this task.
arXiv Detail & Related papers (2023-09-09T13:35:01Z) - Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA [85.17249272519626]
An optimized OpenQA Table-Text Retriever (OTTeR) is proposed.
We conduct retrieval-centric mixed-modality synthetic pre-training.
OTTeR substantially improves the performance of table-and-text retrieval on the OTT-QA dataset.
arXiv Detail & Related papers (2022-10-11T07:04:39Z) - Intermediate Training on Question Answering Datasets Improves Generative Data Augmentation [32.83012699501051]
We improve generative data augmentation by formulating the data generation as a context generation task.
We cast downstream tasks into question answering format and adapt the fine-tuned context generators to the target task domain.
We demonstrate substantial improvements in performance in few-shot and zero-shot settings.
arXiv Detail & Related papers (2022-05-25T09:28:21Z) - TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance [71.76018597965378]
We build a new large-scale Question Answering dataset containing both Tabular And Textual data, named TAT-QA.
We propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text.
arXiv Detail & Related papers (2021-05-17T06:12:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.