Hybrid Graphs for Table-and-Text based Question Answering using LLMs
- URL: http://arxiv.org/abs/2501.17767v1
- Date: Wed, 29 Jan 2025 16:58:18 GMT
- Title: Hybrid Graphs for Table-and-Text based Question Answering using LLMs
- Authors: Ankush Agarwal, Ganesh S, Chaitanya Devaguptapu
- Abstract summary: We present a novel Hybrid Graph-based approach for Table-Text QA.
We evaluate our approach on the challenging Hybrid-QA and OTT-QA datasets.
Our method achieves the best zero-shot performance on both datasets.
- Score: 2.3759432635713895
- License:
- Abstract: Answering questions that require reasoning and aggregation across both structured (tables) and unstructured (raw text) data sources presents significant challenges. Current methods rely on fine-tuning and high-quality, human-curated data, which is difficult to obtain. Recent advances in Large Language Models (LLMs) have shown promising results for multi-hop question answering (QA) over single-source text data in a zero-shot setting, yet exploration into multi-source Table-Text QA remains limited. In this paper, we present a novel Hybrid Graph-based approach for Table-Text QA that leverages LLMs without fine-tuning. Our method constructs a unified Hybrid Graph from textual and tabular data, pruning information based on the input question to provide the LLM with relevant context concisely. We evaluate our approach on the challenging Hybrid-QA and OTT-QA datasets using state-of-the-art LLMs, including GPT-3.5, GPT-4, and LLaMA-3. Our method achieves the best zero-shot performance on both datasets, improving Exact Match scores by up to 10% on Hybrid-QA and 5.4% on OTT-QA. Moreover, our approach reduces token usage by up to 53% compared to the original context.
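The abstract describes a two-step pipeline: build a unified graph linking table cells to related text, then prune it against the question before prompting the LLM. A minimal sketch of that idea, assuming a toy representation (the function names, the string-overlap linking, and the token-overlap pruning are illustrative stand-ins, not the paper's actual method):

```python
# Hedged sketch of a hybrid table-text graph with question-based pruning.
# Nodes are table cell values and passage sentences; an edge links a cell
# to any sentence that mentions it. Pruning keeps only question-relevant
# nodes, shrinking the context handed to the LLM.

def build_hybrid_graph(table, passages):
    """table: list of row dicts; passages: {entity: [sentences]}."""
    graph = {}
    for row in table:
        for value in row.values():
            graph.setdefault(value, set())
            for sentences in passages.values():
                for sent in sentences:
                    if value.lower() in sent.lower():
                        graph[value].add(sent)
    return graph

def prune(graph, question):
    """Keep nodes sharing a token with the question -- a crude stand-in
    for the paper's question-based pruning."""
    q_tokens = set(question.lower().split())
    return {node: nbrs for node, nbrs in graph.items()
            if set(node.lower().split()) & q_tokens}
```

For a table of athletes plus short bio passages, pruning with "Which country is Alice from?" would keep the "Alice" node and its linked sentences while dropping unrelated rows, which is the token-reduction effect the abstract reports.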
Related papers
- Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering [16.790216473975146]
Complex table question answering (TQA) aims to answer questions that require complex reasoning, such as multi-step or multi-category reasoning.
Previous approaches demonstrated notable performance by leveraging either closed-source large language models (LLMs) or fine-tuned open-weight LLMs.
We propose Multi-Agent Collaboration with Tool use (MACT), a framework that requires neither closed-source models nor fine-tuning.
arXiv Detail & Related papers (2024-12-28T13:13:33Z) - TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
TACT contains challenging instructions that demand stitching information scattered across one or more texts.
We construct this dataset by leveraging an existing dataset of texts and their associated tables.
We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%.
arXiv Detail & Related papers (2024-06-05T20:32:56Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA [9.659820850719413]
We leverage Large Language Models (LLMs), which have been shown to have strong reasoning ability, as automatic data annotators.
The key innovation in our method lies in the Synthesize Step-by-Step strategy.
We significantly enhance the chart VQA models, achieving the state-of-the-art accuracy on the ChartQA and PlotQA datasets.
arXiv Detail & Related papers (2024-03-25T03:02:27Z) - Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data [29.07028542633284]
Table-to-Text Generation is a promising solution, as it transforms hybrid data into a uniformly text-formatted corpus.
There is currently no comparative analysis on how corpora generated by different table-to-text methods affect the performance of QA systems.
In this paper, we innovatively integrate table-to-text generation into the framework of enhancing LLM-based QA systems with domain hybrid data.
arXiv Detail & Related papers (2024-02-20T10:00:58Z) - TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the power of large language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z) - MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering [64.6741991162092]
We present MinPrompt, a minimal data augmentation framework for open-domain question answering.
We transform the raw text into a graph structure to build connections between different factual sentences.
We then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text.
We generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model.
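The MinPrompt summary above outlines selecting the minimal set of sentences that covers the most information in the raw text. Minimal set cover is NP-hard, so a greedy approximation is the standard approach; a sketch under that assumption (representing each sentence by the set of entities it mentions, which is an illustrative simplification, not MinPrompt's actual code):

```python
# Hedged sketch: greedy set cover over sentences. Each sentence maps to
# the set of entities (facts) it mentions; we repeatedly pick the sentence
# that covers the most still-uncovered entities.

def minimal_cover(sentence_entities):
    """sentence_entities: {sentence: set of entity strings}.
    Returns a small list of sentences covering the union of all entities."""
    universe = set().union(*sentence_entities.values())
    covered, chosen = set(), []
    while covered != universe:
        # Sentence adding the most uncovered entities.
        best = max(sentence_entities,
                   key=lambda s: len(sentence_entities[s] - covered))
        gain = sentence_entities[best] - covered
        if not gain:
            break  # remaining entities are unreachable
        chosen.append(best)
        covered |= gain
    return chosen
```

The greedy rule gives the classic ln(n)-factor approximation to the optimal cover, which is typically good enough for pruning redundant sentences before QA-pair generation.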
arXiv Detail & Related papers (2023-10-08T04:44:36Z) - MMHQA-ICL: Multimodal In-context Learning for Hybrid Question Answering over Text, Tables and Images [24.17147521556083]
In-context learning has become the most popular way to solve QA problems.
We propose the MMHQA-ICL framework to address these problems.
We are the first to use an end-to-end prompting method for this task.
arXiv Detail & Related papers (2023-09-09T13:35:01Z) - Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA [85.17249272519626]
An optimized OpenQA Table-Text Retriever (OTTeR) is proposed.
We conduct retrieval-centric mixed-modality synthetic pre-training.
OTTeR substantially improves the performance of table-and-text retrieval on the OTT-QA dataset.
arXiv Detail & Related papers (2022-10-11T07:04:39Z) - Intermediate Training on Question Answering Datasets Improves Generative Data Augmentation [32.83012699501051]
We improve generative data augmentation by formulating the data generation as a context generation task.
We cast downstream tasks into question answering format and adapt the fine-tuned context generators to the target task domain.
We demonstrate substantial improvements in performance in few-shot and zero-shot settings.
arXiv Detail & Related papers (2022-05-25T09:28:21Z) - TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance [71.76018597965378]
We build a new large-scale Question Answering dataset containing both Tabular And Textual data, named TAT-QA.
We propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text.
arXiv Detail & Related papers (2021-05-17T06:12:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.