Knowledge in Triples for LLMs: Enhancing Table QA Accuracy with Semantic Extraction
- URL: http://arxiv.org/abs/2409.14192v2
- Date: Tue, 29 Oct 2024 21:10:59 GMT
- Title: Knowledge in Triples for LLMs: Enhancing Table QA Accuracy with Semantic Extraction
- Authors: Hossein Sholehrasa, Sanaz Saki Norouzi, Pascal Hitzler, Majid Jaberi-Douraki
- Abstract summary: This paper proposes a novel approach that extracts triples directly from tabular data and integrates them with a retrieval-augmented generation (RAG) model to enhance the accuracy, coherence, and contextual richness of responses generated by a fine-tuned GPT-3.5-turbo-0125 model.
Our approach significantly outperforms existing baselines on the FeTaQA dataset, particularly excelling in SacreBLEU and ROUGE metrics.
- Score: 1.0968343822308813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Integrating structured knowledge from tabular formats poses significant challenges within natural language processing (NLP), mainly when dealing with complex, semi-structured tables like those found in the FeTaQA dataset. These tables require advanced methods to interpret and generate meaningful responses accurately. Traditional approaches, such as SQL and SPARQL, often fail to fully capture the semantics of such data, especially in the presence of irregular table structures like web tables. This paper addresses these challenges by proposing a novel approach that extracts triples directly from tabular data and integrates them with a retrieval-augmented generation (RAG) model to enhance the accuracy, coherence, and contextual richness of responses generated by a fine-tuned GPT-3.5-turbo-0125 model. Our approach significantly outperforms existing baselines on the FeTaQA dataset, particularly excelling in SacreBLEU and ROUGE metrics. It effectively generates contextually accurate and detailed long-form answers from tables, showcasing its strength in complex data interpretation.
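The pipeline described in the abstract (triples extracted from the table, then retrieved and handed to a fine-tuned GPT-3.5-turbo-0125 model via RAG) can be illustrated with a minimal sketch. The helper names, toy table, and keyword-overlap retrieval below are assumptions for illustration, not the paper's actual extraction or retrieval method:

```python
# Minimal sketch: flatten a table into (subject, predicate, object) triples,
# retrieve the triples most relevant to a question, and build a RAG prompt.
# Helper names, the toy table, and the scoring are illustrative assumptions.

def table_to_triples(header, rows, subject_col=0):
    """Each non-subject cell becomes one triple: (subject value, column header, cell value)."""
    triples = []
    for row in rows:
        subject = row[subject_col]
        for col, value in zip(header, row):
            if col != header[subject_col]:
                triples.append((subject, col, value))
    return triples

def retrieve(triples, question, k=3):
    """Naive retrieval: rank triples by word overlap with the question."""
    q_words = set(question.lower().replace("?", " ").split())
    def score(t):
        return len(q_words & set(" ".join(map(str, t)).lower().split()))
    return sorted(triples, key=score, reverse=True)[:k]

header = ["Country", "Capital", "Population (millions)"]
rows = [["France", "Paris", 68.0], ["Japan", "Tokyo", 125.7]]
question = "What is the capital of Japan?"

context = "\n".join(f"{s} | {p} | {o}" for s, p, o in retrieve(table_to_triples(header, rows), question))
prompt = f"Answer using only these facts:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to the fine-tuned model
```

In the paper's setting the retrieved triples feed the fine-tuned GPT-3.5-turbo-0125 model; an embedding-based retriever could replace the keyword overlap used here.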
Related papers
- TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding.
TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.
Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
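A rough sketch of the schema-and-cell retrieval idea summarized in the TableRAG entry above: expanded query keywords first select relevant columns (schema retrieval), then matching cell values locate relevant rows (cell retrieval), and only that compact slice reaches the LM. The toy table, matching rules, and function names are simplified assumptions, not TableRAG's implementation:

```python
# Simplified schema-then-cell retrieval in the spirit of TableRAG.
# The toy table, keyword matching, and function names are assumptions.

def expand_query(question):
    """Placeholder query expansion: lowercased keywords only."""
    return set(question.lower().replace("?", " ").split())

def schema_retrieval(columns, keywords):
    """Keep columns whose names overlap with the expanded query."""
    return [c for c in columns if set(c.lower().split()) & keywords]

def cell_retrieval(table, keywords, k=5):
    """Return up to k (column, value) cells whose value appears in the query."""
    hits = []
    for row in table:
        for col, value in row.items():
            if str(value).lower() in keywords:
                hits.append((col, value))
    return hits[:k]

table = [
    {"player": "Messi", "club": "Inter Miami", "goals": 20},
    {"player": "Haaland", "club": "Man City", "goals": 27},
]
keywords = expand_query("How many goals did Haaland score?")
print(schema_retrieval(list(table[0]), keywords))  # ['goals']
print(cell_retrieval(table, keywords))             # [('player', 'Haaland')]
```

The selected column names and anchor cells, rather than the full table, would then be serialized into the prompt.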
- RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering [11.214912072391108]
Real-world datasets often feature a vast array of attributes and complex values.
Traditional methods cannot fully convey a dataset's size and complexity to Large Language Models.
We propose a novel framework that leverages Full-Text Search (FTS) on the input table.
arXiv Detail & Related papers (2024-08-22T13:13:06Z)
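The RoundTable entry above applies Full-Text Search (FTS) to the input table so that only matching rows reach the model. A small illustration with SQLite's FTS5 (assuming the Python build's SQLite includes FTS5; the table and query are invented):

```python
# Full-text search over a table with SQLite FTS5; only matching rows are
# retrieved and would be passed to the LLM as context. Example data is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE films USING fts5(title, director, summary)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [
        ("Alien", "Ridley Scott", "A crew encounters a hostile alien lifeform."),
        ("Arrival", "Denis Villeneuve", "A linguist decodes an alien language."),
    ],
)

rows = conn.execute(
    "SELECT title, director FROM films WHERE films MATCH ? ORDER BY rank",
    ("language",),
).fetchall()
print(rows)  # [('Arrival', 'Denis Villeneuve')]
```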
- TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
TACT contains challenging instructions that demand stitching information scattered across one or more texts.
We construct this dataset by leveraging an existing dataset of texts and their associated tables.
We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%.
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
- KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Every question requires the integration of information from both the table and the sub-graph to be answered.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z)
- QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs [63.98556480088152]
Table summarization is a crucial task aimed at condensing information into concise and comprehensible textual summaries.
We propose a novel method to address these limitations by introducing query-focused multi-table summarization.
Our approach, which comprises a table serialization module, a summarization controller, and a large language model, generates query-dependent table summaries tailored to users' information needs.
arXiv Detail & Related papers (2024-05-08T15:05:55Z)
- Beyond Extraction: Contextualising Tabular Data for Efficient Summarisation by Language Models [0.0]
The conventional use of the Retrieval-Augmented Generation architecture has proven effective for retrieving information from diverse documents.
This research introduces an innovative approach to enhance the accuracy of complex table queries in RAG-based systems.
arXiv Detail & Related papers (2024-01-04T16:16:14Z)
- Large Language Models are Complex Table Parsers [26.66460264175336]
We propose to incorporate GPT-3.5 to address the challenges posed by Complex Table QA.
Specifically, we encode each cell's hierarchical structure, position information, and content as datasets.
By enhancing the prompt template with an explanatory description of the meaning of each task, we effectively improve the model's awareness of hierarchical table structure.
arXiv Detail & Related papers (2023-12-13T01:34:42Z)
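The "Large Language Models are Complex Table Parsers" entry above encodes each cell's hierarchical structure, position, and content before prompting GPT-3.5. A toy rendering of that idea for a table with a two-level header; the encoding format and prompt wording are assumptions, not the paper's template:

```python
# Toy per-cell encoding of hierarchy (header path), position, and content,
# serialized into a prompt. Format and wording are illustrative assumptions.

top_header = ["", "2022", "2022", "2023", "2023"]
sub_header = ["Region", "Revenue", "Profit", "Revenue", "Profit"]
rows = [["EMEA", 10.2, 1.1, 11.5, 1.4], ["APAC", 8.7, 0.9, 9.9, 1.2]]

cells = []
for r, row in enumerate(rows):
    for c, value in enumerate(row):
        header_path = " > ".join(h for h in (top_header[c], sub_header[c]) if h)
        cells.append(f"(row={r}, col={c}, header='{header_path}', value={value!r})")

prompt = (
    "Each line encodes one cell's position, header path, and content:\n"
    + "\n".join(cells)
    + "\nQuestion: What was the APAC profit in 2023?"
)
print(prompt)
```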
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
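GraPPa's summary above mentions generating synthetic question-SQL pairs over tables with a synchronous context-free grammar. A toy illustration of one synchronized production, where the same slot fillers expand the natural-language side and the SQL side in lockstep (the rule, table name, columns, and values are invented, not GraPPa's grammar):

```python
# Toy synchronous rule: identical slot fillers expand both the question template
# and the SQL template, yielding aligned question-SQL pairs for pre-training.
# The rule, table, columns, and values are invented for illustration.
import random

columns = ["name", "country", "population"]
values = {"country": ["Peru", "Kenya"], "name": ["Lima", "Nairobi"]}

def sample_pair(rng):
    col = rng.choice(list(values))
    val = rng.choice(values[col])
    target = rng.choice([c for c in columns if c != col])
    question = f"What is the {target} of the city whose {col} is {val}?"
    sql = f"SELECT {target} FROM cities WHERE {col} = '{val}'"
    return question, sql

rng = random.Random(0)
for _ in range(2):
    q, s = sample_pair(rng)
    print(q, "=>", s)
```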
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences.