Beyond Extraction: Contextualising Tabular Data for Efficient
Summarisation by Language Models
- URL: http://arxiv.org/abs/2401.02333v3
- Date: Sat, 10 Feb 2024 12:35:22 GMT
- Title: Beyond Extraction: Contextualising Tabular Data for Efficient
Summarisation by Language Models
- Authors: Uday Allu, Biddwan Ahmed, Vishesh Tripathi
- Abstract summary: The conventional use of the Retrieval-Augmented Generation architecture has proven effective for retrieving information from diverse documents.
This research introduces an innovative approach to enhance the accuracy of complex table queries in RAG-based systems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The conventional use of the Retrieval-Augmented Generation (RAG) architecture
has proven effective for retrieving information from diverse documents.
However, challenges arise in handling complex table queries, especially within
PDF documents containing intricate tabular structures.This research introduces
an innovative approach to enhance the accuracy of complex table queries in
RAG-based systems. Our methodology involves storing PDFs in the retrieval
database and extracting tabular content separately. The extracted tables
undergo a process of context enrichment, concatenating headers with
corresponding values. To ensure a comprehensive understanding of the enriched
data, we employ a fine-tuned version of the Llama-2-chat language model for
summarisation within the RAG architecture. Furthermore, we augment the tabular
data with contextual sense using the ChatGPT 3.5 API through a one-shot prompt.
This enriched data is then fed into the retrieval database alongside other
PDFs. Our approach aims to significantly improve the precision of complex table
queries, offering a promising solution to a longstanding challenge in
information retrieval.
Related papers
- Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu)
DAQu augments the original query with various (query-related) metadata across multiple tables.
We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
arXiv Detail & Related papers (2024-06-23T05:02:21Z) - TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables.
TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer.
arXiv Detail & Related papers (2024-06-05T20:32:56Z) - CHESS: Contextual Harnessing for Efficient SQL Synthesis [1.9506402593665235]
We propose a new pipeline that retrieves relevant data and context, selects an efficient schema, and synthesizes correct and efficient queries.
Our method achieves new state-of-the-art performance on the cross-domain challenging BIRD dataset.
arXiv Detail & Related papers (2024-05-27T01:54:16Z) - QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs [63.98556480088152]
Table summarization is a crucial task aimed at condensing information into concise and comprehensible textual summaries.
We propose a novel method to address these limitations by introducing query-focused multi-table summarization.
Our approach, which comprises a table serialization module, a summarization controller, and a large language model, generates query-dependent table summaries tailored to users' information needs.
arXiv Detail & Related papers (2024-05-08T15:05:55Z) - Decomposing Complex Queries for Tip-of-the-tongue Retrieval [72.07449449115167]
Complex queries describe content elements (e.g., book characters or events), information beyond the document text.
This retrieval setting, called tip of the tongue (TOT), is especially challenging for models reliant on lexical and semantic overlap between query and document text.
We introduce a simple yet effective framework for handling such complex queries by decomposing the query into individual clues, routing those as sub-queries to specialized retrievers, and ensembling the results.
arXiv Detail & Related papers (2023-05-24T11:43:40Z) - QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z) - Query-Specific Knowledge Graphs for Complex Finance Topics [6.599344783327053]
We focus on the CODEC dataset, where domain experts create challenging questions.
We show that state-of-the-art ranking systems have headroom for improvement.
We demonstrate that entity and document relevance are positively correlated.
arXiv Detail & Related papers (2022-11-08T10:21:13Z) - Mixed-modality Representation Learning and Pre-training for Joint
Table-and-Text Retrieval in OpenQA [85.17249272519626]
An optimized OpenQA Table-Text Retriever (OTTeR) is proposed.
We conduct retrieval-centric mixed-modality synthetic pre-training.
OTTeR substantially improves the performance of table-and-text retrieval on the OTT-QA dataset.
arXiv Detail & Related papers (2022-10-11T07:04:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.