Mixed-modality Representation Learning and Pre-training for Joint
Table-and-Text Retrieval in OpenQA
- URL: http://arxiv.org/abs/2210.05197v1
- Date: Tue, 11 Oct 2022 07:04:39 GMT
- Title: Mixed-modality Representation Learning and Pre-training for Joint
Table-and-Text Retrieval in OpenQA
- Authors: Junjie Huang, Wanjun Zhong, Qian Liu, Ming Gong, Daxin Jiang and Nan
Duan
- Abstract summary: An optimized OpenQA Table-Text Retriever (OTTeR) is proposed.
We conduct retrieval-centric mixed-modality synthetic pre-training.
OTTeR substantially improves the performance of table-and-text retrieval on the OTT-QA dataset.
- Score: 85.17249272519626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieving evidence from tabular and textual resources is essential for
open-domain question answering (OpenQA), as it provides more comprehensive
information. However, training an effective dense table-text retriever is
difficult due to the challenges of table-text discrepancy and data sparsity.
To address these challenges, we introduce an optimized OpenQA Table-Text
Retriever (OTTeR) to jointly retrieve tabular and textual evidence. First, we
propose to enhance mixed-modality representation learning via two mechanisms:
modality-enhanced representation and a mixed-modality negative sampling
strategy. Second, to alleviate the data sparsity problem and enhance general
retrieval ability, we conduct retrieval-centric mixed-modality synthetic
pre-training. Experimental results demonstrate that OTTeR substantially
improves table-and-text retrieval performance on the OTT-QA dataset.
Comprehensive analyses examine the effectiveness of all the proposed
mechanisms. Moreover, equipped with OTTeR, our OpenQA system achieves the
state-of-the-art result on the downstream QA task, with a 10.1% absolute
improvement in exact match over the previous best system. All the code and
data are available at https://github.com/Jun-jie-Huang/OTTeR.
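To make the retrieval setup concrete, here is a minimal sketch of a dual-encoder retriever trained with in-batch negatives over mixed-modality blocks, in the spirit of the abstract above. The encoder checkpoint ("bert-base-uncased"), the [TABLE]/[TEXT] linearization format, and the helper names (linearize, encode, contrastive_loss) are illustrative assumptions, not OTTeR's actual implementation; the authors' code is in the repository linked above.

```python
# Illustrative sketch only: a dual-encoder retriever with in-batch
# mixed-modality negatives. Model names, linearization format, and helpers
# are assumptions for exposition, not the OTTeR implementation.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
q_encoder = AutoModel.from_pretrained("bert-base-uncased")  # question encoder
b_encoder = AutoModel.from_pretrained("bert-base-uncased")  # table-text block encoder


def linearize(row_cells, passage):
    """Flatten one table row and its linked passage into a single string."""
    cells = " ; ".join(f"{col} is {val}" for col, val in row_cells)
    return f"[TABLE] {cells} [TEXT] {passage}"


def encode(encoder, texts):
    """Return [CLS] embeddings for a list of strings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]


def contrastive_loss(questions, positive_blocks):
    """Each question is paired with its gold block; every other block in the
    batch (text-only, table-only, or mixed) serves as an in-batch negative."""
    q = encode(q_encoder, questions)          # (B, H)
    b = encode(b_encoder, positive_blocks)    # (B, H)
    scores = q @ b.T                          # (B, B) similarity matrix
    labels = torch.arange(q.size(0))          # diagonal entries are positives
    return F.cross_entropy(scores, labels)


questions = ["Which team won the 2014 World Cup final?"]
blocks = [linearize([("Year", "2014"), ("Winner", "Germany")],
                    "Germany defeated Argentina 1-0 in the 2014 final.")]
loss = contrastive_loss(questions, blocks)
```

Mixing table-only, text-only, and linked table-text blocks within the same batch is one simple way to realize a mixed-modality negative sampling strategy; the paper's actual negative construction and synthetic pre-training corpus are described in the full text and the linked repository.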
Related papers
- GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval [56.610806615527885]
This paper introduces a novel data-centric approach, Generalized Query Expansion (GQE), to address the inherent information imbalance between text and video.
By adaptively segmenting videos into short clips and employing zero-shot captioning, GQE enriches the training dataset with comprehensive scene descriptions.
GQE achieves state-of-the-art performance on several benchmarks, including MSR-VTT, MSVD, LSMDC, and VATEX.
arXiv Detail & Related papers (2024-08-14T01:24:09Z)
- KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Answering every question requires integrating information from both the table and the sub-graph.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z)
- Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction [36.915250638481986]
We introduce LiveSum, a new benchmark dataset for generating summary tables of competitions based on real-time commentary texts.
We evaluate the performance of state-of-the-art Large Language Models on this task in both fine-tuning and zero-shot settings.
We additionally propose a novel pipeline called T3 (Text-Tuple-Table) to improve their performance.
arXiv Detail & Related papers (2024-04-22T14:31:28Z)
- MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering [3.1651118728570635]
In today's fast-paced industry, professionals face the challenge of summarizing a large number of documents and extracting vital information from them on a daily basis.
To address this challenge, the approach of Table Question Answering (QA) has been developed to extract the relevant information.
Recent advancements in Large Language Models (LLMs) have opened up new possibilities for extracting information from tabular data using prompts.
arXiv Detail & Related papers (2024-03-28T03:14:18Z)
- Beyond Extraction: Contextualising Tabular Data for Efficient Summarisation by Language Models [0.0]
The conventional Retrieval-Augmented Generation (RAG) architecture has proven effective for retrieving information from diverse documents.
This research introduces an innovative approach to enhance the accuracy of complex table queries in RAG-based systems.
arXiv Detail & Related papers (2024-01-04T16:16:14Z)
- TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance [71.76018597965378]
We build a new large-scale Question Answering dataset containing both Tabular And Textual data, named TAT-QA.
We propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text.
arXiv Detail & Related papers (2021-05-17T06:12:06Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Structured Multimodal Attentions for TextVQA [57.71060302874151]
We propose an end-to-end structured multimodal attention (SMA) neural network, mainly to solve the first two issues above.
SMA first uses a structural graph representation to encode the object-object, object-text and text-text relationships appearing in the image, and then designs a multimodal graph attention network to reason over it.
Our proposed model outperforms state-of-the-art models on the TextVQA dataset and on two tasks of the ST-VQA dataset, surpassed only by the pre-training-based TAP.
arXiv Detail & Related papers (2020-06-01T07:07:36Z)