Enhancing Open-Domain Table Question Answering via Syntax- and
Structure-aware Dense Retrieval
- URL: http://arxiv.org/abs/2309.10506v1
- Date: Tue, 19 Sep 2023 10:40:09 GMT
- Title: Enhancing Open-Domain Table Question Answering via Syntax- and
Structure-aware Dense Retrieval
- Authors: Nengzheng Jin, Dongfang Li, Junying Chen, Joanna Siebert, Qingcai Chen
- Abstract summary: Open-domain table question answering aims to provide answers to a question by retrieving and extracting information from a large collection of tables.
Existing studies of open-domain table QA either directly adopt text retrieval methods or consider the table structure only in the encoding layer for table retrieval.
We propose a syntax- and structure-aware retrieval method for the open-domain table QA task.
- Score: 21.585255812861632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-domain table question answering aims to provide answers to a question by
retrieving and extracting information from a large collection of tables.
Existing studies of open-domain table QA either directly adopt text retrieval
methods or consider the table structure only in the encoding layer for table
retrieval, which may cause syntactical and structural information loss during
table scoring. To address this issue, we propose a syntax- and structure-aware
retrieval method for the open-domain table QA task. It provides syntactical
representations for the question and uses the structural header and value
representations for the tables to avoid the loss of fine-grained syntactical
and structural information. Then, a syntactical-to-structural aggregator is
used to obtain the matching score between the question and a candidate table by
mimicking the human retrieval process. Experimental results show that our
method achieves state-of-the-art results on the NQ-tables dataset and outperforms
strong baselines on a newly curated open-domain Text-to-SQL dataset.
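The abstract describes scoring a candidate table by matching fine-grained question representations against header and value representations. The following is a minimal sketch of that general idea only, not the authors' model: the vectors are hand-made toy embeddings, and the function names (`cosine`, `match_score`, `rank_tables`) and the max-then-average aggregation are assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_score(question_units, header_vecs, value_vecs):
    """Hypothetical aggregator: match each question unit to its most similar
    header or value vector, then average those best similarities."""
    table_vecs = header_vecs + value_vecs
    if not question_units or not table_vecs:
        return 0.0
    best = [max(cosine(q, t) for t in table_vecs) for q in question_units]
    return sum(best) / len(best)

def rank_tables(question_units, tables):
    """Return table names sorted from highest to lowest matching score."""
    scored = [
        (match_score(question_units, t["headers"], t["values"]), name)
        for name, t in tables.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]

# Toy data (assumed): two question units and two candidate tables.
question_units = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
tables = {
    "films": {"headers": [[1.0, 0.0, 0.0]], "values": [[0.0, 1.0, 0.0]]},
    "cities": {"headers": [[0.0, 0.0, 1.0]], "values": [[0.0, 0.0, 1.0]]},
}
ranking = rank_tables(question_units, tables)
```

With these toy vectors, the "films" table matches both question units exactly and is ranked first. In a real system the vectors would come from trained encoders and the aggregation would be learned.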
Related papers
- Augment before You Try: Knowledge-Enhanced Table Question Answering via
Table Expansion [57.53174887650989]
Table question answering is a popular task that assesses a model's ability to understand and interact with structured data.
Existing methods often convert both the table and external knowledge into text, which neglects the structured nature of the table.
We propose a simple yet effective method to integrate external information in a given table.
arXiv Detail & Related papers (2024-01-28T03:37:11Z)
- Beyond Extraction: Contextualising Tabular Data for Efficient
Summarisation by Language Models [0.0]
The conventional use of the Retrieval-Augmented Generation architecture has proven effective for retrieving information from diverse documents.
This research introduces an innovative approach to enhance the accuracy of complex table queries in RAG-based systems.
arXiv Detail & Related papers (2024-01-04T16:16:14Z)
- SEMv2: Table Separation Line Detection Based on Instance Segmentation [96.36188168694781]
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge).
We address the table separation line instance-level discrimination problem and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
arXiv Detail & Related papers (2023-03-08T05:15:01Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
- DrugEHRQA: A Question Answering Dataset on Structured and Unstructured
Electronic Health Records For Medicine Related Queries [7.507210439502174]
This paper develops the first question answering dataset (DrugEHRQA) containing question-answer pairs from both structured tables and unstructured notes from an EHR.
Our dataset has medication-related queries, containing over 70,000 question-answer pairs.
arXiv Detail & Related papers (2022-05-03T03:50:50Z)
- Representations for Question Answering from Documents with Tables and
Text [22.522986299412807]
We aim to improve question answering from tables by refining table representations based on information from surrounding text.
We also present an effective method to combine text and table-based predictions for question answering from full documents.
arXiv Detail & Related papers (2021-01-26T05:52:20Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- A Graph Representation of Semi-structured Data for Web Question
Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- DART: Open-Domain Structured Data Record to Text Generation [91.23798751437835]
We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs).
We propose a procedure of extracting semantic triples from tables that encode their structures by exploiting the semantic dependencies among table headers and the table title.
Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks.
arXiv Detail & Related papers (2020-07-06T16:35:30Z)
- Identifying Table Structure in Documents using Conditional Generative
Adversarial Networks [0.0]
In many industries and in academic research, information is primarily transmitted in the form of unstructured documents.
We propose a top-down approach, first using a conditional generative adversarial network to map a table image into a standardised 'skeleton' table form.
We then derive the latent table structure using xy-cut projection and Genetic Algorithm optimisation.
arXiv Detail & Related papers (2020-01-13T20:42:40Z)