Table Retrieval May Not Necessitate Table-specific Model Design
- URL: http://arxiv.org/abs/2205.09843v1
- Date: Thu, 19 May 2022 20:35:23 GMT
- Title: Table Retrieval May Not Necessitate Table-specific Model Design
- Authors: Zhiruo Wang, Zhengbao Jiang, Eric Nyberg, Graham Neubig
- Abstract summary: We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of a table-based portion of the Natural Questions dataset (NQ-table), we find that table structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
- Score: 83.27735758203089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tables are an important form of structured data for both human and machine
readers alike, providing answers to questions that cannot, or cannot easily, be
found in texts. Recent work has designed special models and training paradigms
for table-related tasks such as table-based question answering and table
retrieval. Though effective, they add complexity in both modeling and data
acquisition compared to generic text solutions and obscure which elements are
truly beneficial. In this work, we focus on the task of table retrieval, and
ask: "is table-specific model design necessary for table retrieval, or can a
simpler text-based model be effectively used to achieve a similar result?"
First, we perform an analysis on a table-based portion of the Natural Questions
dataset (NQ-table), and find that structure plays a negligible role in more
than 70% of the cases. Based on this, we experiment with a general Dense
Passage Retriever (DPR) based on text and a specialized Dense Table Retriever
(DTR) that uses table-specific model designs. We find that DPR performs well
without any table-specific design and training, and even achieves superior
results compared to DTR when fine-tuned on properly linearized tables. We then
experiment with three modules to explicitly encode table structures, namely
auxiliary row/column embeddings, hard attention masks, and soft relation-based
attention biases. However, none of these yielded significant improvements,
suggesting that table-specific model design may not be necessary for table
retrieval.
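To make the comparison concrete, below is a minimal sketch (not the authors' code) of how a table might be linearized into plain text so that a generic text-based dense retriever such as DPR can encode it. The separator strings and the title/header/row ordering are illustrative assumptions; the paper only states that properly linearized tables are used.

```python
# A hedged sketch of table linearization for a text-only dense retriever.
# Separators and ordering are assumptions, not the paper's exact scheme.

def linearize_table(title: str, header: list[str], rows: list[list[str]]) -> str:
    """Flatten a table into a single text sequence for a text-based retriever."""
    parts = [title]
    # Header cells joined as one segment of text.
    parts.append(" | ".join(header))
    # Each data row becomes a pipe-separated segment in reading order.
    for row in rows:
        parts.append(" | ".join(str(cell) for cell in row))
    return " . ".join(parts)


if __name__ == "__main__":
    text = linearize_table(
        "List of tallest buildings",
        ["Name", "City", "Height (m)"],
        [["Burj Khalifa", "Dubai", "828"], ["Shanghai Tower", "Shanghai", "632"]],
    )
    print(text)
```

Because the NQ-table analysis suggests structure matters in fewer than 30% of cases, a flattened representation like this is often enough for retrieval.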
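Of the three structure-encoding modules, auxiliary row/column embeddings are the simplest to picture: each token receives an extra embedding for the row and column it belongs to, added to its token embedding before the encoder. The PyTorch sketch below illustrates that idea under assumed names and sizes; it is not the paper's implementation. Hard attention masks and soft relation-based biases would instead be injected at the attention-score level.

```python
# A hedged PyTorch sketch of auxiliary row/column embeddings.
# Shapes, vocabulary sizes, and class names are illustrative assumptions.

import torch
import torch.nn as nn


class RowColumnEmbeddings(nn.Module):
    def __init__(self, hidden_size: int = 768, max_rows: int = 256, max_cols: int = 64):
        super().__init__()
        self.row_emb = nn.Embedding(max_rows, hidden_size)
        self.col_emb = nn.Embedding(max_cols, hidden_size)

    def forward(self, token_emb: torch.Tensor, row_ids: torch.Tensor,
                col_ids: torch.Tensor) -> torch.Tensor:
        # token_emb: (batch, seq_len, hidden); row_ids/col_ids: (batch, seq_len).
        # Tokens outside the table (e.g. the page title) can use index 0.
        return token_emb + self.row_emb(row_ids) + self.col_emb(col_ids)


# Example: a batch of 2 sequences with 10 tokens each.
emb = torch.randn(2, 10, 768)
rows = torch.randint(0, 5, (2, 10))
cols = torch.randint(0, 3, (2, 10))
structured = RowColumnEmbeddings()(emb, rows, cols)
print(structured.shape)  # torch.Size([2, 10, 768])
```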
Related papers
- KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Every question requires the integration of information from both the table and the sub-graph to be answered.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z)
- MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering [61.48881995121938]
Real-world queries are complex in nature, often spanning multiple tables in a relational database or web page.
Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z)
- ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples [15.212332890570869]
We develop ReasTAP to show that high-level table reasoning skills can be injected into models during pre-training without a complex table-specific architecture design.
ReasTAP achieves new state-of-the-art performance on all benchmarks and delivers a significant improvement in low-resource settings.
arXiv Detail & Related papers (2022-10-22T07:04:02Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
- TableFormer: Robust Transformer Modeling for Table-Text Encoding [18.00127368618485]
Existing models for table understanding require linearization of the table structure, where row or column order is encoded as an unwanted bias.
In this work, we propose a robust and structurally aware table-text encoding architecture, TableFormer.
arXiv Detail & Related papers (2022-03-01T07:23:06Z)
- TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition [76.06530816349763]
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
arXiv Detail & Related papers (2021-06-20T01:57:05Z)
- TABBIE: Pretrained Representations of Tabular Data [22.444607481407633]
We devise a simple pretraining objective that learns exclusively from tabular data.
Unlike competing approaches, our model (TABBIE) provides embeddings of all table substructures.
A qualitative analysis of our model's learned cell, column, and row representations shows that it understands complex table semantics and numerical trends.
arXiv Detail & Related papers (2021-05-06T11:15:16Z)
- Retrieving Complex Tables with Multi-Granular Graph Representation Learning [20.72341939868327]
The task of natural language table retrieval (NLTR) seeks to retrieve semantically relevant tables based on natural language queries.
Existing learning systems treat tables as plain text based on the assumption that tables are structured as dataframes.
We propose Graph-based Table Retrieval (GTR), a generalizable NLTR framework with multi-granular graph representation learning.
arXiv Detail & Related papers (2021-05-04T20:19:03Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)