Text-to-Table: A New Way of Information Extraction
- URL: http://arxiv.org/abs/2109.02707v1
- Date: Mon, 6 Sep 2021 19:35:46 GMT
- Title: Text-to-Table: A New Way of Information Extraction
- Authors: Xueqing Wu, Jiacheng Zhang, and Hang Li
- Abstract summary: We study a new problem setting of information extraction (IE), referred to as text-to-table.
In text-to-table, given a text, one creates a table or several tables expressing the main content of the text.
We make use of four existing table-to-text datasets in our experiments on text-to-table.
- Score: 8.326657025342042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a new problem setting of information extraction (IE), referred to as
text-to-table, which can be viewed as an inverse problem of the well-studied
table-to-text. In text-to-table, given a text, one creates a table or several
tables expressing the main content of the text, while the model is learned from
text-table pair data. The problem setting differs from those of the existing
methods for IE. First, the extraction can be carried out from long texts to
large tables with complex structures. Second, the extraction is entirely
data-driven, and there is no need to explicitly define the schemas. As far as
we know, there has been no previous work that studies the problem. In this
work, we formalize text-to-table as a sequence-to-sequence (seq2seq) problem.
We first employ a seq2seq model fine-tuned from a pre-trained language model to
perform the task. We also develop a new method within the seq2seq approach,
exploiting two additional techniques in table generation: table constraint and
table relation embeddings. We make use of four existing table-to-text datasets
in our experiments on text-to-table. Experimental results show that the vanilla
seq2seq model can outperform the baseline methods of using relation extraction
and named entity extraction. The results also show that our method can further
boost the performances of the vanilla seq2seq model. We further discuss the
main challenges of the proposed task. The code and data will be made publicly
available.
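The abstract formalizes text-to-table as a seq2seq problem: the target table must be flattened into a token sequence so an encoder-decoder model can be fine-tuned on text-table pairs, and the generated sequence must be parsed back into a table at inference time. The sketch below illustrates one such linearization; the separator tokens (`|`, `<NEWLINE>`) and parsing rules are illustrative assumptions, not necessarily the paper's exact format.

```python
def linearize_table(header, rows, cell_sep=" | ", row_sep=" <NEWLINE> "):
    """Flatten a table into a single target string for seq2seq training."""
    lines = [cell_sep.join(header)]
    lines += [cell_sep.join(row) for row in rows]
    return row_sep.join(lines)

def parse_table(sequence, cell_sep=" | ", row_sep=" <NEWLINE> "):
    """Invert the linearization to recover (header, rows) at inference."""
    lines = [line.split(cell_sep) for line in sequence.split(row_sep)]
    return lines[0], lines[1:]

# Example: a box-score-style table, as in the sports-domain datasets
# the paper draws on (the specific values here are made up).
header = ["Player", "Points", "Rebounds"]
rows = [["LeBron James", "25", "8"], ["Kevin Love", "18", "11"]]
target = linearize_table(header, rows)
assert parse_table(target) == (header, rows)  # round-trip check
```

The paper's additional techniques (table constraint and table relation embeddings) would operate on top of such a formulation, e.g. by constraining decoding so every row emits the same number of cells.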
Related papers
- Augment before You Try: Knowledge-Enhanced Table Question Answering via
Table Expansion [57.53174887650989]
Table question answering is a popular task that assesses a model's ability to understand and interact with structured data.
Existing methods convert both the table and external knowledge into text, which neglects the structured nature of the table.
We propose a simple yet effective method to integrate external information in a given table.
arXiv Detail & Related papers (2024-01-28T03:37:11Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA [85.17249272519626]
An optimized OpenQA Table-Text Retriever (OTTeR) is proposed.
We conduct retrieval-centric mixed-modality synthetic pre-training.
OTTeR substantially improves the performance of table-and-text retrieval on the OTT-QA dataset.
arXiv Detail & Related papers (2022-10-11T07:04:39Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis on a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
- TABBIE: Pretrained Representations of Tabular Data [22.444607481407633]
We devise a simple pretraining objective that learns exclusively from tabular data.
Unlike competing approaches, our model (TABBIE) provides embeddings of all table substructures.
A qualitative analysis of our model's learned cell, column, and row representations shows that it understands complex table semantics and numerical trends.
arXiv Detail & Related papers (2021-05-06T11:15:16Z)
- Retrieving Complex Tables with Multi-Granular Graph Representation Learning [20.72341939868327]
The task of natural language table retrieval seeks to retrieve semantically relevant tables based on natural language queries.
Existing learning systems either treat tables as plain text or assume that tables are structured as dataframes.
We propose Graph-based Table Retrieval (GTR), a generalizable NLTR framework with multi-granular graph representation learning.
arXiv Detail & Related papers (2021-05-04T20:19:03Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Learning Better Representation for Tables by Self-Supervised Tasks [23.69766883380125]
We propose two self-supervised tasks, Number Ordering and Significance Ordering, to help to learn better table representation.
We test our methods on the widely used ROTOWIRE dataset, which consists of NBA game statistics and related news.
arXiv Detail & Related papers (2020-10-15T09:03:38Z)
- ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.