Learning Better Representation for Tables by Self-Supervised Tasks
- URL: http://arxiv.org/abs/2010.07606v3
- Date: Tue, 30 Mar 2021 06:36:30 GMT
- Title: Learning Better Representation for Tables by Self-Supervised Tasks
- Authors: Liang Li, Can Ma, Yinliang Yue, Linjun Shou and Dayong Hu
- Abstract summary: We propose two self-supervised tasks, Number Ordering and Significance Ordering, to help learn better table representations.
We test our methods on the widely used ROTOWIRE dataset, which consists of NBA game statistics and related news.
- Score: 23.69766883380125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Table-to-text generation aims at automatically generating natural text to
help people conveniently obtain the important information in tables.
Although neural models for table-to-text generation have achieved remarkable progress,
some problems are still overlooked. The first is that the values recorded in many
tables are mostly numbers in practice. Existing approaches give these no special
treatment and still regard them as words in natural language text. Secondly, the
target texts in the training dataset may contain redundant information or facts that
do not exist in the input tables. These may give wrong supervision signals to methods
based on content selection and planning and on auxiliary supervision. To solve these
problems, we propose two self-supervised tasks, Number Ordering and Significance
Ordering, to help learn better table representations. The former works on the column
dimension to incorporate the size property of numbers into the table representation.
The latter acts on the row dimension and helps to learn a significance-aware table
representation. We test our methods on the widely used ROTOWIRE dataset, which
consists of NBA game statistics and related news. The experimental results
demonstrate that the model trained together with these two self-supervised
tasks can generate text that contains more salient and well-organized facts,
even without modeling content selection and planning. We also achieve
state-of-the-art performance on automatic metrics.
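The abstract does not spell out how the two ordering tasks are trained, so the following is only an illustrative sketch: it assumes contextual cell representations of shape (rows, cols, hidden_dim), hypothetical linear scoring heads, and a pairwise margin ranking loss, none of which are confirmed as the paper's actual formulation.

```python
# Hedged sketch of Number Ordering (column-wise) and Significance Ordering (row-wise)
# as auxiliary self-supervised losses. Shapes, heads, and the margin loss are assumptions.
import torch
import torch.nn as nn

class OrderingHeads(nn.Module):
    """Auxiliary heads that score table cells/rows so orderings can supervise the encoder."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.number_scorer = nn.Linear(hidden_dim, 1)        # column-wise number ordering
        self.significance_scorer = nn.Linear(hidden_dim, 1)  # row-wise significance ordering

    def pairwise_ranking_loss(self, scores: torch.Tensor, targets: torch.Tensor,
                              margin: float = 1.0) -> torch.Tensor:
        # scores, targets: (n,) values along one column (numbers) or over rows (significance).
        # Encourage score_j > score_i whenever target_j > target_i.
        diff_s = scores.unsqueeze(0) - scores.unsqueeze(1)    # (n, n) score differences
        diff_t = targets.unsqueeze(0) - targets.unsqueeze(1)  # (n, n) target differences
        sign = torch.sign(diff_t)
        loss = torch.relu(margin - sign * diff_s)
        mask = sign != 0                                      # ignore tied targets
        return loss[mask].mean() if mask.any() else scores.new_zeros(())

    def forward(self, cell_repr: torch.Tensor,
                numeric_values: torch.Tensor,
                significance: torch.Tensor) -> torch.Tensor:
        # cell_repr:      (rows, cols, hidden_dim) contextual cell representations
        # numeric_values: (rows, cols) raw numbers recorded in the table
        # significance:   (rows,) hypothetical per-row salience signal
        num_scores = self.number_scorer(cell_repr).squeeze(-1)                   # (rows, cols)
        sig_scores = self.significance_scorer(cell_repr.mean(dim=1)).squeeze(-1)  # (rows,)

        # Number Ordering: rank cells within each column by their numeric value.
        number_loss = torch.stack([
            self.pairwise_ranking_loss(num_scores[:, c], numeric_values[:, c])
            for c in range(num_scores.size(1))
        ]).mean()

        # Significance Ordering: rank rows by how salient they are.
        significance_loss = self.pairwise_ranking_loss(sig_scores, significance)
        return number_loss + significance_loss
```

In such a setup, the combined ordering loss would simply be added to the generation loss during training; this is one plausible reading of "trained together with these two self-supervised tasks", not the paper's confirmed implementation.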
Related papers
- TDeLTA: A Light-weight and Robust Table Detection Method based on
Learning Text Arrangement [34.73880086005418]
We propose a novel, light-weight and robust Table Detection method based on Learning Text Arrangement, namely TDeLTA.
To locate the tables precisely, we design a text-classification task, classifying the text blocks into 4 categories according to their semantic roles in the tables.
Compared to several state-of-the-art methods, TDeLTA achieves competitive results with only 3.1M model parameters on the large-scale public datasets.
arXiv Detail & Related papers (2023-12-18T09:18:43Z) - PixT3: Pixel-based Table-To-Text Generation [66.96636025277536]
We present PixT3, a multimodal table-to-text model that overcomes the challenges of linearization and input size limitations.
Experiments on the ToTTo and Logic2Text benchmarks show that PixT3 is competitive and superior to generators that operate solely on text.
arXiv Detail & Related papers (2023-11-16T11:32:47Z) - HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation [7.69801337810352]
We conduct parameter-efficient fine-tuning on the LLaMA2 model.
Our approach involves injecting reasoning information into the input by emphasizing table-specific row data.
On both the FetaQA and QTSumm datasets, our approach achieved state-of-the-art results.
arXiv Detail & Related papers (2023-11-15T12:02:52Z) - QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z) - Bridge the Gap between Language models and Tabular Understanding [99.88470271644894]
The table pretrain-then-finetune paradigm has been proposed and employed at a rapid pace after the success of pre-training in the natural language domain.
Despite the promising findings, there is an input gap between pre-training and fine-tuning phases.
We propose UTP, an approach that dynamically supports three types of multi-modal inputs: table-text, table, and text.
arXiv Detail & Related papers (2023-02-16T15:16:55Z) - Towards Table-to-Text Generation with Pretrained Language Model: A Table
Structure Understanding and Text Deliberating Approach [60.03002572791552]
We propose a table structure understanding and text deliberating approach, namely TASD.
Specifically, we devise a three-layered multi-head attention network to realize the table-structure-aware text generation model.
Our approach can generate faithful and fluent descriptive texts for different types of tables.
arXiv Detail & Related papers (2023-01-05T14:03:26Z) - TabText: A Flexible and Contextual Approach to Tabular Data
Representation [4.116980088382032]
TabText is a processing framework that extracts contextual information from tabular data structures.
We show that TabText improves the average and worst-case AUC performance of standard machine learning models by as much as 6%.
arXiv Detail & Related papers (2022-06-21T13:28:57Z) - Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis on a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z) - TABBIE: Pretrained Representations of Tabular Data [22.444607481407633]
We devise a simple pretraining objective that learns exclusively from tabular data.
Unlike competing approaches, our model (TABBIE) provides embeddings of all table substructures.
A qualitative analysis of our model's learned cell, column, and row representations shows that it understands complex table semantics and numerical trends.
arXiv Detail & Related papers (2021-05-06T11:15:16Z) - GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)