Rows from Many Sources: Enriching row completions from Wikidata with a
pre-trained Language Model
- URL: http://arxiv.org/abs/2204.07014v1
- Date: Thu, 14 Apr 2022 15:11:52 GMT
- Authors: Carina Negreanu, Alperen Karaoglu, Jack Williams, Shuang Chen, Daniel
Fabian, Andrew Gordon, Chin-Yew Lin
- Abstract summary: We present state-of-the-art results for subject suggestion and gap filling measured on a standard benchmark (WikiTables).
We interpret the table using the knowledge base to suggest new rows and generate metadata like headers through property linking.
We synthesize additional rows using free text generation via GPT-3, and crucially, we exploit the metadata we interpret to produce better prompts for text generation.
- Score: 9.084045516880444
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Row completion is the task of augmenting a given table of text and numbers
with additional, relevant rows. The task divides into two steps: subject
suggestion, the task of populating the main column; and gap filling, the task
of populating the remaining columns. We present state-of-the-art results for
subject suggestion and gap filling measured on a standard benchmark
(WikiTables). Our idea is to solve this task by harmoniously combining
knowledge base table interpretation and free text generation. We interpret the
table using the knowledge base to suggest new rows and generate metadata like
headers through property linking. To improve candidate diversity, we synthesize
additional rows using free text generation via GPT-3, and crucially, we exploit
the metadata we interpret to produce better prompts for text generation.
Finally, we verify that the additional synthesized content can be linked to the
knowledge base or a trusted web source such as Wikipedia.
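To make the pipeline concrete, here is a minimal sketch of the two steps the abstract emphasizes: building a generation prompt from the interpreted table metadata (headers plus existing rows), and verifying that synthesized subjects can be linked back to the knowledge base. This is an illustration under stated assumptions, not the authors' implementation: the pipe-separated table serialization, the `complete_fn` stand-in for the GPT-3 call, and every helper name are hypothetical; only Wikidata's public `wbsearchentities` endpoint is a real API.

```python
# Hypothetical sketch of metadata-informed row synthesis plus KB verification.
from typing import Callable, List, Optional

import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_prompt(headers: List[str], rows: List[List[str]], n_new: int = 5) -> str:
    """Serialize the table with its interpreted headers so the model
    sees column semantics, not just raw cell strings."""
    lines = [" | ".join(headers)]
    lines += [" | ".join(row) for row in rows]
    lines.append(f"Continue the table with {n_new} new, relevant rows:")
    return "\n".join(lines)


def parse_rows(completion: str, n_cols: int) -> List[List[str]]:
    """Keep only generated lines that parse back into complete rows."""
    parsed = []
    for line in completion.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) == n_cols and all(cells):
            parsed.append(cells)
    return parsed


def link_to_wikidata(label: str) -> Optional[str]:
    """Try to ground a generated subject via Wikidata's entity search;
    return a QID such as 'Q42', or None if nothing matches."""
    resp = requests.get(
        WIKIDATA_API,
        params={"action": "wbsearchentities", "search": label,
                "language": "en", "type": "item",
                "format": "json", "limit": 1},
        timeout=10,
    )
    hits = resp.json().get("search", [])
    return hits[0]["id"] if hits else None


def suggest_rows(headers: List[str], rows: List[List[str]],
                 complete_fn: Callable[[str], str]) -> List[tuple]:
    """Prompt the model, parse candidate rows, and keep only candidates
    whose subject (main-column value) links to a Wikidata entity."""
    completion = complete_fn(build_prompt(headers, rows))
    candidates = parse_rows(completion, len(headers))
    return [(row, qid) for row in candidates
            if (qid := link_to_wikidata(row[0])) is not None]
```

Separating generation from acceptance mirrors the design described above: the language model only widens the candidate pool, while the knowledge-base link decides what is kept, filtering hallucinated subjects before they reach the table.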
Related papers
- Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long-form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments on two datasets from different domains demonstrate that LLMs fine-tuned with the auxiliary task generate higher-quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
- The Power of Summary-Source Alignments [62.76959473193149]
Multi-document summarization (MDS) is a challenging task, often decomposed into subtasks of salience and redundancy detection.
Alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data.
This paper proposes extending the summary-source alignment framework by applying it at the more fine-grained proposition span level.
arXiv Detail & Related papers (2024-06-02T19:35:19Z)
- Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction [36.915250638481986]
We introduce LiveSum, a new benchmark dataset for generating summary tables of competitions based on real-time commentary texts.
We evaluate the performance of state-of-the-art Large Language Models on this task in both fine-tuning and zero-shot settings.
We additionally propose a novel pipeline called $T^3$ (Text-Tuple-Table) to improve their performance.
arXiv Detail & Related papers (2024-04-22T14:31:28Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- Open Domain Question Answering over Virtual Documents: A Unified Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means of encoding structured knowledge for knowledge-intensive applications, i.e., open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z)
- Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations [12.394777121890925]
This paper revisits and substantially extends previous dataset creation efforts.
We show that our extended version uses more representative texts for multi-document tasks and provides a larger and more diverse training set.
arXiv Detail & Related papers (2021-10-09T09:15:05Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models, and our results verify the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
- ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)