DART: Open-Domain Structured Data Record to Text Generation
- URL: http://arxiv.org/abs/2007.02871v2
- Date: Mon, 12 Apr 2021 14:18:06 GMT
- Title: DART: Open-Domain Structured Data Record to Text Generation
- Authors: Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand
Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav
Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad
Zaidi, Mutethia Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu, Yi Chern Tan, Xi
Victoria Lin, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani
- Abstract summary: We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs).
We propose a procedure of extracting semantic triples from tables that encode their structures by exploiting the semantic dependencies among table headers and the table title.
Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks.
- Score: 91.23798751437835
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present DART, an open domain structured DAta Record to Text generation
dataset with over 82k instances (DARTs). Data-to-Text annotations can be a
costly process, especially when dealing with tables which are the major source
of structured data and contain nontrivial structures. To this end, we propose a
procedure of extracting semantic triples from tables that encodes their
structures by exploiting the semantic dependencies among table headers and the
table title. Our dataset construction framework effectively merged
heterogeneous sources from open domain semantic parsing and dialogue-act-based
meaning representation tasks by utilizing techniques such as: tree ontology
annotation, question-answer pair to declarative sentence conversion, and
predicate unification, all with minimum post-editing. We present systematic
evaluation on DART as well as new state-of-the-art results on WebNLG 2017 to
show that DART (1) poses new challenges to existing data-to-text datasets and
(2) facilitates out-of-domain generalization. Our data and code can be found at
https://github.com/Yale-LILY/dart.
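The table-to-triple idea in the abstract can be illustrated with a minimal sketch. It assumes the dependencies among headers are given as a parent map (each header points to its parent header, or to nothing when it depends directly on the table title). The function name `row_to_triples`, the `[TABLECONTEXT]` and `[TITLE]` placeholder strings, and the toy table are illustrative assumptions, not the authors' released code or data.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Assumed placeholder for the root of the header tree (stands in for the table itself).
ROOT = "[TABLECONTEXT]"


def row_to_triples(
    title: str,
    row: Dict[str, str],
    parent_of: Dict[str, Optional[str]],
) -> List[Triple]:
    """Turn one table row into (subject, predicate, object) triples.

    parent_of maps each header to its parent header, or None when the
    header depends directly on the table title (hypothetical encoding).
    """
    # The table title is attached to the root so the text generator can see it.
    triples: List[Triple] = [(ROOT, "[TITLE]", title)]
    for header, value in row.items():
        parent = parent_of.get(header)
        # Subject is the parent column's cell value, or the root for top-level headers.
        subject = ROOT if parent is None else row[parent]
        triples.append((subject, header, value))
    return triples


# Invented toy table row and header tree, for illustration only.
example_row = {"Team": "Team A", "Stadium": "Stadium X", "Capacity": "30000"}
example_tree = {"Team": None, "Stadium": "Team", "Capacity": "Stadium"}
example_triples = row_to_triples("Example stadiums", example_row, example_tree)
```

Each cell becomes the object of a triple whose predicate is its own header and whose subject is the cell of the parent header, so the header tree, not the column order, decides how the row is linearized.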
Related papers
- SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding [55.48936731641802]
We present the SRFUND, a hierarchically structured multi-task form understanding benchmark.
SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets.
The dataset covers eight languages: English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese.
arXiv Detail & Related papers (2024-06-13T02:35:55Z)
- TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations [15.873944819608434]
Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions.
This paper introduces a new self-supervised learning framework, Text-And-Graph Multi-View Alignment (TAGA), which integrates TAGs' structural and semantic dimensions.
Our framework demonstrates strong performance in zero-shot and few-shot scenarios across eight real-world datasets.
arXiv Detail & Related papers (2024-05-27T03:40:16Z)
- Unifying Structured Data as Graph for Data-to-Text Pre-Training [69.96195162337793]
Data-to-text (D2T) generation aims to transform structured data into natural language text.
Data-to-text pre-training has proved to be powerful in enhancing D2T generation.
We propose a structure-enhanced pre-training method for D2T generation by designing a structure-enhanced Transformer.
arXiv Detail & Related papers (2024-01-02T12:23:49Z)
- Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval [21.585255812861632]
Open-domain table question answering aims to provide answers to a question by retrieving and extracting information from a large collection of tables.
Existing studies of open-domain table QA either directly adopt text retrieval methods or consider the table structure only in the encoding layer for table retrieval.
We propose a syntax- and structure-aware retrieval method for the open-domain table QA task.
arXiv Detail & Related papers (2023-09-19T10:40:09Z)
- TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain [3.5018563401895455]
We build the first semi-structured document analysis dataset in the legal domain.
This dataset combines a wide variety of handwritten text with printed text.
We propose an end-to-end framework for offline processing of handwritten semi-structured documents.
arXiv Detail & Related papers (2023-06-03T15:56:30Z)
- Open Domain Question Answering over Virtual Documents: A Unified Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e. open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z)
- Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question.
BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks.
Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
arXiv Detail & Related papers (2020-12-23T12:33:52Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing [25.81069211061945]
In Text-to-SQL semantic parsing, selecting the correct entities for the generated SQL query is both crucial and challenging.
We propose two linking processes to address this challenge: schema linking, which links explicit NL mentions to the database, and structural linking, which links the entities in the output SQL with their structural relationships in the database schema.
Integrating the proposed method with two graph neural network-based semantic parsers together with BERT representations demonstrates substantial gains in parsing accuracy on the challenging Spider dataset.
arXiv Detail & Related papers (2020-09-30T17:32:27Z)