A Lightweight Algorithm to Uncover Deep Relationships in Data Tables
- URL: http://arxiv.org/abs/2009.03358v1
- Date: Mon, 7 Sep 2020 18:25:15 GMT
- Title: A Lightweight Algorithm to Uncover Deep Relationships in Data Tables
- Authors: Jin Cao and Yibo Zhao and Linjun Zhang and Jason Li
- Abstract summary: We develop automated methods to uncover deep relationships in a single data table without expert or domain knowledge.
Our method can decompose a data table into layers of smaller tables, revealing its deep structure.
With our solution, data scientists will be provided with automatically generated, data-driven insights when exploring new data sets.
- Score: 16.829302261959924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much of the data we collect today is in tabular form, with rows as records and
columns as attributes associated with each record. Understanding the structural
relationships in tabular data can greatly facilitate the data science process.
Traditionally, much of this relational information is stored in table schema
and maintained by its creators, usually domain experts. In this paper, we
develop automated methods to uncover deep relationships in a single data table
without expert or domain knowledge. Our method can decompose a data table into
layers of smaller tables, revealing its deep structure. The key to our approach
is a computationally lightweight forward addition algorithm that recursively
extracts the functional dependencies between table columns and scales to
tables with many columns. With our solution, data scientists
will be provided with automatically generated, data-driven insights when
exploring new data sets.
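The abstract does not spell out the forward addition algorithm, so the following is only a minimal sketch of the underlying idea, not the authors' method. It assumes a table given as a list of row dictionaries, checks whether a set of columns functionally determines a target column, and then greedily adds one column at a time until the dependency holds. The names `determines`, `conflicts`, and `forward_addition` are hypothetical and chosen for illustration.

```python
from typing import Dict, Hashable, List, Optional, Sequence

Row = Dict[str, Hashable]


def conflicts(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> int:
    """Count how far `lhs -> rhs` is from being a functional dependency.

    The result is (#distinct lhs+rhs combinations) - (#distinct lhs
    combinations); it is zero exactly when every lhs value maps to a
    single rhs value.
    """
    keys = {tuple(row[c] for c in lhs) for row in rows}
    pairs = {(tuple(row[c] for c in lhs), row[rhs]) for row in rows}
    return len(pairs) - len(keys)


def determines(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> bool:
    """True if no two rows agree on all `lhs` columns but differ on `rhs`."""
    return conflicts(rows, lhs, rhs) == 0


def forward_addition(
    rows: Sequence[Row], candidates: Sequence[str], rhs: str
) -> Optional[List[str]]:
    """Greedy sketch (an assumption, not the paper's algorithm).

    Starting from an empty determinant set, repeatedly add the candidate
    column that leaves the fewest conflicts, until the set determines
    `rhs`. Returns None if even all candidates together do not.
    The greedy result is not guaranteed to be a minimal determinant set.
    """
    chosen: List[str] = []
    remaining = [c for c in candidates if c != rhs]
    while conflicts(rows, chosen, rhs) > 0:
        if not remaining:
            return None
        best = min(remaining, key=lambda c: conflicts(rows, chosen + [c], rhs))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Illustrative table (made-up data): city determines country, not vice versa.
table = [
    {"order_id": 1, "city": "Paris", "country": "France", "zip": "75001"},
    {"order_id": 2, "city": "Paris", "country": "France", "zip": "75002"},
    {"order_id": 3, "city": "Lyon", "country": "France", "zip": "69001"},
    {"order_id": 4, "city": "Berlin", "country": "Germany", "zip": "10115"},
]

print(determines(table, ["city"], "country"))
print(forward_addition(table, ["city", "zip"], "country"))
print(forward_addition(table, ["country"], "city"))
```

Each greedy step rescans the table once per remaining candidate, so the cost of this sketch grows with both the row count and the number of columns tried. A dependency such as city -> country is the kind of relationship that would justify splitting the location columns into a smaller table of their own.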
Related papers
- Augment before You Try: Knowledge-Enhanced Table Question Answering via Table Expansion [57.53174887650989]
Table question answering is a popular task that assesses a model's ability to understand and interact with structured data.
Some existing methods convert both the table and external knowledge into text, which neglects the structured nature of the table.
We propose a simple yet effective method to integrate external information in a given table.
arXiv Detail & Related papers (2024-01-28T03:37:11Z)
- Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
- GFS: Graph-based Feature Synthesis for Prediction over Relational Databases [39.975491511390985]
We propose a novel framework called Graph-based Feature Synthesis (GFS).
GFS formulates a relational database as a heterogeneous graph database.
In an experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
arXiv Detail & Related papers (2023-12-04T16:54:40Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
- TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition [76.06530816349763]
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
arXiv Detail & Related papers (2021-06-20T01:57:05Z)
- BERT Meets Relational DB: Contextual Representations of Relational Databases [4.029818252558553]
We address the problem of learning low-dimensional representations of entities in relational databases consisting of multiple tables.
We look into ways of using attention-based models to learn embeddings for entities in the relational database.
arXiv Detail & Related papers (2021-04-30T11:23:26Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- DART: Open-Domain Structured Data Record to Text Generation [91.23798751437835]
We present DART, an open-domain structured DAta Record to Text generation dataset with over 82k instances (DARTs).
We propose a procedure of extracting semantic triples from tables that encode their structures by exploiting the semantic dependencies among table headers and the table title.
Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks.
arXiv Detail & Related papers (2020-07-06T16:35:30Z)
- On Embeddings in Relational Databases [11.52782249184251]
We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding.
Recent methods for learning embeddings take a naive approach: they fully denormalize the database by relationalizing the full join of all tables and representing it as a knowledge graph.
In this paper we demonstrate a better methodology for learning representations by exploiting the underlying semantics of columns in a table while using the relation joins and the latent inter-row relationships.
arXiv Detail & Related papers (2020-05-13T17:21:27Z)
- Identifying Table Structure in Documents using Conditional Generative Adversarial Networks [0.0]
In many industries and in academic research, information is primarily transmitted in the form of unstructured documents.
We propose a top-down approach, first using a conditional generative adversarial network to map a table image into a standardised 'skeleton' table form.
We then derive the latent table structure using xy-cut projection and Genetic Algorithm optimisation.
arXiv Detail & Related papers (2020-01-13T20:42:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.