Row Conditional-TGAN for generating synthetic relational databases
- URL: http://arxiv.org/abs/2211.07588v1
- Date: Mon, 14 Nov 2022 18:14:18 GMT
- Title: Row Conditional-TGAN for generating synthetic relational databases
- Authors: Mohamed Gueye, Yazid Attabi, Maxime Dumas
- Abstract summary: We propose the Row Conditional-Tabular Generative Adversarial Network (RC-TGAN) to support modeling and synthesizing relational databases.
The RC-TGAN models relationship information between tables by incorporating conditional data of parent rows into the design of the child table's GAN.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Besides reproducing tabular data properties of standalone tables, synthetic
relational databases also require modeling the relationships between related
tables. In this paper, we propose the Row Conditional-Tabular Generative
Adversarial Network (RC-TGAN), a novel generative adversarial network (GAN)
model that extends the tabular GAN to support modeling and synthesizing
relational databases. The RC-TGAN models relationship information between
tables by incorporating conditional data of parent rows into the design of the
child table's GAN. We further extend the RC-TGAN to model the influence that
grandparent table rows may have on their grandchild rows, in order to prevent
the loss of this connection when the rows of the parent table fail to transfer
this relationship information. The experimental results, using eight real
relational databases, show significant improvements in the quality of the
synthesized relational databases when compared to the benchmark system,
demonstrating the effectiveness of the RC-TGAN in preserving relationships
between tables of the original database.
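As a rough, hypothetical illustration of the conditioning idea described in the abstract (not the authors' implementation: the layer sizes, feature dimensions, and the pre-encoded parent/grandparent inputs are all assumptions), a child-table generator can concatenate its noise vector with encoded parent-row and grandparent-row features before producing a synthetic child row:

```python
# Minimal sketch of parent/grandparent conditioning for a child-table
# generator. Illustrative only; RC-TGAN's actual architecture, encodings,
# and training procedure are described in the paper.
import torch
import torch.nn as nn

class ChildTableGenerator(nn.Module):
    def __init__(self, noise_dim, parent_dim, grandparent_dim, child_dim):
        super().__init__()
        in_dim = noise_dim + parent_dim + grandparent_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, child_dim),
        )

    def forward(self, noise, parent_row, grandparent_row):
        # Conditioning: the generator sees the (encoded) parent row, and
        # optionally the grandparent row, alongside the noise vector, so the
        # synthesized child row can reflect those ancestor rows.
        cond = torch.cat([noise, parent_row, grandparent_row], dim=-1)
        return self.net(cond)

# Illustrative usage with made-up dimensions and random "encoded" rows.
gen = ChildTableGenerator(noise_dim=64, parent_dim=16, grandparent_dim=8, child_dim=12)
z = torch.randn(32, 64)           # batch of noise vectors
parent = torch.randn(32, 16)      # encoded parent rows (assumed numeric encoding)
grandparent = torch.randn(32, 8)  # encoded grandparent rows
fake_child_rows = gen(z, parent, grandparent)
```

In a standard conditional-GAN setup the discriminator receives the same conditioning vector alongside the real or generated row, so the adversarial game is played per ancestor context.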
Related papers
- Enhancing Table Representations with LLM-powered Synthetic Data Generation [0.565395466029518]
We formulate a clear definition of table similarity in the context of data transformation activities within data-driven enterprises.
We propose a novel synthetic data generation pipeline that harnesses the code generation and data manipulation capabilities of Large Language Models.
We demonstrate that the synthetic data generated by our pipeline aligns with our proposed definition of table similarity and significantly enhances table representations.
arXiv Detail & Related papers (2024-11-04T19:54:07Z)
- TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding.
TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.
Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
- CTSyn: A Foundational Model for Cross Tabular Data Generation [9.568990880984813]
Cross-Table Synthesizer (CTSyn) is a diffusion-based foundational model tailored for tabular data generation.
CTSyn significantly outperforms existing table synthesizers in utility and diversity.
It also uniquely enhances the performance of downstream machine learning beyond what is achievable with real data.
arXiv Detail & Related papers (2024-06-07T04:04:21Z)
- GFS: Graph-based Feature Synthesis for Prediction over Relational Databases [39.975491511390985]
We propose a novel framework called Graph-based Feature Synthesis (GFS).
GFS formulates the relational database as a heterogeneous graph.
In an experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
arXiv Detail & Related papers (2023-12-04T16:54:40Z)
- REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers [0.0]
We introduce REaLTabFormer (Realistic Relational and Tabular Transformer), a synthetic data generation model.
It first creates a parent table using an autoregressive GPT-2 model, then generates the relational dataset conditioned on the parent table using a sequence-to-sequence model.
Experiments using real-world datasets show that REaLTabFormer captures the relational structure better than a baseline model.
arXiv Detail & Related papers (2023-02-04T00:32:50Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
- TCN: Table Convolutional Network for Web Table Interpretation [52.32515851633981]
We propose a novel table representation learning approach considering both the intra- and inter-table contextual information.
Our method can outperform competitive baselines by +4.8% of F1 for column type prediction and by +4.1% of F1 for column pairwise relation prediction.
arXiv Detail & Related papers (2021-02-17T02:18:10Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
- HittER: Hierarchical Transformers for Knowledge Graph Embeddings [85.93509934018499]
We propose HittER to learn representations of entities and relations in a complex knowledge graph.
Experimental results show that HittER achieves new state-of-the-art results on multiple link prediction datasets.
We additionally propose a simple approach to integrate HittER into BERT and demonstrate its effectiveness on two Freebase factoid question answering datasets.
arXiv Detail & Related papers (2020-08-28T18:58:15Z)
- Relation of the Relations: A New Paradigm of the Relation Extraction Problem [52.21210549224131]
We propose a new paradigm of Relation Extraction (RE) that considers as a whole the predictions of all relations in the same context.
We develop a data-driven approach that does not require hand-crafted rules but learns by itself the relation of relations (RoR) using Graph Neural Networks and a relation matrix transformer.
Experiments show that our model outperforms the state-of-the-art approaches by +1.12% on the ACE05 dataset and +2.55% on SemEval 2018 Task 7.2.
arXiv Detail & Related papers (2020-06-05T22:25:27Z)
- On Embeddings in Relational Databases [11.52782249184251]
We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding.
Recent embedding methods take a naive approach: they fully denormalize the database by taking the full join of all tables and representing it as a knowledge graph (a toy sketch of this full-join baseline follows the list).
In this paper we demonstrate a better methodology for learning representations by exploiting the underlying semantics of columns in a table while using the relation joins and the latent inter-row relationships.
arXiv Detail & Related papers (2020-05-13T17:21:27Z)
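As a toy, hypothetical illustration of the full-join denormalization described in the last entry above (the tables, columns, and values are invented; this is not code from any of the listed papers), two related tables can be joined on their foreign key and read off as (entity, relation, value) triples of the kind a knowledge-graph embedding model would consume:

```python
# Toy sketch of the naive denormalization baseline: join related tables on
# their foreign key and emit (entity, relation, value) triples.
# All table names, columns, and values here are invented for illustration.
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2], "country": ["FR", "CA"]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "amount": [25.0, 40.0, 15.0],
})

# Full join (denormalization) of the child table with its parent.
full = orders.merge(customers, on="customer_id", how="left")

# Each joined row becomes an entity; each column becomes a relation to its value.
triples = [
    (f"order:{row.order_id}", col, getattr(row, col))
    for row in full.itertuples(index=False)
    for col in ("customer_id", "amount", "country")
]
print(triples[:3])
```

An embedding model trained on such triples ignores column semantics and inter-row structure, which is the shortcoming the entry's proposed methodology targets.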
This list is automatically generated from the titles and abstracts of the papers on this site.