Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation
- URL: http://arxiv.org/abs/2502.04055v1
- Date: Thu, 06 Feb 2025 13:13:26 GMT
- Title: Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation
- Authors: Yunbo Long, Liming Xu, Alexandra Brintrup,
- Abstract summary: This paper proposes three evaluation metrics designed to assess the preservation of logical relationships.
We validate these metrics by assessing the performance of both classical and state-of-the-art generation methods on a real-world industrial dataset.
- Score: 49.898152180805454
- License:
- Abstract: Current evaluations of synthetic tabular data mainly focus on how well joint distributions are modeled, often overlooking the assessment of their effectiveness in preserving realistic event sequences and coherent entity relationships across columns.This paper proposes three evaluation metrics designed to assess the preservation of logical relationships among columns in synthetic tabular data. We validate these metrics by assessing the performance of both classical and state-of-the-art generation methods on a real-world industrial dataset.Experimental results reveal that existing methods often fail to rigorously maintain logical consistency (e.g., hierarchical relationships in geography or organization) and dependencies (e.g., temporal sequences or mathematical relationships), which are crucial for preserving the fine-grained realism of real-world tabular data. Building on these insights, this study also discusses possible pathways to better capture logical relationships while modeling the distribution of synthetic tabular data.
Related papers
- Benchmarking the Fidelity and Utility of Synthetic Relational Data [1.024113475677323]
We review related work on relational data synthesis, common benchmarking datasets, and approaches to measuring the fidelity and utility of synthetic data.
We combine the best practices and a novel robust detection approach into a benchmarking tool and use it to compare six methods.
For utility, we typically observe moderate correlation between real and synthetic data for both model predictive performance and feature importance.
arXiv Detail & Related papers (2024-10-04T13:23:45Z) - Preserving logical and functional dependencies in synthetic tabular data [0.0]
We introduce the notion of logical dependencies among the attributes in this article.
We also provide a measure to quantify logical dependencies among attributes in tabular data.
We demonstrate that currently available synthetic data generation algorithms do not fully preserve functional dependencies when they generate synthetic datasets.
arXiv Detail & Related papers (2024-09-26T09:51:07Z) - Reliability in Semantic Segmentation: Can We Use Synthetic Data? [69.28268603137546]
We show for the first time how synthetic data can be specifically generated to assess comprehensively the real-world reliability of semantic segmentation models.
This synthetic data is employed to evaluate the robustness of pretrained segmenters.
We demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters.
arXiv Detail & Related papers (2023-12-14T18:56:07Z) - Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction [121.65152276851619]
We show that semantic correlations between relations are inherently edge-level and entity-independent.
We propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations.
To further exploit the potential of RCN, we propose Complete Common Neighbor induced subgraph.
arXiv Detail & Related papers (2023-09-20T08:11:58Z) - Generating Realistic Synthetic Relational Data through Graph Variational
Autoencoders [47.89542334125886]
We combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases.
The results indicate that real databases' structures are accurately preserved in the resulting synthetic datasets.
arXiv Detail & Related papers (2022-11-30T10:40:44Z) - On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores which we name Compositional Network generalization (CoRelNet)
We find that simple architectural choices can outperform existing models in out-of-distribution generalizations.
arXiv Detail & Related papers (2022-06-09T16:24:01Z) - Realistic Counterfactual Explanations by Learned Relations [0.0]
We propose a novel approach to realistic counterfactual explanations that preserve relationships between data attributes.
The model directly learns the relationships by a variational auto-encoder without domain knowledge and then learns to disturb the latent space accordingly.
arXiv Detail & Related papers (2022-02-15T12:33:51Z) - Link Prediction on N-ary Relational Data Based on Relatedness Evaluation [61.61555159755858]
We propose a method called NaLP to conduct link prediction on n-ary relational data.
We represent each n-ary relational fact as a set of its role and role-value pairs.
Experimental results validate the effectiveness and merits of the proposed methods.
arXiv Detail & Related papers (2021-04-21T09:06:54Z) - Mining Feature Relationships in Data [0.0]
Feature relationship mining (FRM) uses a genetic programming approach to automatically discover symbolic relationships between continuous or categorical features in data.
Our proposed approach is the first such symbolic approach with the goal of explicitly discovering relationships between features.
Empirical testing on a variety of real-world datasets shows the proposed method is able to find high-quality, simple feature relationships.
arXiv Detail & Related papers (2021-02-02T07:06:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.