TabStruct: Measuring Structural Fidelity of Tabular Data
- URL: http://arxiv.org/abs/2509.11950v1
- Date: Mon, 15 Sep 2025 14:08:20 GMT
- Title: TabStruct: Measuring Structural Fidelity of Tabular Data
- Authors: Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik
- Abstract summary: We introduce a new evaluation metric, $\textbf{global utility}$, which enables the assessment of structural fidelity even in the absence of ground-truth causal structures. We also present the TabStruct benchmark suite, including all datasets, evaluation pipelines, and raw results.
- Score: 28.606994119562163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluating tabular generators remains a challenging problem, as the unique causal structural prior of heterogeneous tabular data does not lend itself to intuitive human inspection. Recent work has introduced structural fidelity as a tabular-specific evaluation dimension to assess whether synthetic data complies with the causal structures of real data. However, existing benchmarks often neglect the interplay between structural fidelity and conventional evaluation dimensions, thus failing to provide a holistic understanding of model performance. Moreover, they are typically limited to toy datasets, as quantifying existing structural fidelity metrics requires access to ground-truth causal structures, which are rarely available for real-world datasets. In this paper, we propose a novel evaluation framework that jointly considers structural fidelity and conventional evaluation dimensions. We introduce a new evaluation metric, $\textbf{global utility}$, which enables the assessment of structural fidelity even in the absence of ground-truth causal structures. In addition, we present $\textbf{TabStruct}$, a comprehensive evaluation benchmark offering large-scale quantitative analysis on 13 tabular generators from nine distinct categories, across 29 datasets. Our results demonstrate that global utility provides a task-independent, domain-agnostic lens for tabular generator performance. We release the TabStruct benchmark suite, including all datasets, evaluation pipelines, and raw results. Code is available at https://github.com/SilenceX12138/TabStruct.
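One common way to realize a utility-style metric over all features, which may approximate the spirit of the global utility metric described above (the exact definition is in the paper, so this is an illustrative sketch, not the authors' implementation), is to treat every column in turn as a prediction target: fit a model on synthetic data and score it on real data, then average. The function name `global_utility_sketch` and the choice of random forests and $R^2$ are assumptions for illustration.

```python
# Illustrative sketch (assumption, not the paper's exact metric): a
# train-on-synthetic, test-on-real utility score averaged over ALL columns,
# so that dependencies among every feature contribute to the score.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

def global_utility_sketch(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Average R^2 when models fit on synthetic data predict each real column."""
    n_cols = real.shape[1]
    scores = []
    for target in range(n_cols):
        features = [c for c in range(n_cols) if c != target]
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(synthetic[:, features], synthetic[:, target])
        preds = model.predict(real[:, features])
        scores.append(r2_score(real[:, target], preds))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
# Toy data with a simple dependency: x1 = 2 * x0 + noise.
x0 = rng.normal(size=(500, 1))
real = np.hstack([x0, 2 * x0 + 0.1 * rng.normal(size=(500, 1))])
x0s = rng.normal(size=(500, 1))
synth = np.hstack([x0s, 2 * x0s + 0.1 * rng.normal(size=(500, 1))])
print(global_utility_sketch(real, synth))
```

A generator that preserves the dependency between columns yields a score near 1, while one that breaks it (e.g., shuffling columns independently) would score near 0.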
Related papers
- It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks [87.7937890373758]
Time series foundation models (TSFMs) are revolutionizing the forecasting landscape from specific dataset modeling to generalizable task evaluation. We introduce TIME, a next-generation task-centric benchmark comprising 50 fresh datasets and 98 forecasting tasks. We propose a novel pattern-level evaluation perspective that moves beyond traditional dataset-level evaluations based on static meta labels.
arXiv Detail & Related papers (2026-02-12T16:31:01Z) - Structural Compositional Function Networks: Interpretable Functional Compositions for Tabular Discovery [4.8369208007394215]
We propose Structural Compositional Function Networks (StructuralCFN), a novel architecture that imposes a Relation-Aware Inductive Bias via a differentiable structural prior. Our framework enables Structured Knowledge Integration, allowing domain-specific relational priors to be injected directly into the architecture to guide discovery. We evaluate StructuralCFN across a rigorous 10-fold cross-validation suite on 18 benchmarks, demonstrating statistically significant improvements.
arXiv Detail & Related papers (2026-01-27T20:20:07Z) - Table-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation [11.450834626205676]
Table-BiEval is a novel approach based on a human-free, self-supervised evaluation framework. It calculates Content Semantic Accuracy and Normalized Tree Edit Distance to decouple structure from content. Results reveal substantial variability, highlighting that mid-sized models can surprisingly outperform larger counterparts in structural efficiency.
arXiv Detail & Related papers (2026-01-09T07:38:27Z) - TabReX : Tabular Referenceless eXplainable Evaluation [15.411207072791806]
TabReX is a reference-less, property-driven framework for evaluating tables generated by large language models. It computes interpretable, rubric-aware scores that quantify structural and factual fidelity. To assess robustness, we introduce TabReX-Bench, a large-scale benchmark spanning six domains and twelve planner-driven perturbation types.
arXiv Detail & Related papers (2025-12-17T19:20:20Z) - Structural Equation-VAE: Disentangled Latent Representations for Tabular Data [4.101599614979332]
We introduce SE-VAE (Structural Equation-Variational Autoencoder), a novel architecture that embeds measurement structure directly into the design of a variational autoencoder. Inspired by structural equation modeling, SE-VAE aligns latent subspaces with known indicator groupings and introduces a global nuisance latent to isolate construct-specific confounding variation. SE-VAE consistently outperforms alternatives in factor recovery, interpretability, and robustness to nuisance variation.
arXiv Detail & Related papers (2025-08-08T14:21:20Z) - AlphaFold Database Debiasing for Robust Inverse Folding [58.792020809180336]
We introduce a Debiasing Structure AutoEncoder (DeSAE) that learns to reconstruct native-like conformations from intentionally corrupted backbone geometries. At inference, applying DeSAE to AFDB structures produces debiased structures that significantly improve inverse folding performance.
arXiv Detail & Related papers (2025-06-10T02:25:31Z) - How Well Does Your Tabular Generator Learn the Structure of Tabular Data? [10.974400005358193]
In this paper, we introduce TabStruct, a novel evaluation benchmark that positions structural fidelity as a core evaluation dimension. We show that structural fidelity offers a task-independent, domain-agnostic evaluation dimension.
arXiv Detail & Related papers (2025-03-12T14:54:58Z) - StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs [78.84060166851805]
StructTest is a novel benchmark that evaluates large language models (LLMs) on their ability to follow compositional instructions and generate structured outputs. Assessments are conducted deterministically using a rule-based evaluator, which can be easily extended to new tasks and datasets. We demonstrate that StructTest remains challenging even for top-performing models like Deepseek-V3/R1 and GPT-4o.
arXiv Detail & Related papers (2024-12-23T22:08:40Z) - Structural Entropy Guided Probabilistic Coding [52.01765333755793]
We propose a novel structural entropy-guided probabilistic coding model, named SEPC. We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss. Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
arXiv Detail & Related papers (2024-12-12T00:37:53Z) - Structured Evaluation of Synthetic Tabular Data [6.418460620178983]
Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns.
We propose an evaluation framework with a single, mathematical objective that posits that the synthetic data should be drawn from the same distribution as the observed data.
We evaluate structurally informed synthesizers and synthesizers powered by deep learning.
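The objective above, that synthetic data should be drawn from the same distribution as the observed data, can be illustrated with a minimal per-column check. This is a deliberately narrow sketch of my own (it tests only marginal distributions with a two-sample Kolmogorov-Smirnov test, not the joint structure the paper's framework targets), and the name `marginal_fidelity` is a hypothetical helper:

```python
# Minimal sketch (assumption): per-column two-sample KS tests as one simple
# instance of "synthetic data should match the observed distribution".
# This checks marginals only, not the joint dependency structure.
import numpy as np
from scipy.stats import ks_2samp

def marginal_fidelity(real: np.ndarray, synthetic: np.ndarray,
                      alpha: float = 0.05) -> float:
    """Fraction of columns whose marginal is NOT rejected as different."""
    passed = 0
    for col in range(real.shape[1]):
        _, p_value = ks_2samp(real[:, col], synthetic[:, col])
        if p_value > alpha:  # fail to reject "same distribution"
            passed += 1
    return passed / real.shape[1]

rng = np.random.default_rng(1)
real = rng.normal(size=(1000, 3))
good = rng.normal(size=(1000, 3))           # same distribution as real
bad = rng.normal(loc=2.0, size=(1000, 3))   # shifted marginals
print(marginal_fidelity(real, good), marginal_fidelity(real, bad))
```

A shifted generator fails every column, while a faithful one passes most; a full structured evaluation would additionally compare joint and conditional distributions.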
arXiv Detail & Related papers (2024-03-15T15:58:37Z) - Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter [63.5550818034739]
This paper presents a framework to evaluate state-of-the-art contributions to self-supervised monocular depth estimation.
It includes pretraining, backbone, architectural design choices and loss functions.
We re-implement, validate and re-evaluate 16 state-of-the-art contributions and introduce a new dataset.
arXiv Detail & Related papers (2022-08-02T14:38:53Z) - Structured Prediction with Partial Labelling through the Infimum Loss [85.4940853372503]
The goal of weak supervision is to enable models to learn using only forms of labelling which are cheaper to collect.
This is a type of incomplete annotation where, for each datapoint, supervision is cast as a set of labels containing the real one.
This paper provides a unified framework based on structured prediction and on the concept of infimum loss to deal with partial labelling.
arXiv Detail & Related papers (2020-03-02T13:59:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.