Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
- URL: http://arxiv.org/abs/2502.18237v1
- Date: Tue, 25 Feb 2025 14:20:05 GMT
- Title: Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
- Authors: Mihaela Cătălina Stoian, Eleonora Giunchiglia
- Abstract summary: Disjunctive Refinement Layer (DRL) is a layer designed to enforce the alignment of generated data with background knowledge specified in user-defined constraints. DRL is the first method able to automatically make deep learning models inherently compliant with constraints as expressive as quantifier-free linear formulas.
- Score: 4.956977275061968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic tabular data generation has traditionally been a challenging problem due to the high complexity of the underlying distributions that characterise this type of data. Despite recent advances in deep generative models (DGMs), existing methods often fail to produce realistic datapoints that are well-aligned with available background knowledge. In this paper, we address this limitation by introducing Disjunctive Refinement Layer (DRL), a novel layer designed to enforce the alignment of generated data with the background knowledge specified in user-defined constraints. DRL is the first method able to automatically make deep learning models inherently compliant with constraints as expressive as quantifier-free linear formulas, which can define non-convex and even disconnected spaces. Our experimental analysis shows that DRL not only guarantees constraint satisfaction but also improves efficacy in downstream tasks. Notably, when applied to DGMs that frequently violate constraints, DRL eliminates violations entirely. Further, it improves performance metrics by up to 21.4% in F1-score and 20.9% in Area Under the ROC Curve, thus demonstrating its practical impact on data generation.
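A minimal sketch of the core idea (not the authors' DRL implementation, whose constraint language is far more general): repair generated samples so that each one satisfies a disjunction of axis-aligned boxes, a simple special case of quantifier-free linear formulas whose feasible set can be non-convex and even disconnected. The names `refine` and `boxes` are illustrative.

```python
# Hypothetical illustration, not the paper's DRL layer: project each sample
# onto the nearest region of a disjunctive (possibly disconnected) feasible set.
import torch

def refine(x: torch.Tensor, boxes) -> torch.Tensor:
    """Project each row of x onto the closest box (lo, hi) among the disjuncts."""
    candidates, distances = [], []
    for lo, hi in boxes:
        proj = torch.clamp(x, min=lo, max=hi)        # exact projection onto a box
        candidates.append(proj)
        distances.append(((x - proj) ** 2).sum(dim=1))
    dist = torch.stack(distances, dim=1)             # (batch, num_disjuncts)
    best = dist.argmin(dim=1)                        # nearest satisfying disjunct per row
    cand = torch.stack(candidates, dim=1)            # (batch, num_disjuncts, dim)
    return cand[torch.arange(x.size(0)), best]

# Usage: two disconnected feasible regions in R^2.
boxes = [(torch.tensor([0.0, 0.0]), torch.tensor([1.0, 1.0])),
         (torch.tensor([3.0, 3.0]), torch.tensor([4.0, 4.0]))]
x = torch.tensor([[1.5, 0.5], [2.9, 3.2]])
print(refine(x, boxes))  # tensor([[1.0000, 0.5000], [3.0000, 3.2000]])
```

Choosing the nearest disjunct keeps the repaired point close to the raw sample, which is what lets a refinement layer guarantee satisfaction without discarding what the generator has learned.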
Related papers
- Deep Generative Models with Hard Linear Equality Constraints [24.93865980946986]
We propose a probabilistically sound approach for enforcing hard constraints in DGMs to generate constraint-compliant data. We carry out experiments with various DGM architectures over five image datasets and three scientific applications. Our method not only guarantees the satisfaction of constraints in generation but also achieves superior generative performance compared to the other methods across every benchmark.
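As a hedged illustration of what enforcing hard linear equalities can look like (the paper's own mechanism may differ), the closed-form orthogonal projection onto {x : Ax = b} can be applied to a generator's raw output:

```python
# Standard orthogonal projection onto an affine subspace {x : Ax = b};
# assumes A has full row rank. Illustrative, not the paper's exact method.
import numpy as np

def project_onto_equality(x: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the closest point to x (in L2) satisfying A @ x = b."""
    # x* = x - A^T (A A^T)^{-1} (A x - b)
    residual = A @ x - b
    return x - A.T @ np.linalg.solve(A @ A.T, residual)

A = np.array([[1.0, 1.0, 1.0]])    # constraint: the coordinates must sum to 1
b = np.array([1.0])
x = np.array([0.2, 0.5, 0.6])      # raw generator output, violates the constraint
print(project_onto_equality(x, A, b))  # [0.1 0.4 0.5], which sums to 1
```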
arXiv Detail & Related papers (2025-02-08T02:53:32Z)
- Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling [35.2859997591196]
Offline reinforcement learning holds promise for scaling data-driven decision-making.
However, real-world data collected from sensors or humans often contains noise and errors.
Our study reveals that prior research falls short under data corruption when the dataset is limited.
arXiv Detail & Related papers (2024-07-05T06:34:32Z)
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
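For context, a minimal LoRA-style module (hypothetical code, not the paper's): only the low-rank factors are trained, so the bulk of the pre-trained weights, along with whatever behaviors they encode, is left untouched.

```python
# Sketch of a low-rank adapter around a frozen linear layer (assumed setup).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # freeze pre-trained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no shift at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```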
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- Exploiting T-norms for Deep Learning in Autonomous Driving [60.205021207641174]
We show how it is possible to define memory-efficient t-norm-based losses, allowing for exploiting t-norms for the task of event detection in autonomous driving.
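A minimal sketch of the idea (the product t-norm is chosen here for illustration; other t-norms work analogously): a propositional rule over predicted probabilities becomes a differentiable loss.

```python
# Hypothetical example: turn the rule "pedestrian detected -> braking event"
# into a loss via the product t-norm and its residuum.
import torch

def implies(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Residuum of the product t-norm: a -> b is 1 where a <= b, else b / a.
    return torch.where(a <= b, torch.ones_like(a), b / a.clamp_min(1e-8))

p_pedestrian = torch.tensor([0.9, 0.2])   # predicted event probabilities
p_braking = torch.tensor([0.3, 0.8])
truth = implies(p_pedestrian, p_braking)  # degree to which the rule holds
loss = -torch.log(truth.clamp_min(1e-8)).mean()  # penalizes rule violations
print(loss)  # positive only because the first sample violates the rule
```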
arXiv Detail & Related papers (2024-02-17T18:51:21Z)
- How Realistic Is Your Synthetic Data? Constraining Deep Generative Models for Tabular Data [57.97035325253996]
We show how Deep Generative Models (DGMs) can be transformed into Constrained Deep Generative Models (C-DGMs), whose generated samples are guaranteed to comply with the given constraints.
C-DGMs are able to exploit the background knowledge expressed by the constraints to outperform their standard counterparts.
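A small hypothetical helper (not from the paper) that makes constraint violation concrete for linear inequalities of the form A x <= b:

```python
# Measure the fraction of synthetic rows that violate at least one constraint.
import numpy as np

def violation_rate(X: np.ndarray, A: np.ndarray, b: np.ndarray, tol: float = 1e-9) -> float:
    violated = (X @ A.T - b) > tol            # (rows, constraints) boolean matrix
    return float(violated.any(axis=1).mean())

A = np.array([[1.0, 1.0],                     # x0 + x1 <= 1
              [-1.0, 0.0]])                   # -x0 <= 0, i.e. x0 >= 0
b = np.array([1.0, 0.0])
X = np.array([[0.3, 0.4], [0.8, 0.5], [-0.1, 0.2]])
print(violation_rate(X, A, b))  # 2 of 3 rows violate -> 0.666...
```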
arXiv Detail & Related papers (2024-02-07T13:22:05Z)
- Disparate Impact on Group Accuracy of Linearization for Private Inference [48.27026603581436]
We show that reducing the number of ReLU activations disproportionately decreases the accuracy for minority groups compared to majority groups.
We also show how a simple procedure altering the fine-tuning step for linearized models can serve as an effective mitigation strategy.
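A hedged sketch of what linearization means here (illustrative module, not the paper's procedure): selected ReLU activations are replaced by the identity, reducing the nonlinearity budget that dominates the cost of private inference.

```python
# Replace a chosen subset of ReLUs with the identity (assumed setup).
import torch
import torch.nn as nn

class MaskedReLU(nn.Module):
    def __init__(self, keep_relu: bool):
        super().__init__()
        self.keep_relu = keep_relu

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) if self.keep_relu else x  # identity when linearized

model = nn.Sequential(
    nn.Linear(16, 32), MaskedReLU(keep_relu=True),   # nonlinearity kept
    nn.Linear(32, 32), MaskedReLU(keep_relu=False),  # linearized to save ReLU cost
    nn.Linear(32, 2),
)
print(model(torch.randn(4, 16)).shape)  # torch.Size([4, 2])
```

As the summary above notes, how many ReLUs are removed, and how the linearized model is then fine-tuned, disproportionately affects minority-group accuracy.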
arXiv Detail & Related papers (2024-02-06T01:56:29Z)
- Neural Network Approximation for Pessimistic Offline Reinforcement Learning [17.756108291816908]
We present a non-asymptotic estimation error bound for pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
arXiv Detail & Related papers (2023-12-19T05:17:27Z)
- Dual Generator Offline Reinforcement Learning [90.05278061564198]
In offline RL, constraining the learned policy to remain close to the data is essential.
In practice, GAN-based offline RL methods have not performed as well as alternative approaches.
We show that having two generators not only enables an effective GAN-based offline RL method but also approximates a support constraint.
arXiv Detail & Related papers (2022-11-02T20:25:18Z)
- Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation [38.95482624075353]
We introduce a gradient penalty over the learned value function to tackle exploding Q-values.
We then relax the closeness constraint towards non-optimal actions via critic-weighted constraint relaxation.
Experimental results show that the proposed techniques effectively tame non-optimal trajectories for policy-constrained offline RL methods.
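A minimal sketch of a gradient penalty on the critic (generic form; the paper's exact loss may differ): penalizing the action-gradient of Q keeps the learned value function smooth and discourages exploding Q-values.

```python
# Hypothetical critic regularizer, added to the usual TD loss.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 1))

def gradient_penalty(states: torch.Tensor, actions: torch.Tensor, lam: float = 10.0) -> torch.Tensor:
    actions = actions.clone().requires_grad_(True)
    q = critic(torch.cat([states, actions], dim=1)).sum()
    (grad,) = torch.autograd.grad(q, actions, create_graph=True)
    return lam * grad.pow(2).sum(dim=1).mean()   # penalize steep Q over actions

states, actions = torch.randn(8, 4), torch.randn(8, 2)
penalty = gradient_penalty(states, actions)      # add to the TD loss before backward()
penalty.backward()
```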
arXiv Detail & Related papers (2022-10-19T11:22:36Z)
- Improving Generalization via Uncertainty Driven Perturbations [107.45752065285821]
We consider uncertainty-driven perturbations of the training data points.
Unlike loss-driven perturbations, uncertainty-guided perturbations do not cross the decision boundary.
We show that UDP is guaranteed to achieve the maximum-margin decision boundary on linear models.
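A hedged sketch of the idea, using predictive entropy as a stand-in uncertainty measure (the paper's estimator may differ): inputs are perturbed toward higher model uncertainty rather than higher loss, so no label is consulted.

```python
# Hypothetical uncertainty-driven perturbation for a small classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

def uncertainty_perturb(x: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    x = x.clone().requires_grad_(True)
    probs = F.softmax(model(x), dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    (grad,) = torch.autograd.grad(entropy, x)
    return (x + eps * grad.sign()).detach()   # step toward higher uncertainty

x = torch.randn(4, 10)
x_perturbed = uncertainty_perturb(x)          # train on perturbed inputs, original labels
```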
arXiv Detail & Related papers (2022-02-11T16:22:08Z)
- Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this uncertainty via two straightforward penalties: a policy constraint that keeps the learned policy close to the data-collecting policy, and a value constraint that discourages overly optimistic estimates.
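A hedged sketch of how the two penalties could enter a generic actor objective (illustrative, not the paper's exact algorithm):

```python
# Hypothetical actor loss combining a policy constraint and a value constraint.
import torch

def actor_loss(q_ensemble: torch.Tensor, log_pi_data: torch.Tensor,
               alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    # q_ensemble: (n_critics, batch) Q-estimates for actions sampled from the policy.
    pessimistic_q = q_ensemble.mean(dim=0) - beta * q_ensemble.std(dim=0)
    behavior_nll = -log_pi_data.mean()        # policy constraint: stay near the dataset
    return -pessimistic_q.mean() + alpha * behavior_nll

q = torch.randn(5, 32)      # 5 critics, batch of 32
log_pi = torch.randn(32)    # log-prob of dataset actions under the learned policy
print(actor_loss(q, log_pi))
```

Here the ensemble standard deviation plays the role of the value constraint (it shrinks overly optimistic estimates), while the behavior negative log-likelihood plays the role of the policy constraint.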
arXiv Detail & Related papers (2021-02-18T08:54:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.