Generative Trees: Adversarial and Copycat
- URL: http://arxiv.org/abs/2201.11205v1
- Date: Wed, 26 Jan 2022 22:02:43 GMT
- Title: Generative Trees: Adversarial and Copycat
- Authors: Richard Nock and Mathieu Guillame-Bert
- Abstract summary: We exploit decades-old understanding of the supervised task's best components for DT induction.
We introduce tree-based generative models, generative trees (GTs)
We test our algorithms on tasks including fake/real distinction, training from fake data and missing data imputation.
- Score: 26.09279398946235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Generative Adversarial Networks (GANs) achieve spectacular results on
unstructured data like images, there is still a gap on tabular data, data for
which state of the art supervised learning still favours to a large extent
decision tree (DT)-based models. This paper proposes a new path forward for the
generation of tabular data, exploiting decades-old understanding of the
supervised task's best components for DT induction, from losses (properness),
models (tree-based) to algorithms (boosting). The \textit{properness} condition
on the supervised loss -- which postulates the optimality of Bayes rule --
leads us to a variational GAN-style loss formulation which is \textit{tight}
when discriminators meet a calibration property trivially satisfied by DTs,
and, under common assumptions about the supervised loss, yields "one loss to
train against them all" for the generator: the $\chi^2$. We then introduce
tree-based generative models, \textit{generative trees} (GTs), meant to mirror
on the generative side the good properties of DTs for classifying tabular data,
with a boosting-compliant \textit{adversarial} training algorithm for GTs. We
also introduce \textit{copycat training}, in which the generator copies at run
time the underlying tree (graph) of the discriminator DT and completes it for
the hardest discriminative task, with boosting compliant convergence. We test
our algorithms on tasks including fake/real distinction, training from fake
data and missing data imputation. Each one of these tasks displays that GTs can
provide comparatively simple -- and interpretable -- contenders to
sophisticated state of the art methods for data generation (using neural
network models) or missing data imputation (relying on multiple imputation by
chained equations with complex tree-based modeling).
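  For readers unfamiliar with the $\chi^2$ objective, the chi-squared divergence between a real distribution $P$ and a generated distribution $Q$ is $\chi^2(P\|Q) = \int (\mathrm{d}P/\mathrm{d}Q - 1)^2\,\mathrm{d}Q$; the abstract's point, as we read it, is that under properness and common assumptions minimizing this single divergence suffices for the generator, whatever proper loss the discriminator is trained with. To make the notion of a generative tree concrete, below is a minimal, purely illustrative Python sketch of how a tree-structured generator could be represented and sampled: a stochastic root-to-leaf walk driven by per-node routing probabilities, followed by a draw inside the leaf's support. The names (GTNode, sample, p_left) are invented for this sketch; it is not the authors' implementation.
```python
import numpy as np

class GTNode:
    """Node of a toy generative tree (illustrative sketch, not the paper's code).

    Internal nodes store the probability of routing a sample to the left child;
    leaves store a per-feature interval ("box") describing their support, from
    which attribute values are drawn uniformly.
    """
    def __init__(self, p_left=None, left=None, right=None, box=None):
        self.p_left = p_left            # P(route left), e.g. estimated from data proportions
        self.left, self.right = left, right
        self.box = box                  # leaf only: list of (low, high) per feature

    def is_leaf(self):
        return self.left is None and self.right is None

def sample(root, rng):
    """Draw one synthetic example via a stochastic root-to-leaf walk."""
    node = root
    while not node.is_leaf():
        node = node.left if rng.random() < node.p_left else node.right
    return np.array([rng.uniform(lo, hi) for lo, hi in node.box])

# Tiny hand-built generator over two features in [0, 1]^2: roughly 30% of
# samples land in the cell x0 < 0.5 and 70% in the cell x0 >= 0.5.
leaf_a = GTNode(box=[(0.0, 0.5), (0.0, 1.0)])
leaf_b = GTNode(box=[(0.5, 1.0), (0.0, 1.0)])
root = GTNode(p_left=0.3, left=leaf_a, right=leaf_b)
rng = np.random.default_rng(0)
fake = np.array([sample(root, rng) for _ in range(1000)])
```
  Under the copycat scheme described in the abstract, the generator reuses the discriminator DT's graph at run time rather than growing its own; in terms of this sketch, that would roughly amount to mirroring the discriminator's splits and refitting the routing probabilities, though this reading is an assumption and not a statement of the authors' algorithm.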
Related papers
- Unmasking Trees for Tabular Data [0.0]
We present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees.
To solve the conditional generation subproblem, we propose BaltoBot, which fits a balanced tree of boosted tree classifiers.
Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions.
We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.
arXiv Detail & Related papers (2024-07-08T04:15:43Z) - DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs [17.847551850315895]
This paper proposes a novel framework called the Dual-Path Generative Adversarial Network (DPGAN)
DPGAN can simultaneously handle missing data and avoid over-smoothing problems.
Comprehensive experiments across various benchmark datasets substantiate that DPGAN consistently rivals, if not outperforms, existing state-of-the-art imputation algorithms.
arXiv Detail & Related papers (2024-04-26T05:26:10Z) - Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data.
The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z) - NodeFormer: A Scalable Graph Structure Learning Transformer for Node
Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
arXiv Detail & Related papers (2023-06-14T09:21:15Z) - Localized Contrastive Learning on Graphs [110.54606263711385]
We introduce a simple yet effective contrastive model named Localized Graph Contrastive Learning (Local-GCL)
In spite of its simplicity, Local-GCL achieves quite competitive performance in self-supervised node representation learning tasks on graphs with various scales and properties.
arXiv Detail & Related papers (2022-12-08T23:36:00Z) - Active-LATHE: An Active Learning Algorithm for Boosting the Error
Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z) - Convergent Boosted Smoothing for Modeling Graph Data with Tabular Node
Features [46.052312251801]
We propose a framework for iterating boosting with graph propagation steps.
Our approach is anchored in a principled meta loss function.
Across a variety of non-iid graph datasets, our method achieves comparable or superior performance.
arXiv Detail & Related papers (2021-10-26T04:53:12Z) - OCT-GAN: Neural ODE-based Conditional Tabular GANs [8.062118111791495]
We introduce our generator and discriminator based on neural ordinary differential equations (NODEs)
We conduct experiments with 13 datasets, including insurance fraud detection and online news article prediction.
arXiv Detail & Related papers (2021-05-31T13:58:55Z) - A Robust and Generalized Framework for Adversarial Graph Embedding [73.37228022428663]
We propose a robust framework for adversarial graph embedding, named AGE.
AGE generates the fake neighbor nodes as the enhanced negative samples from the implicit distribution.
Based on this framework, we propose three models to handle three types of graph data.
arXiv Detail & Related papers (2021-05-22T07:05:48Z) - PC-GAIN: Pseudo-label Conditional Generative Adversarial Imputation
Networks for Incomplete Data [19.952411963344556]
We propose PC-GAIN, a novel unsupervised missing data imputation method.
We first propose a pre-training procedure to learn potential category information contained in a subset of low-missing-rate data.
An auxiliary classifier is then trained using the synthetic pseudo-labels.
arXiv Detail & Related papers (2020-11-16T08:08:26Z) - Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG)
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.