Generative Trees: Adversarial and Copycat
- URL: http://arxiv.org/abs/2201.11205v1
- Date: Wed, 26 Jan 2022 22:02:43 GMT
- Title: Generative Trees: Adversarial and Copycat
- Authors: Richard Nock and Mathieu Guillame-Bert
- Abstract summary: We exploit decades-old understanding of the supervised task's best components for DT induction.
We introduce tree-based generative models, generative trees (GTs)
We test our algorithms on tasks including fake/real distinction, training from fake data and missing data imputation.
- Score: 26.09279398946235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Generative Adversarial Networks (GANs) achieve spectacular results on
unstructured data like images, there is still a gap on tabular data, data for
which state of the art supervised learning still favours to a large extent
decision tree (DT)-based models. This paper proposes a new path forward for the
generation of tabular data, exploiting decades-old understanding of the
supervised task's best components for DT induction, from losses (properness),
models (tree-based) to algorithms (boosting). The \textit{properness} condition
on the supervised loss -- which postulates the optimality of Bayes rule --
leads us to a variational GAN-style loss formulation which is \textit{tight}
when discriminators meet a calibration property trivially satisfied by DTs,
and, under common assumptions about the supervised loss, yields "one loss to
train against them all" for the generator: the $\chi^2$. We then introduce
tree-based generative models, \textit{generative trees} (GTs), meant to mirror
on the generative side the good properties of DTs for classifying tabular data,
with a boosting-compliant \textit{adversarial} training algorithm for GTs. We
also introduce \textit{copycat training}, in which the generator copies at run
time the underlying tree (graph) of the discriminator DT and completes it for
the hardest discriminative task, with boosting compliant convergence. We test
our algorithms on tasks including fake/real distinction, training from fake
data and missing data imputation. Each one of these tasks displays that GTs can
provide comparatively simple -- and interpretable -- contenders to
sophisticated state of the art methods for data generation (using neural
network models) or missing data imputation (relying on multiple imputation by
chained equations with complex tree-based modeling).
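  For readers unfamiliar with the $\chi^2$ objective, the chi-squared divergence between a real distribution $P$ and a generated distribution $Q$ is $\chi^2(P\|Q) = \int (\mathrm{d}P/\mathrm{d}Q - 1)^2\,\mathrm{d}Q$; the abstract's point, as we read it, is that under properness and common assumptions minimizing this single divergence suffices for the generator, whatever proper loss the discriminator is trained with. To make the notion of a generative tree concrete, below is a minimal, purely illustrative Python sketch of how a tree-structured generator could be represented and sampled: a stochastic root-to-leaf walk driven by per-node routing probabilities, followed by a draw inside the leaf's support. The names (GTNode, sample, p_left) are invented for this sketch; it is not the authors' implementation.
```python
import numpy as np

class GTNode:
    """Node of a toy generative tree (illustrative sketch, not the paper's code).

    Internal nodes store the probability of routing a sample to the left child;
    leaves store a per-feature interval ("box") describing their support, from
    which attribute values are drawn uniformly.
    """
    def __init__(self, p_left=None, left=None, right=None, box=None):
        self.p_left = p_left            # P(route left), e.g. estimated from data proportions
        self.left, self.right = left, right
        self.box = box                  # leaf only: list of (low, high) per feature

    def is_leaf(self):
        return self.left is None and self.right is None

def sample(root, rng):
    """Draw one synthetic example via a stochastic root-to-leaf walk."""
    node = root
    while not node.is_leaf():
        node = node.left if rng.random() < node.p_left else node.right
    return np.array([rng.uniform(lo, hi) for lo, hi in node.box])

# Tiny hand-built generator over two features in [0, 1]^2: roughly 30% of
# samples land in the cell x0 < 0.5 and 70% in the cell x0 >= 0.5.
leaf_a = GTNode(box=[(0.0, 0.5), (0.0, 1.0)])
leaf_b = GTNode(box=[(0.5, 1.0), (0.0, 1.0)])
root = GTNode(p_left=0.3, left=leaf_a, right=leaf_b)
rng = np.random.default_rng(0)
fake = np.array([sample(root, rng) for _ in range(1000)])
```
  Under the copycat scheme described in the abstract, the generator reuses the discriminator DT's graph at run time rather than growing its own; in terms of this sketch, that would roughly amount to mirroring the discriminator's splits and refitting the routing probabilities, though this reading is an assumption and not a statement of the authors' algorithm.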
Related papers
- Unmasking Trees for Tabular Data [0.0]
We present UnmaskingTrees, a simple method for tabular imputation (and generation) employing gradient-boosted decision trees.
To solve the conditional generation subproblem, we propose BaltoBot, which fits a balanced tree of boosted tree classifiers.
Unlike older methods, it requires no parametric assumption on the conditional distribution, accommodating features with multimodal distributions.
We finally consider our two approaches as meta-algorithms, demonstrating in-context learning-based generative modeling with TabPFN.
arXiv Detail & Related papers (2024-07-08T04:15:43Z) - DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs [17.847551850315895]
This paper proposes a novel framework called the Dual-Path Generative Adversarial Network (DPGAN)
DPGAN can simultaneously handle missing data and avoid over-smoothing problems.
Comprehensive experiments across various benchmark datasets substantiate that DPGAN consistently rivals, if not outperforms, existing state-of-the-art imputation algorithms.
arXiv Detail & Related papers (2024-04-26T05:26:10Z) - Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data.
The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z) - NodeFormer: A Scalable Graph Structure Learning Transformer for Node
Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
arXiv Detail & Related papers (2023-06-14T09:21:15Z) - Localized Contrastive Learning on Graphs [110.54606263711385]
We introduce a simple yet effective contrastive model named Localized Graph Contrastive Learning (Local-GCL)
In spite of its simplicity, Local-GCL achieves quite competitive performance in self-supervised node representation learning tasks on graphs with various scales and properties.
arXiv Detail & Related papers (2022-12-08T23:36:00Z) - Active-LATHE: An Active Learning Algorithm for Boosting the Error
Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z) - Convergent Boosted Smoothing for Modeling Graph Data with Tabular Node
Features [46.052312251801]
We propose a framework for iterating boosting with graph propagation steps.
Our approach is anchored in a principled meta loss function.
Across a variety of non-iid graph datasets, our method achieves comparable or superior performance.
arXiv Detail & Related papers (2021-10-26T04:53:12Z) - OCT-GAN: Neural ODE-based Conditional Tabular GANs [8.062118111791495]
We introduce our generator and discriminator based on neural ordinary differential equations (NODEs)
We conduct experiments with 13 datasets, including insurance fraud detection and online news article prediction.
arXiv Detail & Related papers (2021-05-31T13:58:55Z) - A Robust and Generalized Framework for Adversarial Graph Embedding [73.37228022428663]
We propose a robust framework for adversarial graph embedding, named AGE.
AGE generates the fake neighbor nodes as the enhanced negative samples from the implicit distribution.
Based on this framework, we propose three models to handle three types of graph data.
arXiv Detail & Related papers (2021-05-22T07:05:48Z) - PC-GAIN: Pseudo-label Conditional Generative Adversarial Imputation
Networks for Incomplete Data [19.952411963344556]
We propose PC-GAIN, a novel unsupervised missing data imputation method.
We first propose a pre-training procedure to learn potential category information contained in a subset of low-missing-rate data.
An auxiliary classifier is then trained using the synthetic pseudo-labels.
arXiv Detail & Related papers (2020-11-16T08:08:26Z) - Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG)
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.