Generative Forests
- URL: http://arxiv.org/abs/2308.03648v1
- Date: Mon, 7 Aug 2023 14:58:53 GMT
- Title: Generative Forests
- Authors: Richard Nock and Mathieu Guillame-Bert
- Abstract summary: We introduce new tree-based generative models convenient for density modeling and data generation.
We also introduce a training algorithm which simplifies the training setting of previous approaches.
Experiments on missing data imputation and on comparing generated data to real data display the quality of the results.
- Score: 26.09279398946235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tabular data represents one of the most prevalent forms of data. When it comes to data generation, many approaches learn a density for the data-generating process but do not necessarily end up with a sampler, let alone one that is exact with respect to the underlying density. A second issue concerns models: while complex neural-net-based modeling thrives in image or text generation, much less is known about powerful generative models for tabular data. A third problem is the visible chasm, on tabular data, between training algorithms for supervised learning with remarkable properties (e.g. boosting) and the comparative lack of guarantees when it comes to data generation. In this paper, we tackle all three problems, introducing new tree-based generative models convenient for density modeling and tabular data generation that improve on the modeling capabilities of recent proposals, together with a training algorithm that simplifies the training setting of previous approaches and displays boosting-compliant convergence. This algorithm has the convenient property of relying on a supervised training scheme that can be implemented with a few tweaks to the most popular induction scheme for two-class decision trees. Experiments on missing data imputation and on comparing generated data to real data display the quality of the results obtained by our approach, in particular against the state of the art.
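To make the sampler claim concrete, below is a minimal, hypothetical sketch of how a tree-based generative model can act as an exact sampler: internal nodes carry branching probabilities, leaves carry an axis-aligned support box, and a draw is a stochastic root-to-leaf walk followed by uniform sampling inside the reached leaf's box. The node layout and field names (Node, p_left, box) are illustrative assumptions, not the paper's actual data structures.

```python
# Hypothetical sketch: sampling from a tree-based generative model.
# Node layout, field names and the uniform-in-leaf assumption are
# illustrative choices, not the paper's actual data structures.
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    # Internal node: routes a sample left with probability p_left.
    p_left: float = 0.5
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf: per-feature (low, high) intervals describing its support box.
    box: Optional[List[Tuple[float, float]]] = None

def sample(node: Node, rng: random.Random) -> List[float]:
    """One draw: stochastic root-to-leaf walk, then uniform in the leaf box."""
    while node.box is None:  # descend until a leaf is reached
        node = node.left if rng.random() < node.p_left else node.right
    return [rng.uniform(lo, hi) for lo, hi in node.box]

# Toy 1-feature model: 30% of the mass on [0, 0.3), 70% on [0.3, 1.0).
tree = Node(p_left=0.3,
            left=Node(box=[(0.0, 0.3)]),
            right=Node(box=[(0.3, 1.0)]))
rng = random.Random(7)
print([round(sample(tree, rng)[0], 3) for _ in range(5)])
```

In this sketch, each leaf's probability is the product of the branching probabilities along its path, so sampling is exact for the density the tree encodes, which is the property the abstract contrasts with density-only approaches.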
Related papers
- Heat Death of Generative Models in Closed-Loop Learning [63.83608300361159]
We study the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset.
We show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to degenerate.
arXiv Detail & Related papers (2024-04-02T21:51:39Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z)
- Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z)
- Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning [12.443476695459553]
We propose a novel duality fine-tuning method by formally defining the probabilistic duality constraints between key information prediction and headline generation tasks.
The proposed method can capture more information from limited data, build connections between separate tasks, and is suitable for less-data constrained generation tasks.
We conduct extensive experiments demonstrating that our method is effective and efficient, achieving improved performance on a language modeling metric and an informativeness correctness metric on two public datasets.
arXiv Detail & Related papers (2022-10-10T07:59:36Z)
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled examples.
We show that NPC-LV outperforms supervised methods on all three datasets on image classification in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- Generative Trees: Adversarial and Copycat [26.09279398946235]
We exploit decades-old understanding of the supervised task's best components for decision tree (DT) induction.
We introduce tree-based generative models, generative trees (GTs).
We test our algorithms on tasks including fake/real distinction (a minimal sketch of this real-vs-fake step appears after this list), training from fake data and missing data imputation.
arXiv Detail & Related papers (2022-01-26T22:02:43Z)
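As a companion illustration, here is a hedged sketch of the supervised real-vs-fake step that both the Generative Forests and Generative Trees abstracts allude to: a plain two-class decision tree (scikit-learn's CART here, standing in for "the most popular induction scheme") is fit to separate real rows from rows drawn by the current generator. The toy data and uniform "generator" are assumptions; the papers' actual tweaks, copycat/adversarial generator updates and boosting-compliant convergence argument are not reproduced.

```python
# Hedged illustration of the real-vs-fake supervised step; the toy data,
# the uniform "generator" and scikit-learn's CART are stand-ins, not the
# papers' algorithm.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
real = rng.normal(loc=[2.0, -1.0], scale=0.5, size=(500, 2))  # toy "real" table
fake = rng.uniform(low=-3.0, high=3.0, size=(500, 2))         # current generator draws

X = np.vstack([real, fake])
y = np.concatenate([np.ones(500), np.zeros(500)])             # 1 = real, 0 = fake

disc = DecisionTreeClassifier(max_leaf_nodes=16, random_state=0).fit(X, y)
# Leaves where P(real) is high mark regions a copycat-style update would
# graft into the generator so it covers more of the real data's mass.
print(f"real-vs-fake accuracy: {disc.score(X, y):.3f}")
```

In a copycat scheme of this kind, the discriminator's splits themselves become the generator's structure, which is why the training reduces to ordinary two-class tree induction plus a few tweaks.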