Generative Forests
- URL: http://arxiv.org/abs/2308.03648v1
- Date: Mon, 7 Aug 2023 14:58:53 GMT
- Title: Generative Forests
- Authors: Richard Nock and Mathieu Guillame-Bert
- Abstract summary: We introduce new tree-based generative models convenient for density modeling and data generation.
We also introduce a training algorithm which simplifies the training setting of previous approaches.
Experiments on missing data imputation and on comparing generated data to real data display the quality of the results.
- Score: 26.09279398946235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tabular data represents one of the most prevalent forms of data. When it comes to data generation, many approaches learn a density for the data generation process but do not necessarily end up with a sampler, let alone one that is exact with respect to the underlying density. A second issue concerns models: while complex modeling based on neural nets thrives in image or text generation (etc.), much less is known about powerful generative models for tabular data. A third problem is the visible chasm, on tabular data, between training algorithms for supervised learning with remarkable properties (e.g., boosting) and the comparative lack of guarantees when it comes to data generation. In this paper, we tackle all three problems, introducing new tree-based generative models convenient for density modeling and tabular data generation that improve on the modeling capabilities of recent proposals, and a training algorithm which simplifies the training setting of previous approaches and displays boosting-compliant convergence. The algorithm conveniently relies on a supervised training scheme that can be implemented with a few tweaks to the most popular induction scheme for two-class decision trees. Experiments on missing data imputation and on comparing generated data to real data display the quality of the results obtained by our approach, in particular against the state of the art.
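The abstract stops short of listing the "few tweaks", but a classical supervised route to density modeling suggests what such a scheme can look like: label real records as one class and records drawn from a known reference measure as the other, grow an ordinary two-class decision tree, and read a density-ratio estimate off each leaf. The sketch below is a minimal illustration of that generic recipe, assuming a product-of-marginals reference measure; it is not the authors' algorithm, and all function names are ours.

```python
# Hedged sketch: density modeling with a two-class decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def sample_reference(X, n, rng):
    # Reference measure: product of empirical marginals (our choice;
    # other reference measures are possible).
    return np.column_stack([rng.choice(X[:, j], size=n) for j in range(X.shape[1])])

def fit_density_ratio_tree(X, rng, **tree_kwargs):
    # Class 1 = real rows, class 0 = reference rows; an ordinary
    # two-class tree then carves the domain, and each leaf's class
    # ratio estimates p_data / p_reference on that leaf.
    X_ref = sample_reference(X, len(X), rng)
    Z = np.vstack([X, X_ref])
    y = np.concatenate([np.ones(len(X)), np.zeros(len(X_ref))])
    return DecisionTreeClassifier(**tree_kwargs).fit(Z, y)

X = rng.normal(size=(1000, 3))                   # placeholder tabular data
tree = fit_density_ratio_tree(X, rng, max_leaf_nodes=32)

proba = tree.predict_proba(X)                    # per-row leaf class probabilities
ratio = proba[:, 1] / np.clip(proba[:, 0], 1e-12, None)   # density-ratio estimate
```

An exact sampler falls out of the same structure: pick a leaf with probability proportional to its estimated real-data mass, then draw from the reference measure restricted to that leaf's region.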
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching; a generic sketch of this combination appears after this list.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Efficiently Robustify Pre-trained Models [18.392732966487582]
The robustness of large-scale models in real-world settings is still a less-explored topic.
We first benchmark the performance of these models under different perturbations and datasets.
We then discuss how existing robustification schemes based on complete model fine-tuning might not be a scalable option for very large networks.
arXiv Detail & Related papers (2023-09-14T08:07:49Z)
- Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better; a toy sketch of count thinning appears after this list.
arXiv Detail & Related papers (2023-05-28T05:38:28Z)
- Robust Graph Representation Learning via Predictive Coding [46.22695915912123]
Predictive coding is a message-passing framework initially developed to model information processing in the brain.
In this work, we build models that rely on the message-passing rule of predictive coding.
We show that the proposed models are comparable to standard ones in terms of performance in both inductive and transductive tasks; a minimal predictive-coding update rule is sketched after this list.
arXiv Detail & Related papers (2022-12-09T03:58:22Z)
- Data Selection: A General Principle for Building Small Interpretable Models [0.0]
We present convincing empirical evidence for an effective and general strategy for building accurate small models.
We apply it to three tasks: (1) building cluster explanation trees, (2) prototype-based classification, and (3) classification using Random Forests.
In the final task involving Random Forests, the strategy is shown to be effective even when model size comprises more than one factor: the number of trees and their maximum depth.
arXiv Detail & Related papers (2022-10-08T05:16:49Z)
- Smooth densities and generative modeling with unsupervised random forests [1.433758865948252]
An important application for density estimators is synthetic data generation.
We propose a new method based on unsupervised random forests for estimating smooth densities in arbitrary dimensions without parametric constraints.
We prove the consistency of our approach and demonstrate its advantages over existing tree-based density estimators; a toy leaf-based density estimate is sketched after this list.
arXiv Detail & Related papers (2022-05-19T09:50:25Z)
- Multi network InfoMax: A pre-training method involving graph convolutional networks [0.0]
This paper presents a pre-training method involving graph convolutional/neural networks (GCNs/GNNs).
The learned high-level graph latent representations help increase performance for downstream graph classification tasks.
We apply our method to a neuroimaging dataset for classifying subjects into healthy control (HC) and schizophrenia (SZ) groups.
arXiv Detail & Related papers (2021-11-01T21:53:20Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
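For the SSM entry above, the summary leaves the combination abstract. One generic way to combine the two ingredients is denoising score matching with gradient-boosted trees as the score model, sketched below in one dimension; this is our illustration of the generic combination under stated assumptions (Gaussian noise, unadjusted Langevin sampling), not the paper's SSM.

```python
# Hedged sketch: denoising score matching with gradient-boosted trees (1-D).
# Perturb the data with Gaussian noise, regress the score target
# (x - x_noisy) / sigma**2 with a boosted ensemble, then sample with
# unadjusted Langevin dynamics. Generic illustration, not the paper's SSM.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 2000), rng.normal(2, 0.5, 2000)])

sigma = 0.3
x_noisy = x + sigma * rng.standard_normal(x.shape)
target = (x - x_noisy) / sigma**2           # denoising score matching target

score_model = GradientBoostingRegressor(n_estimators=300, max_depth=3)
score_model.fit(x_noisy.reshape(-1, 1), target)

samples = rng.uniform(-4.0, 4.0, size=500)  # Langevin sampling from the model
step = 0.01
for _ in range(200):
    grad = score_model.predict(samples.reshape(-1, 1))
    samples += 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(samples.shape)
```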
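For the "Learning to Jump" entry, the title's thinning and thickening of latent counts suggests a binomial-thinning forward corruption for count data. The toy sketch below shows only that forward direction, with a placeholder schedule; the reverse "jump" model is omitted, and the construction is our assumption rather than a restatement of the paper's.

```python
# Hedged sketch: binomial thinning of latent counts (forward corruption only).
# Each of the x events independently survives with probability alpha_t,
# so z_t ~ Binomial(x, alpha_t). A generative model would then be trained
# to "thicken" z_t back toward x, i.e., to jump upward in count space.
import numpy as np

rng = np.random.default_rng(0)
x = rng.poisson(8.0, size=10)             # toy count data
alphas = np.linspace(0.9, 0.1, 5)         # placeholder keep-probabilities

trajectory = [rng.binomial(x, a) for a in alphas]   # increasingly thinned counts
```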
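For the predictive-coding entry, the message-passing rule the summary mentions reduces, in its simplest Rao-Ballard form, to relaxing latents against prediction errors and updating weights locally. The sketch below shows one linear layer with a Gaussian prior; the layer sizes, learning rates, and prior are our placeholder choices, and the paper's graph models are more elaborate.

```python
# Hedged sketch: one predictive-coding layer (Rao-Ballard style).
# Latents z are relaxed by gradient descent on the prediction error
# e = x - W @ z (plus a Gaussian prior on z), then the weights get a
# local Hebbian-like update. Generic rule, not the paper's graph model.
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z = 8, 4                           # placeholder layer sizes
W = 0.1 * rng.standard_normal((d_x, d_z))
x = rng.standard_normal(d_x)              # toy observation

z = np.zeros(d_z)
for _ in range(50):                       # inference: relax the latents
    e = x - W @ z                         # bottom-up prediction error
    z += 0.1 * (W.T @ e - z)              # error message minus prior pull

W += 0.01 * np.outer(e, z)                # local weight update from e and z
```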
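For the unsupervised-random-forest entry, one way such estimators obtain smooth densities is to partition the space with a forest and then fit simple univariate densities within each leaf, mixing leaves by their data mass. The sketch below grows an unsupervised forest with the classic real-versus-permuted trick and uses per-leaf Gaussians as the leaf models; both choices are our assumptions, not the paper's exact construction.

```python
# Hedged sketch: smooth density estimate from an unsupervised forest.
# Grow a forest to separate real rows from column-permuted rows, then
# model each leaf with independent univariate Gaussians and mix leaves
# by their data mass. Leaf models and permutation trick are assumptions.
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                               # toy data
X_perm = np.column_stack([rng.permutation(X[:, j]) for j in range(2)])
Z = np.vstack([X, X_perm])
y = np.concatenate([np.ones(500), np.zeros(500)])           # real vs. permuted

forest = RandomForestClassifier(n_estimators=20, max_depth=4).fit(Z, y)

def density(forest, X_train, query):
    """Average, over trees, of (leaf mass) x (per-leaf Gaussian product)."""
    vals = []
    for tree in forest.estimators_:
        leaves = tree.apply(X_train)
        members = X_train[leaves == tree.apply(query.reshape(1, -1))[0]]
        if len(members) < 2:
            continue
        weight = len(members) / len(X_train)
        f = np.prod([norm.pdf(query[j], members[:, j].mean(),
                              members[:, j].std() + 1e-6)
                     for j in range(query.size)])
        vals.append(weight * f)
    return float(np.mean(vals)) if vals else 0.0

p = density(forest, X, np.zeros(2))                         # toy query point
```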
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.