A Deep Generative Model for Feasible and Diverse Population Synthesis
- URL: http://arxiv.org/abs/2208.01403v1
- Date: Mon, 1 Aug 2022 05:02:02 GMT
- Title: A Deep Generative Model for Feasible and Diverse Population Synthesis
- Authors: Eui-Jin Kim and Prateek Bansal
- Abstract summary: This study proposes a novel method to minimize structural zeros while preserving sampling zeros.
Two regularizations are devised to customize the training of the deep generative model (DGM) and are applied to a generative adversarial network (GAN) and a variational autoencoder (VAE).
Results show that the proposed regularizations considerably improve the feasibility and diversity of the synthesized population relative to traditional models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An ideal synthetic population, a key input to activity-based models, mimics
the distribution of the individual- and household-level attributes in the
actual population. Since the entire population's attributes are generally
unavailable, household travel survey (HTS) samples are used for population
synthesis. Synthesizing a population by directly sampling from the HTS ignores the
attribute combinations that are unobserved in the HTS samples but exist in the
population, called 'sampling zeros'. A deep generative model (DGM) can
potentially synthesize the sampling zeros but at the expense of generating
'structural zeros' (i.e., the infeasible attribute combinations that do not
exist in the population). This study proposes a novel method to minimize
structural zeros while preserving sampling zeros. Two regularizations are
devised to customize the training of the DGM and applied to a generative
adversarial network (GAN) and a variational autoencoder (VAE). The adopted
metrics for the feasibility and diversity of the synthetic population capture
the capability to generate sampling and structural zeros: fewer structural
zeros indicate higher feasibility, and fewer sampling zeros indicate lower
diversity. Results show that the proposed regularizations considerably improve
the feasibility and diversity of the synthesized population relative to
traditional models. The proposed VAE additionally generated 23.5% of the
population ignored by the sample with 79.2% precision (i.e., a 20.8%
structural-zero rate), while the proposed GAN generated 18.3% of the ignored
population with 89.0% precision. The proposed improvements to the DGMs generate
a more feasible and diverse synthetic population, which is critical for the
accuracy of an activity-based model.
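The feasibility and diversity metrics described above can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes the synthetic population, the HTS sample, and the reference population are available as sets of discrete attribute-combination tuples, and all function and variable names are hypothetical.

```python
# Minimal sketch (not the paper's code) of the feasibility/diversity logic in the
# abstract: attribute combinations are tuples, and zeros are found by set membership.

def zero_rates(synthetic, sample, population):
    """Return (structural_zero_rate, sampling_zero_share) for a synthetic population.

    synthetic, sample, population: iterables of attribute-combination tuples,
    e.g. (age_band, income_band, household_size, ...).
    """
    synthetic, sample, population = set(synthetic), set(sample), set(population)

    # Combinations the model generates beyond what the HTS sample already contains.
    new_combos = synthetic - sample

    # Structural zeros: new combinations that do not exist in the population
    # (infeasible records). Their rate is 1 - precision; lower means higher feasibility.
    structural_zero_rate = len(new_combos - population) / max(len(new_combos), 1)

    # Sampling zeros recovered: feasible combinations missing from the sample that
    # the model nevertheless generates. A higher share means higher diversity.
    recovered = new_combos & population
    sampling_zero_share = len(recovered) / max(len(population - sample), 1)

    return structural_zero_rate, sampling_zero_share


if __name__ == "__main__":
    population = {("18-30", "low"), ("18-30", "high"), ("30-60", "low"), ("60+", "low")}
    sample = {("18-30", "low"), ("30-60", "low")}                        # HTS misses two feasible combos
    synthetic = {("18-30", "low"), ("18-30", "high"), ("60+", "high")}   # one infeasible combo
    print(zero_rates(synthetic, sample, population))                     # (0.5, 0.5)
```

In this notation, the reported VAE result corresponds to recovering 23.5% of the combinations missed by the sample at a 20.8% structural-zero rate (79.2% precision), while the GAN recovers 18.3% at an 11.0% structural-zero rate (89.0% precision).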
Related papers
- Enhancing Diversity and Feasibility: Joint Population Synthesis from Multi-source Data Using Generative Models [4.73459038844245]
This study proposes a novel method to simultaneously integrate and synthesize multi-source datasets using a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty (a generic gradient-penalty sketch appears after this list).
Results show that the proposed joint approach outperforms the sequential baseline, with recall increasing by 7% and precision by 15%.
Since synthetic populations serve as a key input for agent-based models (ABMs), this multi-source generative approach has the potential to significantly enhance the accuracy and reliability of ABMs.
arXiv Detail & Related papers (2026-02-17T00:02:30Z) - Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions [50.1404916337174]
We present the first large-scale, statistically powered audit of native probabilistic sampling in large language models (LLMs).
We show that batch generation achieves only modest statistical validity, with a 13% median pass rate, while independent requests collapse almost entirely.
We conclude that current LLMs lack a functional internal sampler, necessitating the use of external tools for applications requiring statistical guarantees.
arXiv Detail & Related papers (2026-01-08T22:33:12Z) - Population Synthesis using Incomplete Information [0.0]
The paper presents a population synthesis model that utilizes the Wasserstein Generative-Adversarial Network (WGAN) for training on incomplete microsamples.
By using a mask matrix to represent missing values, the study proposes a WGAN training algorithm that lets the model learn from a training dataset that has some missing information.
arXiv Detail & Related papers (2025-10-01T13:09:14Z) - Generating Feasible and Diverse Synthetic Populations Using Diffusion Models [5.689443449061003]
Population synthesis is a critical task that involves generating synthetic yet realistic representations of populations.
Deep generative models can potentially synthesize possible attribute combinations that are present in the actual population but do not exist in the sample data.
In this study, a novel diffusion model-based population synthesis method is proposed to estimate the underlying joint distribution of a population.
arXiv Detail & Related papers (2025-08-06T03:11:27Z) - Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models.
We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients.
We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z) - A Large Language Model for Feasible and Diverse Population Synthesis [0.6581049960856515]
We propose a fine-tuning method for large language models (LLMs) that explicitly controls the autoregressive generation process through topological orderings derived from a Bayesian Network (BN).
Our approach achieves approximately 95% feasibility, significantly higher than the 80% observed in deep generative models (DGMs).
This makes the approach cost-effective and scalable for large-scale applications, such as synthesizing populations in megacities.
arXiv Detail & Related papers (2025-05-07T07:50:12Z) - U-aggregation: Unsupervised Aggregation of Multiple Learning Algorithms [4.871473117968554]
We propose an unsupervised model aggregation method, U-aggregation, for enhanced and robust performance in new populations.
Unlike existing supervised model aggregation or super learner approaches, U-aggregation assumes no observed labels or outcomes in the target population.
We demonstrate its potential real-world application by using U-aggregation to enhance genetic risk prediction of complex traits.
arXiv Detail & Related papers (2025-01-30T01:42:51Z) - Provably Safeguarding a Classifier from OOD and Adversarial Samples: an Extreme Value Theory Approach [2.5674049243330255]
This paper introduces a novel method, Sample-efficient Probabilistic Detection using Extreme Value Theory (SPADE).
The approach is based on a Generalized Extreme Value (GEV) model of the training distribution in the classifier's latent space.
The abstaining classifier, which rejects samples based on their assessment, provably avoids adversarial samples.
arXiv Detail & Related papers (2025-01-17T13:51:14Z) - sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures [0.9674145073701153]
sc-OTGM is an unsupervised model grounded in the inductive bias that scRNA-seq data can be modeled as a mixture of multivariate Gaussian distributions.
sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification.
It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.
arXiv Detail & Related papers (2024-05-06T06:46:11Z) - Bt-GAN: Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks [3.3903891679981593]
We present Bias-transforming Generative Adversarial Networks (Bt-GAN), a GAN-based synthetic data generator specifically designed for the healthcare domain.
Our results demonstrate that Bt-GAN achieves SOTA accuracy while significantly improving fairness and minimizing bias.
arXiv Detail & Related papers (2024-04-21T12:16:38Z) - Combining propensity score methods with variational autoencoders for
generating synthetic data in presence of latent sub-groups [0.0]
Heterogeneity might be known, e.g., as indicated by sub-groups labels, or might be unknown and reflected only in properties of distributions, such as bimodality or skewness.
We investigate how such heterogeneity can be preserved and controlled when obtaining synthetic data from variational autoencoders (VAEs), i.e., a generative deep learning technique.
arXiv Detail & Related papers (2023-12-12T22:49:24Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - Reweighted Mixup for Subpopulation Shift [63.1315456651771]
Subpopulation shift exists in many real-world applications; it refers to settings where the training and test distributions contain the same subpopulation groups but in different proportions.
Importance reweighting is a classical and effective way to handle the subpopulation shift.
We propose a simple yet practical framework, called reweighted mixup, to mitigate the overfitting issue.
arXiv Detail & Related papers (2023-04-09T03:44:50Z) - Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply the total variation distance (TVD) to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z) - Copula-based transferable models for synthetic population generation [1.370096215615823]
Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents.
Traditional methods, often reliant on target population samples, face limitations due to high costs and small sample sizes.
We propose a novel framework based on copulas to generate synthetic data for target populations where only empirical marginal distributions are known (a generic Gaussian-copula sketch appears after this list).
arXiv Detail & Related papers (2023-02-17T23:58:14Z) - Selectively increasing the diversity of GAN-generated samples [8.980453507536017]
We propose a novel method to selectively increase the diversity of GAN-generated samples.
We show the superiority of our method in a synthetic benchmark as well as a real-life scenario simulating data from the Zero Degree Calorimeter of the ALICE experiment at CERN.
arXiv Detail & Related papers (2022-07-04T16:27:06Z) - Locally Typical Sampling [84.62530743899025]
We show that today's probabilistic language generators fall short when it comes to producing coherent and fluent text.
We propose a simple and efficient procedure for enforcing this criterion (local typicality) when generating from probabilistic models.
arXiv Detail & Related papers (2022-02-01T18:58:45Z) - Reparameterized Sampling for Generative Adversarial Networks [71.30132908130581]
We propose REP-GAN, a novel sampling method that allows general dependent proposals by reparameterizing the Markov chains into the latent space of the generator.
Empirically, extensive experiments on synthetic and real datasets demonstrate that our REP-GAN largely improves the sample efficiency and obtains better sample quality simultaneously.
arXiv Detail & Related papers (2021-07-01T10:34:55Z) - GANs with Variational Entropy Regularizers: Applications in Mitigating
the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z) - Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling
by Exploring Energy of the Discriminator [85.68825725223873]
Generative Adversarial Networks (GANs) have shown great promise in modeling high dimensional data.
We introduce the Discriminator Contrastive Divergence, which is well motivated by the property of WGAN's discriminator.
We demonstrate the benefits of significantly improved generation on both synthetic data and several real-world image generation benchmarks.
arXiv Detail & Related papers (2020-04-05T01:50:16Z)
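Two of the related entries above (the multi-source joint synthesis and the incomplete-information model) train a Wasserstein GAN with gradient penalty. The following is a generic WGAN-GP critic-loss sketch in PyTorch for readers unfamiliar with the term; it is the standard formulation, not code from either paper, and the `critic` module, tensor shapes, and penalty weight are illustrative assumptions.

```python
import torch

def critic_loss_wgan_gp(critic, real, fake, gp_weight=10.0):
    """Generic WGAN-GP critic loss: Wasserstein estimate plus gradient penalty.

    critic: a torch.nn.Module mapping a (batch, features) tensor to one score per row.
    real, fake: (batch, features) tensors of real and generated records.
    """
    # Wasserstein distance estimate, minimized by the critic: E[D(fake)] - E[D(real)].
    wasserstein = critic(fake).mean() - critic(real).mean()

    # Gradient penalty on random interpolations between real and fake samples,
    # pushing the critic's gradient norm toward 1 (approximate 1-Lipschitz constraint).
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(
        outputs=critic(interp).sum(), inputs=interp, create_graph=True
    )[0]
    penalty = ((grad.norm(2, dim=1) - 1.0) ** 2).mean()

    return wasserstein + gp_weight * penalty
```

The generator is trained as usual to maximize the critic's score on generated records; extensions such as the mask-matrix handling of missing values in the incomplete-information paper are not shown here.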
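The copula-based transferable-models entry generates synthetic populations when only empirical marginal distributions of the target are known. As a rough illustration of the general copula idea (not that paper's transferable framework), the sketch below samples from a Gaussian copula and maps the draws through empirical marginals; the correlation matrix and the placeholder marginals are assumptions.

```python
import numpy as np
from scipy.stats import norm

def sample_gaussian_copula(marginal_samples, corr, n, seed=0):
    """Draw n synthetic rows: Gaussian-copula dependence, empirical marginals.

    marginal_samples: list of 1-D arrays, one observed sample per attribute.
    corr: (d, d) copula correlation matrix (assumed here; in practice it would be
          estimated from reference data).
    """
    rng = np.random.default_rng(seed)
    d = len(marginal_samples)

    # 1. Correlated standard normals, mapped to uniforms via the normal CDF.
    z = rng.multivariate_normal(np.zeros(d), corr, size=n)
    u = norm.cdf(z)

    # 2. Map each uniform column through the empirical quantile function of its marginal.
    cols = [np.quantile(marginal_samples[j], u[:, j]) for j in range(d)]
    return np.column_stack(cols)


if __name__ == "__main__":
    age = np.random.default_rng(1).integers(18, 90, size=500)            # placeholder marginals
    income = np.random.default_rng(2).lognormal(10.5, 0.6, size=500)
    synth = sample_gaussian_copula([age, income], np.array([[1.0, 0.4], [0.4, 1.0]]), n=1000)
    print(synth.shape)  # (1000, 2)
```

For categorical attributes, the quantile step would instead map the uniforms through each attribute's empirical category CDF.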
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.