Representative Language Generation
- URL: http://arxiv.org/abs/2505.21819v1
- Date: Tue, 27 May 2025 23:02:54 GMT
- Title: Representative Language Generation
- Authors: Charlotte Peale, Vinod Raman, Omer Reingold
- Abstract summary: "representative generation" is extended to address diversity and bias concerns in generative models. We demonstrate feasibility for countably infinite hypothesis classes and collections of groups under certain conditions. Our findings provide a rigorous foundation for developing more diverse and representative generative models.
- Score: 4.601683217376771
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce "representative generation," extending the theoretical framework for generation proposed by Kleinberg et al. (2024) and formalized by Li et al. (2024), to additionally address diversity and bias concerns in generative models. Our notion requires outputs of a generative model to proportionally represent groups of interest from the training data. We characterize representative uniform and non-uniform generation, introducing the "group closure dimension" as a key combinatorial quantity. For representative generation in the limit, we analyze both information-theoretic and computational aspects, demonstrating feasibility for countably infinite hypothesis classes and collections of groups under certain conditions, but proving a negative result for computability using only membership queries. This contrasts with Kleinberg et al.'s (2024) positive results for standard generation in the limit. Our findings provide a rigorous foundation for developing more diverse and representative generative models.
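The requirement that outputs "proportionally represent groups of interest from the training data" can be illustrated with a small check. This is only a sketch of one possible reading, not the paper's formal definition: the helper names, the `group_of` labelling function, the tolerance `alpha`, and the assumption that each string belongs to exactly one group are all hypothetical choices made for illustration.

```python
from collections import Counter


def group_proportions(items, group_of):
    """Return the fraction of items that fall into each group."""
    if not items:
        raise ValueError("items must be non-empty")
    counts = Counter(group_of(x) for x in items)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def max_representation_gap(training, outputs, group_of):
    """Largest absolute difference in group share between training data and outputs."""
    p = group_proportions(training, group_of)
    q = group_proportions(outputs, group_of)
    return max(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def is_representative(training, outputs, group_of, alpha=0.1):
    """Assumed criterion: every group's share in the outputs is within alpha of its training share."""
    return max_representation_gap(training, outputs, group_of) <= alpha


# Toy example: groups are defined by the first character of each string.
training = ["aa", "ab", "ba", "bb"]
balanced_outputs = ["ac", "bc"]
skewed_outputs = ["ac", "ad", "ae", "bc"]
first_char = lambda s: s[0]
```

On the toy data, `balanced_outputs` keeps the 50/50 split of the training strings, while `skewed_outputs` over-represents the "a" group and fails the check at `alpha=0.1`.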
Related papers
- Language Generation in the Limit: Noise, Loss, and Feedback [10.280148603465697]
We show that a finite union of uniformly generatable collections is generatable in the limit, and asked if the same is true for non-uniform generation. We show the equivalence of these models for uniform and non-uniform generation, and provide a characterization of non-uniform noisy generation.
arXiv Detail & Related papers (2025-07-21T07:18:04Z)
- On Union-Closedness of Language Generation [48.36356615217017]
We investigate language generation in the limit - a model by Kleinberg and Mullainathan and extended by Li, Raman, and Tewari. Our results resolve two open questions of Li et al. by proving finite unions of generatable or non-uniformly generatable classes need not be generatable. Our approach utilizes carefully constructed classes along with a novel diagonalization argument that could be of independent interest in the growing area of language generation.
arXiv Detail & Related papers (2025-06-23T13:42:25Z)
- Generation through the lens of learning theory [18.355039522639565]
We study generation through the lens of statistical learning theory. We define two settings, which we call "uniform" and "non-uniform" generation, and provide a characterization of which hypothesis classes are uniformly and non-uniformly generatable.
arXiv Detail & Related papers (2024-10-17T16:14:49Z)
- A Non-negative VAE: the Generalized Gamma Belief Network [49.970917207211556]
The gamma belief network (GBN) has demonstrated its potential for uncovering multi-layer interpretable latent representations in text data.
We introduce the generalized gamma belief network (Generalized GBN) in this paper, which extends the original linear generative model to a more expressive non-linear generative model.
We also propose an upward-downward Weibull inference network to approximate the posterior distribution of the latent variables.
arXiv Detail & Related papers (2024-08-06T18:18:37Z)
- Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data. Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
arXiv Detail & Related papers (2024-05-26T13:11:55Z)
- InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion [53.90516061351706]
We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction.
For sampling, we combine anti-penetration and synthesis-free guidance to enable plausible generation.
Our method significantly outperforms baseline generative models in terms of plausibility and diversity.
arXiv Detail & Related papers (2024-03-26T06:35:55Z)
- To Pool or Not To Pool: Analyzing the Regularizing Effects of Group-Fair Training on Shared Models [14.143499246740278]
We derive group-specific bounds on the generalization error of welfare-centric fair machine learning.
We do this by considering group-specific Rademacher averages over a restricted hypothesis class.
Our simulations demonstrate these bounds improve over a naive method, as expected by theory, with particularly significant improvement for smaller group sizes.
arXiv Detail & Related papers (2024-02-29T02:16:57Z)
- Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome Homogenization? [90.35044668396591]
A recurring theme in machine learning is algorithmic monoculture: the same systems, or systems that share components, are deployed by multiple decision-makers.
We propose the component-sharing hypothesis: if decision-makers share components like training data or specific models, then they will produce more homogeneous outcomes.
We test this hypothesis on algorithmic fairness benchmarks, demonstrating that sharing training data reliably exacerbates homogenization.
We conclude with philosophical analyses of and societal challenges for outcome homogenization, with an eye towards implications for deployed machine learning systems.
arXiv Detail & Related papers (2022-11-25T09:33:11Z)
- Unifying Causal Inference and Reinforcement Learning using Higher-Order Category Theory [4.119151469153588]
We present a unified formalism for structure discovery of causal models and predictive state representation models in reinforcement learning.
Specifically, we model structure discovery in both settings using simplicial objects.
arXiv Detail & Related papers (2022-09-13T19:04:18Z)
- RepFair-GAN: Mitigating Representation Bias in GANs Using Gradient Clipping [2.580765958706854]
We define a new fairness notion for generative models in terms of the distribution of generated samples sharing the same protected attributes.
We show that this fairness notion is violated even when the dataset contains equally represented groups.
We show that controlling the groups' gradient norm by performing group-wise gradient norm clipping in the discriminator leads to a more fair data generation.
arXiv Detail & Related papers (2022-07-13T14:58:48Z)
- GroupifyVAE: from Group-based Definition to VAE-based Unsupervised Representation Disentanglement [91.9003001845855]
VAE-based unsupervised disentanglement can not be achieved without introducing other inductive bias.
We address VAE-based unsupervised disentanglement by leveraging the constraints derived from the Group Theory based definition as the non-probabilistic inductive bias.
We train 1800 models covering the most prominent VAE-based models on five datasets to verify the effectiveness of our method.
arXiv Detail & Related papers (2021-02-20T09:49:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.