Concentration bounds on response-based vector embeddings of black-box generative models
- URL: http://arxiv.org/abs/2511.08307v1
- Date: Wed, 12 Nov 2025 01:52:05 GMT
- Title: Concentration bounds on response-based vector embeddings of black-box generative models
- Authors: Aranyak Acharyya, Joshua Agterberg, Youngser Park, Carey E. Priebe
- Abstract summary: Generative models, such as large language models or text-to-image diffusion models, can generate relevant responses to user-given queries. Data Kernel Perspective Space embedding is one method of obtaining response-based vector embeddings for a given set of generative models. Our results give the number of sample responses required to approximate the population-level vector embeddings to a desired level of accuracy.
- Score: 11.24508357413322
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative models, such as large language models or text-to-image diffusion models, can generate relevant responses to user-given queries. Response-based vector embeddings of generative models facilitate statistical analysis and inference on a given collection of black-box generative models. The Data Kernel Perspective Space embedding is one particular method, already discussed in the literature, of obtaining response-based vector embeddings for a given set of generative models. In this paper, under appropriate regularity conditions, we establish high-probability concentration bounds on the sample vector embeddings for a given set of generative models, obtained through the method of Data Kernel Perspective Space embedding. Our results give the number of sample responses required to approximate the population-level vector embeddings to a desired level of accuracy. The algebraic tools used to establish our results can further be used to establish concentration bounds on Classical Multidimensional Scaling embeddings in general, when the dissimilarities are observed with noise.
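The abstract's closing remark concerns Classical Multidimensional Scaling (CMDS), the standard procedure underlying perspective-space embeddings: double-center the squared dissimilarity matrix and take the top eigenpairs. A minimal sketch of that procedure is below; the toy "models" and all variable names are illustrative, not from the paper.

```python
import numpy as np

def classical_mds(D, d):
    """Classical Multidimensional Scaling: recover a d-dimensional
    embedding from an n x n matrix of pairwise dissimilarities D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:d]      # keep the top-d eigenpairs
    lam = np.maximum(eigvals[idx], 0.0)      # clip tiny negatives due to noise
    return eigvecs[:, idx] * np.sqrt(lam)

# Toy usage: six hypothetical models as points whose pairwise
# (response-based) dissimilarities we observe exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D, d=2)                    # recovered up to rotation/reflection
D_hat = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D_hat, atol=1e-8))      # noiseless distances are preserved
```

When the dissimilarities are observed with noise, as in the paper's setting, the recovered configuration only approximates the population embedding, which is exactly what the concentration bounds quantify.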
Related papers
- DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks [79.50756148780928]
This paper studies the problem of leveraging pretrained diffusion models for performing discriminative tasks. We extend the discriminative capability of pretrained frozen generative diffusion models from the classification task to the more complex object detection task, by "inverting" a pretrained layout-to-image diffusion model.
arXiv Detail & Related papers (2025-04-24T05:13:27Z) - Generative diffusion model with inverse renormalization group flows [0.0]
Diffusion models produce data by denoising a sample corrupted by white noise. We introduce a renormalization group-based diffusion model that leverages the multiscale nature of data distributions. We validate the versatility of the model through applications to protein structure prediction and image generation.
arXiv Detail & Related papers (2025-01-15T19:00:01Z) - Accelerated Diffusion Models via Speculative Sampling [89.43940130493233]
Speculative sampling is a popular technique for accelerating inference in Large Language Models. We extend speculative sampling to diffusion models, which generate samples via continuous, vector-valued Markov chains. We propose various drafting strategies, including a simple and effective approach that does not require training a draft model.
arXiv Detail & Related papers (2025-01-09T16:50:16Z) - Reconstructing Galaxy Cluster Mass Maps using Score-based Generative Modeling [9.386611764730791]
We present a novel approach to reconstruct gas and dark matter projected density maps of galaxy clusters using score-based generative modeling. Our diffusion model takes in mock SZ and X-ray images as conditional inputs, and generates realizations of corresponding gas and dark matter maps by sampling from a learned data posterior.
arXiv Detail & Related papers (2024-10-03T18:00:03Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Representer Point Selection for Explaining Regularized High-dimensional Models [105.75758452952357]
We introduce a class of sample-based explanations we term high-dimensional representers.
Our workhorse is a novel representer theorem for general regularized high-dimensional models.
We study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets.
arXiv Detail & Related papers (2023-05-31T16:23:58Z) - An Energy-Based Prior for Generative Saliency [62.79775297611203]
We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution.
With the generative saliency model, we can obtain a pixel-wise uncertainty map from an image, indicating model confidence in the saliency prediction.
Experimental results show that our generative saliency model with an energy-based prior can achieve not only accurate saliency predictions but also reliable uncertainty maps consistent with human perception.
arXiv Detail & Related papers (2022-04-19T10:51:00Z) - Generalization Metrics for Practical Quantum Advantage in Generative Models [68.8204255655161]
Generative modeling is a widely accepted natural use case for quantum computers.
We construct a simple and unambiguous approach to probe practical quantum advantage for generative modeling by measuring the algorithm's generalization performance.
Our simulation results show that our quantum-inspired models have up to a $68\times$ enhancement in generating unseen unique and valid samples.
arXiv Detail & Related papers (2022-01-21T16:35:35Z) - Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Gaussian-Dirichlet Random Fields for Inference over High Dimensional Categorical Observations [3.383942690870476]
We propose a generative model for the distribution of high dimensional categorical observations produced by robots.
The proposed approach uses Dirichlet distributions to model sparse co-occurrence relations between the observed categories.
arXiv Detail & Related papers (2020-03-26T19:29:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.