Scaling Group Inference for Diverse and High-Quality Generation
- URL: http://arxiv.org/abs/2508.15773v1
- Date: Thu, 21 Aug 2025 17:59:57 GMT
- Title: Scaling Group Inference for Diverse and High-Quality Generation
- Authors: Gaurav Parmar, Or Patashnik, Daniil Ostashev, Kuan-Chieh Wang, Kfir Aberman, Srinivasa Narasimhan, Jun-Yan Zhu
- Abstract summary: We introduce a scalable group inference method that improves the diversity and quality of a group of samples. Our framework generalizes across a wide range of tasks, including text-to-image, image-to-image, image prompting, and video generation.
- Score: 43.33751261265585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative models typically sample outputs independently, and recent inference-time guidance and scaling algorithms focus on improving the quality of individual samples. However, in real-world applications, users are often presented with a set of multiple images (e.g., 4-8) for each prompt, where independent sampling tends to lead to redundant results, limiting user choices and hindering idea exploration. In this work, we introduce a scalable group inference method that improves both the diversity and quality of a group of samples. We formulate group inference as a quadratic integer assignment problem: candidate outputs are modeled as graph nodes, and a subset is selected to optimize sample quality (unary term) while maximizing group diversity (binary term). To substantially improve runtime efficiency, we progressively prune the candidate set using intermediate predictions, allowing our method to scale up to large candidate sets. Extensive experiments show that our method significantly improves group diversity and quality compared to independent sampling baselines and recent inference algorithms. Our framework generalizes across a wide range of tasks, including text-to-image, image-to-image, image prompting, and video generation, enabling generative models to treat multiple outputs as cohesive groups rather than independent samples.
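The selection objective described in the abstract can be illustrated with a small sketch. It assumes the objective is the sum of per-sample quality scores (unary term) plus a weighted sum of pairwise distances within the chosen subset (binary term). The function name `select_group`, the `weight` parameter, and the toy scores are hypothetical and not taken from the paper. The sketch uses exhaustive search, which is only feasible for tiny candidate sets, and it does not cover the progressive pruning step.

```python
from itertools import combinations


def select_group(quality, distance, k, weight=1.0):
    """Return the best size-k subset of candidate indices and its score.

    Assumed objective: sum of unary quality scores plus `weight` times
    the sum of pairwise distances between the selected candidates.
    """
    n = len(quality)
    best_subset, best_score = None, float("-inf")
    # Brute force over all size-k subsets (illustration only).
    for subset in combinations(range(n), k):
        unary = sum(quality[i] for i in subset)
        binary = sum(distance[i][j] for i, j in combinations(subset, 2))
        score = unary + weight * binary
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy data: candidates 0 and 1 are high quality but nearly identical.
quality = [0.9, 0.9, 0.6, 0.5]
distance = [
    [0.0, 0.1, 0.8, 1.0],
    [0.1, 0.0, 0.8, 0.9],
    [0.8, 0.8, 0.0, 0.7],
    [1.0, 0.9, 0.7, 0.0],
]

diverse_pick, _ = select_group(quality, distance, k=2)
quality_only_pick, _ = select_group(quality, distance, k=2, weight=0.0)
print(diverse_pick, quality_only_pick)
```

With the diversity term switched off (`weight=0.0`), the two near-duplicate candidates are chosen. With it on, the selection trades some quality for a more distinct pair.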
Related papers
- Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization [68.64764778089229]
We propose MISP-DPO, the first framework to incorporate multiple, semantically diverse negative images in multimodal DPO. Our method embeds prompts and candidate images in CLIP space and applies a sparse autoencoder to uncover semantic deviations into interpretable factors. Experiments across five benchmarks demonstrate that MISP-DPO consistently improves multimodal alignment over prior methods.
arXiv Detail & Related papers (2025-09-30T03:24:09Z) - Towards Compute-Optimal Many-Shot In-Context Learning [63.815463719071055]
We propose two strategies for demonstration selection in many-shot ICL. The first method combines a small number of demonstrations, selected based on similarity to each test sample, with a disproportionately larger set of random demonstrations that are cached. The second strategy improves the first by replacing random demonstrations with those selected using centroids derived from test sample representations via k-means clustering.
arXiv Detail & Related papers (2025-07-22T04:21:03Z) - Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm [50.492124556982674]
This paper introduces a novel choice-based sample selection framework. It shifts the focus from evaluating individual sample quality to comparing the contribution value of different samples. We validate our approach on a larger medical dataset, highlighting its practical applicability in real-world applications.
arXiv Detail & Related papers (2025-03-04T07:32:41Z) - Hit the Sweet Spot! Span-Level Ensemble for Large Language Models [8.34562564266839]
We propose SweetSpan, a span-level ensemble method that effectively balances the need for real-time adjustments and the information required for accurate ensemble decisions.
Our approach involves two key steps: First, we have each candidate model independently generate candidate spans based on the shared prefix.
Second, we calculate perplexity scores to facilitate mutual evaluation among the candidate models and achieve robust span selection by filtering out unfaithful scores.
arXiv Detail & Related papers (2024-09-27T09:41:29Z) - Compress Guidance in Conditional Diffusion Sampling [16.671575782090045]
This work identifies and quantifies the problem, demonstrating that reducing or excluding guidance at numerous timesteps can mitigate this issue.
We observe a significant improvement in image quality and diversity while also reducing the required guidance timesteps by nearly 40%.
arXiv Detail & Related papers (2024-08-20T21:02:54Z) - Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI [29.13807697733638]
We build on the remarkable achievements in generative sampling of natural images.
We propose an innovative challenge, potentially overly ambitious, which involves generating samples that resemble images.
The statistical challenge lies in the small sample size, sometimes consisting of a few hundred subjects.
arXiv Detail & Related papers (2024-04-10T22:35:06Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Rethinking Sampling Strategies for Unsupervised Person Re-identification [59.47536050785886]
We analyze the reasons for the performance differences between various sampling strategies under the same framework and loss function. Group sampling is proposed, which gathers samples from the same class into groups. Experiments on Market-1501, DukeMTMC-reID and MSMT17 show that group sampling achieves performance comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-07-07T05:39:58Z) - Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z) - Informative Sample Mining Network for Multi-Domain Image-to-Image Translation [101.01649070998532]
We show that improving the sample selection strategy is an effective solution for image-to-image translation tasks.
We propose a novel multi-stage sample training scheme to reduce sample hardness while preserving sample informativeness.
arXiv Detail & Related papers (2020-01-05T05:48:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.