Importance Sampling via Score-based Generative Models
- URL: http://arxiv.org/abs/2502.04646v1
- Date: Fri, 07 Feb 2025 04:09:03 GMT
- Title: Importance Sampling via Score-based Generative Models
- Authors: Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana,
- Abstract summary: Importance sampling involves sampling from a probability density function proportional to the product of an importance weight function and a base PDF.
We propose an entirely training-free Importance sampling framework that relies solely on an SGM for the base PDF.
We conduct a thorough analysis demonstrating the method's scalability and effectiveness across diverse datasets and tasks.
- Score: 12.32722207200796
- License:
- Abstract: Importance sampling, which involves sampling from a probability density function (PDF) proportional to the product of an importance weight function and a base PDF, is a powerful technique with applications in variance reduction, biased or customized sampling, data augmentation, and beyond. Inspired by the growing availability of score-based generative models (SGMs), we propose an entirely training-free Importance sampling framework that relies solely on an SGM for the base PDF. Our key innovation is realizing the importance sampling process as a backward diffusion process, expressed in terms of the score function of the base PDF and the specified importance weight function--both readily available--eliminating the need for any additional training. We conduct a thorough analysis demonstrating the method's scalability and effectiveness across diverse datasets and tasks, including importance sampling for industrial and natural images with neural importance weight functions. The training-free aspect of our method is particularly compelling in real-world scenarios where a single base distribution underlies multiple biased sampling tasks, each requiring a different importance weight function. To the best of our knowledge our approach is the first importance sampling framework to achieve this.
Related papers
- Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks.
We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
arXiv Detail & Related papers (2025-02-17T18:04:39Z) - Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance.
We introduce novel algorithms for dynamic, instance-level data reweighting.
Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z) - Feasible Learning [78.6167929413604]
We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample.
Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
arXiv Detail & Related papers (2025-01-24T20:39:38Z) - Propensity-driven Uncertainty Learning for Sample Exploration in Source-Free Active Domain Adaptation [19.620523416385346]
Source-free active domain adaptation (SFADA) addresses the challenge of adapting a pre-trained model to new domains without access to source data.
This scenario is particularly relevant in real-world applications where data privacy, storage limitations, or labeling costs are significant concerns.
We propose the Propensity-driven Uncertainty Learning (ProULearn) framework to effectively select more informative samples without frequently requesting human annotations.
arXiv Detail & Related papers (2025-01-23T10:05:25Z) - A Short Survey on Importance Weighting for Machine Learning [3.27651593877935]
It is known that supervised learning under an assumption about the difference between the training and test distributions, called distribution shift, can guarantee statistically desirable properties through importance weighting by their density ratio.
This survey summarizes the broad applications of importance weighting in machine learning and related research.
arXiv Detail & Related papers (2024-03-15T10:31:46Z) - Self-Influence Guided Data Reweighting for Language Model Pre-training [46.57714637505164]
Language Models (LMs) pre-trained with self-supervision on large text corpora have become the default starting point for developing models for various NLP tasks.
All data samples in the corpus are treated with equal importance during LM pre-training.
Due to varying levels of relevance and quality of data, equal importance to all the data samples may not be the optimal choice.
We propose PRESENCE, a method for jointly reweighting samples by leveraging self-influence (SI) scores as an indicator of sample importance and pre-training.
arXiv Detail & Related papers (2023-11-02T01:00:46Z) - SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z) - MUC-driven Feature Importance Measurement and Adversarial Analysis for
Random Forest [1.5896078006029473]
We leverage formal methods and logical reasoning to develop a novel model-specific method for explaining the prediction of Random Forest (RF)
Our approach is centered around Minimal Unsatisfiable Cores (MUC) and provides a comprehensive solution for feature importance, covering local and global aspects, and adversarial sample analysis.
Our method can produce a user-centered report, which helps provide recommendations in real-life applications.
arXiv Detail & Related papers (2022-02-25T06:15:47Z) - Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
arXiv Detail & Related papers (2022-02-02T08:23:24Z) - Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
arXiv Detail & Related papers (2020-10-26T14:15:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.