Model-free Subsampling Method Based on Uniform Designs
- URL: http://arxiv.org/abs/2209.03617v1
- Date: Thu, 8 Sep 2022 07:47:56 GMT
- Title: Model-free Subsampling Method Based on Uniform Designs
- Authors: Mei Zhang, Yongdao Zhou, Zheng Zhou, Aijun Zhang
- Abstract summary: We develop a low-GEFD data-driven subsampling method based on existing uniform designs.
Our method remains robust under diverse model specifications, while other popular subsampling methods underperform.
- Score: 5.661822729320697
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Subsampling or subdata selection is a useful approach in large-scale
statistical learning. Most existing studies focus on model-based subsampling
methods which significantly depend on the model assumption. In this paper, we
consider the model-free subsampling strategy for generating subdata from the
original full data. To measure how well a subdata set represents the original
data, we propose a criterion, the generalized
empirical F-discrepancy (GEFD), and study its theoretical properties in
connection with the classical generalized L2-discrepancy in the theory of
uniform designs. These properties allow us to develop a low-GEFD data-driven
subsampling method based on existing uniform designs. By
simulation examples and a real case study, we show that the proposed
subsampling method is superior to the random sampling method. Moreover, our
method remains robust under diverse model specifications, while other popular
subsampling methods underperform. In practice, such a model-free property is
more appealing than model-based subsampling, since the latter may perform
poorly when the model is misspecified, as demonstrated in our simulation
studies.
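The paper does not include code, but the general recipe it describes (select subdata that mimic the space-filling behaviour of a uniform design, judged by a discrepancy-type criterion) can be sketched in a few lines. The sketch below is only illustrative and is not the authors' implementation: a scrambled Sobol sequence from scipy.stats.qmc stands in for a uniform design table, nearest-neighbour matching stands in for the low-GEFD selection, the centered L2-discrepancy serves as a rough proxy for GEFD, and the helper names ud_based_subsample and rank_unit_cube are hypothetical.

```python
import numpy as np
from scipy.stats import qmc
from scipy.spatial import cKDTree

def ud_based_subsample(X, n_sub, seed=0):
    """Illustrative design-based, model-free subsampling (not the authors' code).

    A scrambled Sobol sequence stands in for a uniform design table; each
    design point is pushed through the data's marginal quantiles and matched
    to its nearest observation in the full data.
    """
    N, p = X.shape
    # Space-filling design on [0, 1]^p (stand-in for a uniform design).
    design = qmc.Sobol(d=p, scramble=True, seed=seed).random(n_sub)
    # Map design points onto the data scale via empirical marginal quantiles.
    targets = np.column_stack(
        [np.quantile(X[:, j], design[:, j]) for j in range(p)]
    )
    # Keep the observation closest to each transformed design point
    # (duplicate matches are simply dropped here for brevity).
    idx = cKDTree(X).query(targets, k=1)[1]
    return np.unique(idx)

def rank_unit_cube(X):
    # Per-column rank transform into (0, 1), so qmc.discrepancy applies.
    r = np.argsort(np.argsort(X, axis=0), axis=0)
    return (r + 0.5) / len(X)

# Usage: draw a design-based subsample and compare its space-filling quality
# (centered L2-discrepancy of rank-transformed coordinates) with random sampling.
rng = np.random.default_rng(1)
X_full = rng.normal(size=(100_000, 3))

sub_idx = ud_based_subsample(X_full, n_sub=2**7)   # power of 2 keeps Sobol balanced
rand_idx = rng.choice(len(X_full), size=len(sub_idx), replace=False)

print("design-based:", qmc.discrepancy(rank_unit_cube(X_full[sub_idx])))
print("random      :", qmc.discrepancy(rank_unit_cube(X_full[rand_idx])))
```

On typical runs the design-based subsample should show a noticeably smaller discrepancy than the random one, loosely mirroring the paper's comparison against simple random sampling, although this toy proxy says nothing about the GEFD theory developed in the paper.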
Related papers
- Representer Point Selection for Explaining Regularized High-dimensional
Models [105.75758452952357]
We introduce a class of sample-based explanations we term high-dimensional representers.
Our workhorse is a novel representer theorem for general regularized high-dimensional models.
We study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets.
arXiv Detail & Related papers (2023-05-31T16:23:58Z) - Learning Robust Statistics for Simulation-based Inference under Model
Misspecification [23.331522354991527]
We propose the first general approach to handle model misspecification that works across different classes of simulation-based inference methods.
We show that our method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.
arXiv Detail & Related papers (2023-05-25T09:06:26Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Comparing Foundation Models using Data Kernels [13.099029073152257]
We present a methodology for directly comparing the embedding space geometry of foundation models.
Our methodology is grounded in random graph theory and enables valid hypothesis testing of embedding similarity.
We show how our framework can induce a manifold of models equipped with a distance function that correlates strongly with several downstream metrics.
arXiv Detail & Related papers (2023-05-09T02:01:07Z) - A Provably Efficient Model-Free Posterior Sampling Method for Episodic
Reinforcement Learning [50.910152564914405]
Existing posterior sampling methods for reinforcement learning are limited by being model-based or lack worst-case theoretical guarantees beyond linear MDPs.
This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees.
arXiv Detail & Related papers (2022-08-23T12:21:01Z) - An optimal transport approach for selecting a representative subsample
with application in efficient kernel density estimation [21.632131776088084]
Subsampling methods aim to select a subsample as a surrogate for the observed sample.
Existing model-free subsampling methods are usually built upon clustering techniques or kernel tricks.
We propose a novel model-free subsampling method by utilizing optimal transport techniques.
arXiv Detail & Related papers (2022-05-31T05:19:29Z) - Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z) - On Statistical Efficiency in Learning [37.08000833961712]
We address the challenge of model selection to strike a balance between model fitting and model complexity.
We propose an online algorithm that sequentially expands the model complexity to enhance selection stability and reduce cost.
Experimental studies show that the proposed method has desirable predictive power and significantly less computational cost than some popular methods.
arXiv Detail & Related papers (2020-12-24T16:08:29Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Evaluating the Disentanglement of Deep Generative Models through
Manifold Topology [66.06153115971732]
We present a method for quantifying disentanglement that only uses the generative model.
We empirically evaluate several state-of-the-art models across multiple datasets.
arXiv Detail & Related papers (2020-06-05T20:54:11Z) - Amortized Bayesian model comparison with evidential deep learning [0.12314765641075436]
We propose a novel method for performing Bayesian model comparison using specialized deep learning architectures.
Our method is purely simulation-based and circumvents the step of explicitly fitting all alternative models under consideration to each observed dataset.
We show that our method achieves excellent results in terms of accuracy, calibration, and efficiency across the examples considered in this work.
arXiv Detail & Related papers (2020-04-22T15:15:46Z)