Multicalibrated Partitions for Importance Weights
- URL: http://arxiv.org/abs/2103.05853v1
- Date: Wed, 10 Mar 2021 03:32:36 GMT
- Title: Multicalibrated Partitions for Importance Weights
- Authors: Parikshit Gopalan, Omer Reingold, Vatsal Sharan, Udi Wieder
- Abstract summary: Importance weights play a fundamental role in many different fields, most notably statistics and machine learning.
We show that the MaxEntropy approach may fail to assign high average scores to sets $C \in \mathcal{C}$, even when the average of the ground-truth weights for the set is evidently large.
We present an efficient algorithm that under standard learnability assumptions computes weights which satisfy these bounds.
- Score: 17.1726078570842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ratio between the probabilities that two distributions $R$ and $P$
assign to a point $x$ is known as the importance weight or propensity score, and
plays a fundamental role in many different fields, most notably statistics and
machine learning. Among their applications, importance weights are central to
domain adaptation, anomaly detection, and the estimation of various divergences
such as the KL divergence. We consider the common setting where $R$ and $P$ are
only given through samples from each distribution. The vast literature on
estimating importance weights is either heuristic, or makes strong assumptions
about $R$ and $P$ or about the importance weights themselves.
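The sample-only setting above admits a widely used baseline (our illustration, not the paper's algorithm): train a probabilistic classifier to distinguish samples of $R$ from samples of $P$, and read the importance weights off its predicted odds. A minimal sketch, assuming scikit-learn is available; the function name `estimate_weights` is ours:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Classifier trick for density-ratio estimation (a standard baseline,
# not the paper's method): label samples from R as 1 and samples from
# P as 0. By Bayes' rule, Pr[y=1|x] / Pr[y=0|x] = (n_R * R(x)) / (n_P * P(x)),
# so dividing the predicted odds by the prior odds n_R / n_P recovers R(x)/P(x).

def estimate_weights(samples_R, samples_P, query_points):
    X = np.vstack([samples_R, samples_P])
    y = np.concatenate([np.ones(len(samples_R)), np.zeros(len(samples_P))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p = np.clip(clf.predict_proba(query_points)[:, 1], 1e-6, 1 - 1e-6)
    prior_odds = len(samples_R) / len(samples_P)
    return (p / (1.0 - p)) / prior_odds

# usage: weights on P's samples, e.g. for reweighting expectations toward R
rng = np.random.default_rng(0)
xR = rng.normal(1.0, 1.0, size=(500, 1))  # samples from R
xP = rng.normal(0.0, 1.0, size=(500, 1))  # samples from P
w = estimate_weights(xR, xP, xP)
```

Such heuristics are exactly what the abstract refers to: they come with no set-wise guarantees unless one makes strong assumptions about $R$, $P$, or the classifier.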
In this paper, we explore a computational perspective on the estimation of
importance weights, one that factors in the limitations and possibilities of
bounded computational resources. We significantly strengthen previous work that
uses the MaxEntropy approach, which defines the importance weights via a
distribution $Q$ that is closest to $P$ while looking the same as $R$ on every
set $C \in \mathcal{C}$, where $\mathcal{C}$ may be a huge collection of sets.
We show that the MaxEntropy approach may fail to assign high average scores to
sets $C \in \mathcal{C}$, even when the average of the ground-truth weights for
the set is evidently large. We similarly show that it may overestimate the
average scores on sets $C \in \mathcal{C}$. We therefore formulate Sandwiching
bounds as a notion of set-wise accuracy for importance weights. We study these
bounds and show that they capture natural completeness and soundness
requirements on the weights. We present an efficient algorithm that, under
standard learnability assumptions, computes weights satisfying these bounds.
Our techniques rely on a new notion of multicalibrated partitions of the domain
of the distributions, which appear to be useful objects in their own right.
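To make the MaxEntropy construction concrete on a toy scale: over a finite domain with $P$ and $R$ known exactly, a distribution $Q$ matching $R$'s mass on every $C \in \mathcal{C}$ while staying close to $P$ can be computed by iterative proportional scaling. The sketch below is our simplified stand-in, not the paper's algorithm (which works from samples under a learnability assumption on $\mathcal{C}$); `maxent_weights` and the example sets are hypothetical:

```python
import numpy as np

# Toy finite-domain stand-in for the MaxEntropy approach: rescale Q on
# each set C (and on its complement, to preserve normalization) until
# Q(C) = R(C) for every C in the collection. For consistent constraints,
# the fixed point is the relative-entropy projection of P onto them.

def maxent_weights(P, R, sets, n_iters=200):
    """P, R: probability vectors over a finite domain.
    sets: boolean masks, one per set C in the collection."""
    Q = P.astype(float).copy()
    for _ in range(n_iters):
        for C in sets:
            t, q = R[C].sum(), Q[C].sum()
            if 0.0 < q < 1.0 and 0.0 < t < 1.0:
                Q[C] *= t / q                   # match R's mass on C ...
                Q[~C] *= (1.0 - t) / (1.0 - q)  # ... and keep Q a distribution
    return Q / P  # importance weights Q(x)/P(x)

# usage on a 10-point domain with two overlapping sets
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(10))
R = rng.dirichlet(np.ones(10))
sets = [np.arange(10) < 5, np.arange(10) % 2 == 0]
w = maxent_weights(P, R, sets)
```

The abstract's point is that this style of construction can nonetheless mis-score sets in $\mathcal{C}$ in both directions, which is precisely what the Sandwiching bounds are formulated to rule out.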
Related papers
- Optimal level set estimation for non-parametric tournament and crowdsourcing problems [49.75262185577198]
Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions.
In this paper, we assume that the matrix $M$ containing the probability that expert $i$ answers question $j$ correctly is bi-isotonic up to a permutation of its rows and columns.
We construct an efficient algorithm that turns out to be minimax optimal for this classification problem.
arXiv Detail & Related papers (2024-08-27T18:28:31Z)
- Kernel Density Estimators in Large Dimensions [9.299356601085586]
We study the kernel-based density estimate $\hat\rho_h^{\mathcal{D}}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left(\frac{x - y_i}{h}\right)$, depending on the bandwidth $h$.
We show that the optimal bandwidth threshold based on Kullback-Leibler divergence lies in the new statistical regime identified in this paper.
arXiv Detail & Related papers (2024-08-11T15:56:44Z)
- A Theory of Interpretable Approximations [61.90216959710842]
We study the idea of approximating a target concept $c$ by a small aggregation of concepts from some base class $\mathcal{H}$.
For any given pair of $\mathcal{H}$ and $c$, exactly one of these cases holds: (i) $c$ cannot be approximated by $\mathcal{H}$ with arbitrary accuracy.
We show that, in the case of interpretable approximations, even a slightly nontrivial a priori guarantee on the complexity of approximations implies approximations with constant (distribution-free and accuracy-
arXiv Detail & Related papers (2024-06-15T06:43:45Z)
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
- Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning [9.31522898261934]
We investigate the impact of compression on gradient algorithms for machine learning.
We highlight differences in terms of convergence rates between several unbiased compression operators.
We extend our results to the case of federated learning.
arXiv Detail & Related papers (2023-08-02T18:02:00Z)
- Value-aware Importance Weighting for Off-policy Reinforcement Learning [11.3798693158017]
Importance sampling is a central idea underlying off-policy prediction in reinforcement learning.
In this work, we consider a broader class of importance weights to correct samples in off-policy learning.
We derive how such weights can be computed, and detail key properties of the resulting importance weights.
arXiv Detail & Related papers (2023-06-27T17:05:22Z)
- List-Decodable Sparse Mean Estimation [7.594050968868919]
A surge of recent research interest has focused on the list-decodable setting where $\alpha \in (0, \frac{1}{2}]$, and the goal is to output a finite number of estimates among which at least one approximates the target mean.
In this paper, the underlying target is a $k$-sparse mean, and the first contribution is an estimator with sample complexity $O\big(\mathrm{poly}(k, \log d)\big)$, i.e., poly-logarithmic in the dimension $d$.
arXiv Detail & Related papers (2022-05-28T05:13:45Z)
- Distributed k-Means with Outliers in General Metrics [0.6117371161379208]
We present a distributed coreset-based 3-round approximation algorithm for k-means with $z$ outliers for general metric spaces.
An important feature of our algorithm is that it obliviously adapts to the intrinsic complexity of the dataset, captured by the doubling dimension $D$ of the metric space.
arXiv Detail & Related papers (2022-02-16T16:24:31Z)
- FriendlyCore: Practical Differentially Private Aggregation [67.04951703461657]
We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points $\mathcal{D}$ from an unrestricted (pseudo) metric space as input.
When $\mathcal{D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a "stable" subset $\mathcal{D}_G \subseteq \mathcal{D}$ that includes all points.
$\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy.
arXiv Detail & Related papers (2021-10-19T17:43:50Z)
- List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture to within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof for the clusters of a Gaussian mixture, which can be captured by the constant-degree Sum-of-Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.