Optimal Representative Sample Weighting
- URL: http://arxiv.org/abs/2005.09065v1
- Date: Mon, 18 May 2020 20:29:00 GMT
- Title: Optimal Representative Sample Weighting
- Authors: Shane Barratt, Guillermo Angeris, Stephen Boyd
- Abstract summary: We consider the problem of assigning weights to a set of samples or data records, with the goal of achieving a representative weighting.
We frame the problem of finding representative sample weights as an optimization problem, which in many cases is convex and can be efficiently solved.
We describe rsw, an open-source implementation of the ideas described in this paper, and apply it to a skewed sample of the CDC BRFSS dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of assigning weights to a set of samples or data
records, with the goal of achieving a representative weighting, which happens
when certain sample averages of the data are close to prescribed values. We
frame the problem of finding representative sample weights as an optimization
problem, which in many cases is convex and can be efficiently solved. Our
formulation includes as a special case the selection of a fixed number of the
samples, with equal weights, i.e., the problem of selecting a smaller
representative subset of the samples. While this problem is combinatorial and
not convex, heuristic methods based on convex optimization seem to perform very
well. We describe rsw, an open-source implementation of the ideas described in
this paper, and apply it to a skewed sample of the CDC BRFSS dataset.
Related papers
- Finding Support Examples for In-Context Learning [73.90376920653507]
We propose LENS, a fiLter-thEN-Search method to tackle this challenge in two stages.
First we filter the dataset to obtain informative in-context examples individually.
Then we propose diversity-guided example search which iteratively refines and evaluates the selected example permutations.
arXiv Detail & Related papers (2023-02-27T06:32:45Z)
- Leveraging Importance Weights in Subset Selection [45.54597544672441]
We present a subset selection algorithm designed to work with arbitrary model families in a practical batch setting.
Our algorithm, IWeS, selects examples by importance sampling where the sampling probability assigned to each example is based on the entropy of models trained on previously selected batches.
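The entropy-driven sampling idea can be sketched generically (an illustrative toy, not the IWeS algorithm or its API): score each example by the entropy of the model's predicted class distribution, then sample a batch with probability proportional to that score.

```python
import numpy as np

def entropy_sample(probs, batch_size, rng):
    """Sample example indices with probability proportional to the
    entropy of each example's predicted class distribution.
    probs: array of shape (n_examples, n_classes)."""
    eps = 1e-12                                       # guard against log(0)
    ent = -(probs * np.log(probs + eps)).sum(axis=1)  # per-example entropy
    return rng.choice(len(probs), size=batch_size,
                      replace=False, p=ent / ent.sum())

rng = np.random.default_rng(0)
probs = np.array([[0.98, 0.02],   # confident -> low entropy, rarely chosen
                  [0.50, 0.50],   # maximally uncertain -> high entropy
                  [0.60, 0.40],
                  [0.90, 0.10]])
batch = entropy_sample(probs, batch_size=2, rng=rng)
```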
arXiv Detail & Related papers (2023-01-28T02:07:31Z)
- Optimal Efficiency-Envy Trade-Off via Optimal Transport [33.85971515753188]
We consider the problem of allocating a distribution of items to $n$ recipients where each recipient has to be allocated a fixed, prespecified fraction of all items.
We show that this problem can be formulated as a variant of the semi-discrete optimal transport (OT) problem, whose solution structure in this case has a concise representation and a simple geometric interpretation.
arXiv Detail & Related papers (2022-09-25T00:39:43Z)
- Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
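As a reminder of the underlying mechanism (a generic sketch of importance-sampled SGD for least squares, not the paper's sketch-based data structures): sample rows with probability proportional to their squared norms, then reweight each gradient by 1/(n p_i) so the stochastic gradient stays unbiased.

```python
import numpy as np

def importance_sgd(A, b, steps, lr, rng):
    """SGD for min ||Ax - b||^2 with row-norm importance sampling.
    Row i is picked with probability p_i proportional to ||a_i||^2,
    and its gradient is scaled by 1/(n * p_i) to remain unbiased."""
    n, d = A.shape
    p = (A ** 2).sum(axis=1)
    p /= p.sum()
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.choice(n, p=p)
        grad = 2.0 * (A[i] @ x - b[i]) * A[i] / (n * p[i])
        x -= lr * grad
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true                      # consistent (noiseless) system
x_hat = importance_sgd(A, b, steps=2000, lr=0.05, rng=rng)
```

On this consistent system the iteration is a relaxed randomized Kaczmarz method, so it converges to the exact solution.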
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
- Wasserstein Distributionally Robust Optimization via Wasserstein Barycenters [10.103413548140848]
We seek data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance of a nominal distribution constructed from the data samples.
We propose constructing the nominal distribution in Wasserstein distributionally robust optimization problems through the notion of Wasserstein barycenter as an aggregation of data samples from multiple sources.
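In one dimension the Wasserstein barycenter has a closed form that makes the aggregation idea concrete: the barycenter's quantile function is the average of the inputs' quantile functions, so for equal-size samples it is the elementwise mean of the sorted samples. The sketch below is a 1-D illustration only, not the paper's method:

```python
import numpy as np

def wasserstein_barycenter_1d(samples):
    """2-Wasserstein barycenter of equal-size 1-D empirical samples:
    average the sorted samples elementwise (i.e., average the
    empirical quantile functions)."""
    return np.sort(np.asarray(samples, dtype=float), axis=1).mean(axis=0)

s1 = np.array([0.0, 1.0, 2.0, 3.0])
s2 = np.array([10.0, 11.0, 12.0, 13.0])
bary = wasserstein_barycenter_1d([s1, s2])   # -> [5. 6. 7. 8.]
```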
arXiv Detail & Related papers (2022-03-23T02:03:47Z)
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Finding Influential Instances for Distantly Supervised Relation Extraction [42.94953922808431]
This work proposes a novel model-agnostic instance sampling method for distant supervision (DS) based on the influence function (IF).
Our method identifies favorable/unfavorable instances in the bag based on IF, then does dynamic instance sampling.
Experiments show that REIF is able to win over a series of baselines that have complicated architectures.
arXiv Detail & Related papers (2020-09-17T02:02:07Z)
- Approximating a Target Distribution using Weight Queries [25.392248158616862]
We propose an interactive algorithm that iteratively selects data set examples and performs corresponding weight queries.
We derive an approximation bound on the total variation distance between the reweighting found by the algorithm and the best achievable reweighting.
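The bound is stated in total variation distance; as a quick reminder of that metric (not the paper's algorithm), for discrete distributions it is half the L1 distance between the probability vectors:

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two discrete distributions:
    half the L1 distance between their probability vectors."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

target   = np.array([0.5, 0.3, 0.2])   # illustrative target distribution
reweight = np.array([0.4, 0.4, 0.2])   # illustrative learned reweighting
d = tv_distance(target, reweight)      # -> 0.1
```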
arXiv Detail & Related papers (2020-06-24T11:17:43Z)
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs), such as Graph Attention Networks (GAT), where the message aggregator contains learned rather than fixed weights.
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.