Optimal Representative Sample Weighting
- URL: http://arxiv.org/abs/2005.09065v1
- Date: Mon, 18 May 2020 20:29:00 GMT
- Title: Optimal Representative Sample Weighting
- Authors: Shane Barratt, Guillermo Angeris, Stephen Boyd
- Abstract summary: We consider the problem of assigning weights to a set of samples or data records, with the goal of achieving a representative weighting.
We frame the problem of finding representative sample weights as an optimization problem, which in many cases is convex and can be efficiently solved.
We describe rsw, an open-source implementation of the ideas described in this paper, and apply it to a skewed sample of the CDC BRFSS dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of assigning weights to a set of samples or data
records, with the goal of achieving a representative weighting, which happens
when certain sample averages of the data are close to prescribed values. We
frame the problem of finding representative sample weights as an optimization
problem, which in many cases is convex and can be efficiently solved. Our
formulation includes as a special case the selection of a fixed number of the
samples, with equal weights, i.e., the problem of selecting a smaller
representative subset of the samples. While this problem is combinatorial and
not convex, heuristic methods based on convex optimization seem to perform very
well. We describe rsw, an open-source implementation of the ideas described in
this paper, and apply it to a skewed sample of the CDC BRFSS dataset.
Related papers
- Finding Support Examples for In-Context Learning [73.90376920653507]
We propose LENS, a fiLter-thEN-Search method to tackle this challenge in two stages.
First we filter the dataset to obtain informative in-context examples individually.
Then we propose diversity-guided example search which iteratively refines and evaluates the selected example permutations.
arXiv Detail & Related papers (2023-02-27T06:32:45Z)
- Leveraging Importance Weights in Subset Selection [45.54597544672441]
We present a subset selection algorithm designed to work with arbitrary model families in a practical batch setting.
Our algorithm, IWeS, selects examples by importance sampling where the sampling probability assigned to each example is based on the entropy of models trained on previously selected batches.
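The entropy-driven sampling idea can be sketched generically (an illustrative toy, not the IWeS algorithm or its API): score each example by the entropy of the model's predicted class distribution, then sample a batch with probability proportional to that score.

```python
import numpy as np

def entropy_sample(probs, batch_size, rng):
    """Sample example indices with probability proportional to the
    entropy of each example's predicted class distribution.
    probs: array of shape (n_examples, n_classes)."""
    eps = 1e-12                                       # guard against log(0)
    ent = -(probs * np.log(probs + eps)).sum(axis=1)  # per-example entropy
    return rng.choice(len(probs), size=batch_size,
                      replace=False, p=ent / ent.sum())

rng = np.random.default_rng(0)
probs = np.array([[0.98, 0.02],   # confident -> low entropy, rarely chosen
                  [0.50, 0.50],   # maximally uncertain -> high entropy
                  [0.60, 0.40],
                  [0.90, 0.10]])
batch = entropy_sample(probs, batch_size=2, rng=rng)
```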
arXiv Detail & Related papers (2023-01-28T02:07:31Z)
- Optimal Efficiency-Envy Trade-Off via Optimal Transport [33.85971515753188]
We consider the problem of allocating a distribution of items to $n$ recipients where each recipient has to be allocated a fixed, prespecified fraction of all items.
We show that this problem can be formulated as a variant of the semi-discrete optimal transport (OT) problem, whose solution structure in this case has a concise representation and a simple geometric interpretation.
arXiv Detail & Related papers (2022-09-25T00:39:43Z)
- Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
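As a reminder of the underlying mechanism (a generic sketch of importance-sampled SGD for least squares, not the paper's sketch-based data structures): sample rows with probability proportional to their squared norms, then reweight each gradient by 1/(n p_i) so the stochastic gradient stays unbiased.

```python
import numpy as np

def importance_sgd(A, b, steps, lr, rng):
    """SGD for min ||Ax - b||^2 with row-norm importance sampling.
    Row i is picked with probability p_i proportional to ||a_i||^2,
    and its gradient is scaled by 1/(n * p_i) to remain unbiased."""
    n, d = A.shape
    p = (A ** 2).sum(axis=1)
    p /= p.sum()
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.choice(n, p=p)
        grad = 2.0 * (A[i] @ x - b[i]) * A[i] / (n * p[i])
        x -= lr * grad
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true                      # consistent (noiseless) system
x_hat = importance_sgd(A, b, steps=2000, lr=0.05, rng=rng)
```

On this consistent system the iteration is a relaxed randomized Kaczmarz method, so it converges to the exact solution.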
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
- Wasserstein Distributionally Robust Optimization via Wasserstein Barycenters [10.103413548140848]
We seek data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance of a nominal distribution constructed from the data samples.
We propose constructing the nominal distribution in Wasserstein distributionally robust optimization problems through the notion of Wasserstein barycenter as an aggregation of data samples from multiple sources.
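In one dimension the Wasserstein barycenter has a closed form that makes the aggregation idea concrete: the barycenter's quantile function is the average of the inputs' quantile functions, so for equal-size samples it is the elementwise mean of the sorted samples. The sketch below is a 1-D illustration only, not the paper's method:

```python
import numpy as np

def wasserstein_barycenter_1d(samples):
    """2-Wasserstein barycenter of equal-size 1-D empirical samples:
    average the sorted samples elementwise (i.e., average the
    empirical quantile functions)."""
    return np.sort(np.asarray(samples, dtype=float), axis=1).mean(axis=0)

s1 = np.array([0.0, 1.0, 2.0, 3.0])
s2 = np.array([10.0, 11.0, 12.0, 13.0])
bary = wasserstein_barycenter_1d([s1, s2])   # -> [5. 6. 7. 8.]
```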
arXiv Detail & Related papers (2022-03-23T02:03:47Z)
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Finding Influential Instances for Distantly Supervised Relation Extraction [42.94953922808431]
This work proposes a novel model-agnostic instance sampling method for distant supervision (DS) based on the influence function (IF).
Our method identifies favorable/unfavorable instances in the bag based on IF, then does dynamic instance sampling.
Experiments show that REIF is able to win over a series of baselines that have complicated architectures.
arXiv Detail & Related papers (2020-09-17T02:02:07Z)
- Approximating a Target Distribution using Weight Queries [25.392248158616862]
We propose an interactive algorithm that iteratively selects data set examples and performs corresponding weight queries.
We derive an approximation bound on the total variation distance between the reweighting found by the algorithm and the best achievable reweighting.
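The bound is stated in total variation distance; as a quick reminder of that metric (not the paper's algorithm), for discrete distributions it is half the L1 distance between the probability vectors:

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two discrete distributions:
    half the L1 distance between their probability vectors."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

target   = np.array([0.5, 0.3, 0.2])   # illustrative target distribution
reweight = np.array([0.4, 0.4, 0.2])   # illustrative learned reweighting
d = tv_distance(target, reweight)      # -> 0.1
```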
arXiv Detail & Related papers (2020-06-24T11:17:43Z)
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs), such as Graph Attention Networks (GAT), where the message aggregator contains learned rather than fixed weights.
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.