Can Strategic Data Collection Improve the Performance of Poverty
Prediction Models?
- URL: http://arxiv.org/abs/2211.08735v1
- Date: Wed, 16 Nov 2022 07:50:56 GMT
- Title: Can Strategic Data Collection Improve the Performance of Poverty
Prediction Models?
- Authors: Satej Soman, Emily Aiken, Esther Rolf, and Joshua Blumenstock
- Abstract summary: We test whether adaptive sampling strategies for ground truth data collection can improve the performance of poverty prediction models.
We find that none of these active learning methods improve over uniform-at-random sampling.
We discuss how these results can help shape future efforts to refine machine learning-based estimates of poverty.
- Score: 4.444335418188173
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning-based estimates of poverty and wealth are increasingly being
used to guide the targeting of humanitarian aid and the allocation of social
assistance. However, the ground truth labels used to train these models are
typically borrowed from existing surveys that were designed to produce national
statistics -- not to train machine learning models. Here, we test whether
adaptive sampling strategies for ground truth data collection can improve the
performance of poverty prediction models. Through simulations, we compare the
status quo sampling strategies (uniform at random and stratified random
sampling) to alternatives that prioritize acquiring training data based on
model uncertainty or model performance on sub-populations. Perhaps
surprisingly, we find that none of these active learning methods improve over
uniform-at-random sampling. We discuss how these results can help shape future
efforts to refine machine learning-based estimates of poverty.
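The sampling strategies compared in the abstract can be illustrated with a minimal simulation. This is a hedged sketch, not the paper's actual setup: the synthetic features, the linear wealth index, and the use of random-forest tree disagreement as the uncertainty score are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical stand-in for survey data: covariates X and a wealth index y.
n_pool, n_test, n_init, n_rounds, batch = 500, 200, 20, 5, 20
X_pool = rng.normal(size=(n_pool, 5))
X_test = rng.normal(size=(n_test, 5))
true_w = rng.normal(size=5)
y_pool = X_pool @ true_w + 0.5 * rng.normal(size=n_pool)
y_test = X_test @ true_w + 0.5 * rng.normal(size=n_test)

def run(strategy):
    """Label a pool in rounds under a sampling strategy; return test R^2."""
    labeled = list(rng.choice(n_pool, n_init, replace=False))
    unlabeled = [i for i in range(n_pool) if i not in labeled]
    for _ in range(n_rounds):
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X_pool[labeled], y_pool[labeled])
        if strategy == "uncertainty":
            # Disagreement across trees as a proxy for model uncertainty.
            per_tree = np.stack(
                [t.predict(X_pool[unlabeled]) for t in model.estimators_]
            )
            order = np.argsort(-per_tree.std(axis=0))
            picks = [unlabeled[i] for i in order[:batch]]
        else:  # uniform-at-random: the status quo baseline
            picks = list(rng.choice(unlabeled, batch, replace=False))
        labeled += picks
        unlabeled = [i for i in unlabeled if i not in picks]
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_pool[labeled], y_pool[labeled])
    return r2_score(y_test, model.predict(X_test))

r2_uniform = run("uniform")
r2_uncert = run("uncertainty")
print(f"uniform R^2: {r2_uniform:.3f}  uncertainty R^2: {r2_uncert:.3f}")
```

On toy data like this the two strategies often land close together, which is consistent with the paper's finding that uncertainty-based acquisition need not beat uniform-at-random sampling.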
Related papers
- Model-Free Active Exploration in Reinforcement Learning [53.786439742572995]
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution.
Our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches.
arXiv Detail & Related papers (2024-06-30T19:00:49Z)
- Towards Learning Stochastic Population Models by Gradient Descent [0.0]
We show that simultaneous estimation of parameters and structure poses major challenges for optimization procedures.
We demonstrate accurate estimation of models but find that enforcing the inference of parsimonious, interpretable models drastically increases the difficulty.
arXiv Detail & Related papers (2024-04-10T14:38:58Z)
- A step towards the integration of machine learning and small area estimation [0.0]
We propose a predictor supported by machine learning algorithms which can be used to predict any population or subpopulation characteristics.
We also study small departures from the assumed model, showing that our proposal remains a good alternative in this setting.
What is more, we propose a method for estimating the accuracy of machine learning predictors, enabling accuracy comparisons with classic methods.
arXiv Detail & Related papers (2024-02-12T09:43:17Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
- Improving Heterogeneous Model Reuse by Density Estimation [105.97036205113258]
This paper studies multiparty learning, aiming to learn a model using the private data of different participants.
Model reuse is a promising solution for multiparty learning, assuming that a local model has been trained for each party.
arXiv Detail & Related papers (2023-05-23T09:46:54Z)
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z)
- The Lifecycle of a Statistical Model: Model Failure Detection, Identification, and Refitting [26.351782287953267]
We develop tools and theory for detecting and identifying regions of the covariate space (subpopulations) where model performance has begun to degrade.
We present empirical results with three real-world data sets.
We complement these empirical results with theory proving that our methodology is minimax optimal for recovering anomalous subpopulations.
arXiv Detail & Related papers (2022-02-08T22:02:31Z)
- On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- Modeling Survival in model-based Reinforcement Learning [0.0]
This work presents the notion of survival by discussing cases in which the agent's goal is to survive.
A substitute model for the reward function approximator is introduced that learns to avoid terminal states.
Focusing on terminal states, as a small fraction of state-space, reduces the training effort drastically.
arXiv Detail & Related papers (2020-04-18T15:49:11Z)
- Design-unbiased statistical learning in survey sampling [0.0]
We propose a subsampling Rao-Blackwell method, and develop a statistical learning theory for exactly design-unbiased estimation.
Our approach makes use of classic ideas from Statistical Science as well as the rapidly growing field of Machine Learning.
arXiv Detail & Related papers (2020-03-25T14:27:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.