On the limits of algorithmic prediction across the globe
- URL: http://arxiv.org/abs/2103.15212v1
- Date: Sun, 28 Mar 2021 19:53:18 GMT
- Title: On the limits of algorithmic prediction across the globe
- Authors: Xingyu Li, Difan Song, Miaozhe Han, Yu Zhang, Rene F. Kizilcec
- Abstract summary: We show that state-of-the-art machine learning models trained on data from the United States can predict achievement with high accuracy and generalize to other developed countries with comparable accuracy.
Training the same model on national data yields high accuracy in every country, which highlights the value of local data collection.
- Score: 4.392517231156947
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The impact of predictive algorithms on people's lives and livelihoods has
been noted in medicine, criminal justice, finance, hiring and admissions. Most
of these algorithms are developed using data and human capital from highly
developed nations. We tested how well predictive models of human behavior
trained in a developed country generalize to people in less developed countries
by modeling global variation in 200 predictors of academic achievement on
nationally representative student data for 65 countries. Here we show that
state-of-the-art machine learning models trained on data from the United States
can predict achievement with high accuracy and generalize to other developed
countries with comparable accuracy. However, accuracy drops linearly with
national development due to global variation in the importance of different
achievement predictors, providing a useful heuristic for policymakers. Training
the same model on national data yields high accuracy in every country, which
highlights the value of local data collection.
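As a rough illustration of the comparison the abstract describes (a model trained in one country and applied elsewhere, versus the same model trained on local data), here is a toy simulation. It is not the authors' model or data. The two synthetic predictors, the mean-difference classifier, and the "shift" angle standing in for cross-country variation in predictor importance are all assumptions made for this sketch.

```python
import math
import random


def make_country(weights, n, rng):
    """Draw n synthetic students: two standardized predictors and a 0/1 outcome.

    `weights` is a hypothetical importance vector for the two predictors.
    """
    rows = []
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        y = 1 if weights[0] * x[0] + weights[1] * x[1] > 0 else 0
        rows.append((x, y))
    return rows


def train(rows):
    """Fit a mean-difference linear classifier (direction between class means)."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for x, y in rows:
        sums[y][0] += x[0]
        sums[y][1] += x[1]
        counts[y] += 1
    mean0 = (sums[0][0] / counts[0], sums[0][1] / counts[0])
    mean1 = (sums[1][0] / counts[1], sums[1][1] / counts[1])
    return (mean1[0] - mean0[0], mean1[1] - mean0[1])


def accuracy(w, rows):
    """Share of rows whose predicted label (sign of w . x) matches the outcome."""
    hits = 0
    for x, y in rows:
        pred = 1 if w[0] * x[0] + w[1] * x[1] > 0 else 0
        hits += pred == y
    return hits / len(rows)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# "Source" country: only the first predictor matters (an arbitrary choice).
source_model = train(make_country((1.0, 0.0), 2000, rng))

# Target countries differ from the source by a rotation of the importance vector.
results = {}
for shift_deg in (0, 30, 60, 90):
    theta = math.radians(shift_deg)
    target_w = (math.cos(theta), math.sin(theta))
    target_train = make_country(target_w, 2000, rng)
    target_test = make_country(target_w, 2000, rng)
    transferred = accuracy(source_model, target_test)
    local = accuracy(train(target_train), target_test)
    results[shift_deg] = (transferred, local)
    print(f"shift={shift_deg:3d} deg  transferred={transferred:.3f}  local={local:.3f}")
```

In this toy setup the transferred model loses accuracy as the target's predictor importances drift away from the source's, while a model retrained on local data stays accurate. That mirrors the qualitative pattern reported in the abstract, not its actual numbers.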
Related papers
- Evaluating Pre-Training Bias on Severe Acute Respiratory Syndrome Dataset [0.0]
This work uses the severe acute respiratory syndrome dataset from OpenDataSUS to visualize three pre-training bias metrics.
The aim is to compare the bias for the different regions, focusing on their protected attributes and comparing the model's performance with the metric values.
arXiv Detail & Related papers (2024-08-27T20:49:11Z)
- A Fair Post-Processing Method based on the MADD Metric for Predictive Student Models [1.055551340663609]
A new metric has been developed to evaluate algorithmic fairness in predictive student models.
In this paper, we develop a post-processing method that aims at improving the fairness while preserving the accuracy of relevant predictive models' results.
We experiment with our approach on the task of predicting student success in an online course, using both simulated and real-world educational data.
arXiv Detail & Related papers (2024-07-07T14:53:41Z)
- Mesh-Wise Prediction of Demographic Composition from Satellite Images Using Multi-Head Convolutional Neural Network [0.0]
This paper proposes a multi-head Convolutional Neural Network model with transfer learning from pre-trained ResNet50 for estimating mesh-wise demographics of Japan.
Satellite images from Landsat-8/OLI and Suomi NPP/VIIRS-DNS are used as inputs and census demographics as labels.
The trained model was evaluated on a testing dataset, achieving a test score of at least 0.8914 in $R^2$ for all demographic composition groups; the estimated demographic composition was then generated and visualised for 2022, a non-census year.
arXiv Detail & Related papers (2023-08-25T15:41:05Z)
- Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against certain subgroups described by certain protected (sensitive) attributes such as race, gender, and age.
A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data.
In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z)
- Predictive World Models from Real-World Partial Observations [66.80340484148931]
We present a framework for learning a probabilistic predictive world model for real-world road environments.
While prior methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only.
arXiv Detail & Related papers (2023-01-12T02:07:26Z)
- Generalization and Personalization of Mobile Sensing-Based Mood Inference Models: An Analysis of College Students in Eight Countries [8.218081835111912]
We collect a mobile sensing dataset with 329K self-reports from 678 participants in eight countries.
We evaluate country-specific (trained and tested within a country), continent-specific (trained and tested within a continent), country-agnostic (tested on a country not seen in the training data) and multi-country (trained and tested with multiple countries) approaches.
arXiv Detail & Related papers (2022-11-06T02:26:52Z)
- Strict baselines for Covid-19 forecasting and ML perspective for USA and Russia [105.54048699217668]
Covid-19 allows researchers to gather datasets accumulated over 2 years and to use them in predictive analysis.
We present the results of a consistent comparative study of different types of methods for predicting the dynamics of the spread of Covid-19 based on regional data for two countries: the United States and Russia.
arXiv Detail & Related papers (2022-07-15T18:21:36Z)
- Forecasting Future World Events with Neural Networks [68.43460909545063]
Autocast is a dataset containing thousands of forecasting questions and an accompanying news corpus.
The news corpus is organized by date, allowing us to precisely simulate the conditions under which humans made past forecasts.
We test language models on our forecasting task and find that performance is far below a human expert baseline.
arXiv Detail & Related papers (2022-06-30T17:59:14Z)
- Understanding peacefulness through the world news [1.6975704972827304]
We exploit information extracted from the Global Data on Events, Location, and Tone (GDELT) digital news database to capture peacefulness through the Global Peace Index (GPI).
Applying predictive machine learning models, we demonstrate that news media attention from GDELT can be used as a proxy for measuring GPI at a monthly level.
arXiv Detail & Related papers (2021-06-01T08:24:57Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.