Combining the Strengths of Dutch Survey and Register Data in a Data Challenge to Predict Fertility (PreFer)
- URL: http://arxiv.org/abs/2402.00705v2
- Date: Fri, 22 Mar 2024 15:13:17 GMT
- Title: Combining the Strengths of Dutch Survey and Register Data in a Data Challenge to Predict Fertility (PreFer)
- Authors: Elizaveta Sivak, Paulina Pankowska, Adrienne Mendrik, Tom Emery, Javier Garcia-Bernardo, Seyit Hocuk, Kasia Karpinska, Angelica Maineri, Joris Mulder, Malvina Nissim, Gert Stulp,
- Abstract summary: We present two datasets which can be used to study the predictability of fertility outcomes in the Netherlands.
One dataset is based on the LISS panel, a longitudinal survey which includes thousands of variables on a wide range of topics.
The other is based on the Dutch register data which lacks attitudinal data but includes detailed information about the life courses of millions of Dutch residents.
- Score: 8.4153358785173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The social sciences have produced an impressive body of research on determinants of fertility outcomes, or whether and when people have children. However, the strength of these determinants and underlying theories are rarely evaluated on their predictive ability on new data. This prevents us from systematically comparing studies, hindering the evaluation and accumulation of knowledge. In this paper, we present two datasets which can be used to study the predictability of fertility outcomes in the Netherlands. One dataset is based on the LISS panel, a longitudinal survey which includes thousands of variables on a wide range of topics, including individual preferences and values. The other is based on the Dutch register data which lacks attitudinal data but includes detailed information about the life courses of millions of Dutch residents. We provide information about the datasets and the samples, and describe the fertility outcome of interest. We also introduce the fertility prediction data challenge PreFer which is based on these datasets and will start in Spring 2024. We outline the ways in which measuring the predictability of fertility outcomes using these datasets and combining their strengths in the data challenge can advance our understanding of fertility behaviour and computational social science. We further provide details for participants on how to take part in the data challenge.
Related papers
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs)
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z) - Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z) - Dataset Bias in Human Activity Recognition [57.91018542715725]
This contribution statistically curates the training data to assess to what degree the physical characteristics of humans influence HAR performance.
We evaluate the performance of a state-of-the-art convolutional neural network on two HAR datasets that vary in the sensors, activities, and recording for time-series HAR.
arXiv Detail & Related papers (2023-01-19T12:33:50Z) - Data-Centric Epidemic Forecasting: A Survey [56.99209141838794]
This survey delves into various data-driven methodological and practical advancements.
We enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting.
We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems.
arXiv Detail & Related papers (2022-07-19T16:15:11Z) - Data-SUITE: Data-centric identification of in-distribution incongruous
examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z) - Retiring Adult: New Datasets for Fair Machine Learning [47.27417042497261]
UCI Adult has served as the basis for the development and comparison of many algorithmic fairness interventions.
We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity.
Our primary contribution is a suite of new datasets that extend the existing data ecosystem for research on fair machine learning.
arXiv Detail & Related papers (2021-08-10T19:19:41Z) - Extracting candidate factors affecting long-term trends of student
abilities across subjects [0.0]
Long-term student achievement data provide useful information to formulate the research question of what types of student skills would impact future trends across subjects.
We propose a novel approach to extract candidate factors affecting long-term trends across subjects from long-term data.
arXiv Detail & Related papers (2021-03-11T04:13:58Z) - Machine Learning and Data Science approach towards trend and predictors
analysis of CDC Mortality Data for the USA [0.0]
The study concluded (based on a sample) life expectancy regardless of gender, and their central tendencies; Marital status of the people also affected how frequent deaths were for each of them.
The study shows that machine learning predictions aren't as viable for the data as it might be apparent.
arXiv Detail & Related papers (2020-09-11T12:46:57Z) - Bringing the People Back In: Contesting Benchmark Machine Learning
Datasets [11.00769651520502]
We outline a research program - a genealogy of machine learning data - for investigating how and why these datasets have been created.
We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets.
arXiv Detail & Related papers (2020-07-14T23:22:13Z) - Jointly Predicting Job Performance, Personality, Cognitive Ability,
Affect, and Well-Being [42.67003631848889]
We create a benchmark for predictive analysis of individuals from a perspective that integrates physical and physiological behavior, psychological states and traits, and job performance.
We design data mining techniques as benchmark and uses real noisy and incomplete data derived from wearable sensors to predict 19 constructs based on 12 standardized well-validated tests.
arXiv Detail & Related papers (2020-06-10T14:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.