Predicting Census Survey Response Rates With Parsimonious Additive
Models and Structured Interactions
- URL: http://arxiv.org/abs/2108.11328v4
- Date: Thu, 7 Dec 2023 19:05:08 GMT
- Title: Predicting Census Survey Response Rates With Parsimonious Additive
Models and Structured Interactions
- Authors: Shibal Ibrahim, Peter Radchenko, Emanuel Ben-David, Rahul Mazumder
- Abstract summary: We consider the problem of predicting survey response rates using a family of flexible and interpretable nonparametric models.
The study is motivated by the US Census Bureau's well-known ROAM application.
- Score: 14.003044924094597
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we consider the problem of predicting survey response rates
using a family of flexible and interpretable nonparametric models. The study is
motivated by the US Census Bureau's well-known ROAM application which uses a
linear regression model trained on the US Census Planning Database data to
identify hard-to-survey areas. A crowdsourcing competition (Erdman and Bates,
2016) organized around ten years ago revealed that machine learning methods
based on ensembles of regression trees led to the best performance in
predicting survey response rates; however, the corresponding models could not
be adopted for the intended application due to their black-box nature. We
consider nonparametric additive models with a small number of main and pairwise
interaction effects using $\ell_0$-based penalization. From a methodological
viewpoint, we study both computational and statistical aspects of our
estimator, and discuss variants that incorporate strong hierarchical
interactions. Our algorithms (open-sourced on GitHub) extend the computational
frontiers of existing algorithms for sparse additive models so that they can
handle datasets relevant for the application we consider. We discuss and
interpret findings from our model on the US Census Planning Database. In
addition to being useful from an interpretability standpoint, our models lead
to predictions that appear to be better than popular black-box machine learning
methods based on gradient boosting and feedforward neural networks, suggesting
that it is possible to have models that have the best of both worlds: good
model accuracy and interpretability.
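The two structural ideas named in the abstract, $\ell_0$-based penalization of additive components and strong hierarchy for interactions, can be illustrated with a small sketch. This is not the authors' estimator or their released code. It assumes, purely for illustration, that each main effect and each pairwise interaction is represented by a block of basis coefficients, that a component counts as "selected" when its block is non-zero, and that strong hierarchy means an interaction between features j and k may be selected only if both main effects j and k are selected. The function names, the dictionary layout, and the penalty weights `lam_main` and `lam_pair` are hypothetical.

```python
import numpy as np


def count_active(blocks, tol=1e-12):
    """Count coefficient blocks with non-zero norm (a group-wise l0 count)."""
    return sum(1 for b in blocks.values() if np.linalg.norm(b) > tol)


def l0_penalized_loss(y, fitted, main_blocks, pair_blocks, lam_main, lam_pair):
    """Least-squares fit term plus a penalty per selected component.

    Assumed objective for illustration only: half the mean squared residual,
    plus lam_main for every active main effect and lam_pair for every
    active pairwise interaction.
    """
    residual = np.asarray(y, dtype=float) - np.asarray(fitted, dtype=float)
    fit_term = 0.5 * np.mean(residual ** 2)
    penalty = lam_main * count_active(main_blocks) + lam_pair * count_active(pair_blocks)
    return fit_term + penalty


def satisfies_strong_hierarchy(main_blocks, pair_blocks, tol=1e-12):
    """True if every active interaction (j, k) has both main effects active."""
    active_main = {j for j, b in main_blocks.items() if np.linalg.norm(b) > tol}
    return all(
        j in active_main and k in active_main
        for (j, k), b in pair_blocks.items()
        if np.linalg.norm(b) > tol
    )


# Hypothetical example: three features, main effects 0 and 2 selected,
# and one selected interaction between features 0 and 2.
main_blocks = {0: np.array([1.0, 0.5]), 1: np.zeros(2), 2: np.array([0.2, 0.0])}
pair_blocks = {(0, 2): np.array([0.3]), (0, 1): np.zeros(1)}

loss = l0_penalized_loss(
    y=[1.0, 2.0], fitted=[1.0, 1.0],
    main_blocks=main_blocks, pair_blocks=pair_blocks,
    lam_main=0.1, lam_pair=0.2,
)
hierarchy_ok = satisfies_strong_hierarchy(main_blocks, pair_blocks)
```

Counting whole blocks rather than individual coefficients is what makes the penalty act on components: a feature either contributes a full nonlinear effect or is dropped, which is one way a model of this kind can stay small enough to interpret.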
Related papers
- Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMU).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- A prediction and behavioural analysis of machine learning methods for
modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z)
- A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information for the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the
Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- Black-box Bayesian inference for economic agent-based models [0.0]
We investigate the efficacy of two classes of black-box approximate Bayesian inference methods.
We demonstrate that neural-network-based black-box methods provide state-of-the-art parameter inference for economic simulation models.
arXiv Detail & Related papers (2022-02-01T18:16:12Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Learning Opinion Dynamics From Social Traces [25.161493874783584]
We propose an inference mechanism for fitting a generative, agent-like model of opinion dynamics to real-world social traces.
We showcase our proposal by translating a classical agent-based model of opinion dynamics into its generative counterpart.
We apply our model to real-world data from Reddit to explore the long-standing question about the impact of the backfire effect.
arXiv Detail & Related papers (2020-06-02T14:48:17Z)
- Amortized Bayesian Inference for Models of Cognition [0.1529342790344802]
Recent advances in simulation-based inference using specialized neural network architectures circumvent many previous problems of approximate Bayesian computation.
We provide a general introduction to amortized Bayesian parameter estimation and model comparison.
arXiv Detail & Related papers (2020-05-08T08:12:15Z)
- Amortized Bayesian model comparison with evidential deep learning [0.12314765641075436]
We propose a novel method for performing Bayesian model comparison using specialized deep learning architectures.
Our method is purely simulation-based and circumvents the step of explicitly fitting all alternative models under consideration to each observed dataset.
We show that our method achieves excellent results in terms of accuracy, calibration, and efficiency across the examples considered in this work.
arXiv Detail & Related papers (2020-04-22T15:15:46Z)