Predicting Census Survey Response Rates With Parsimonious Additive Models and Structured Interactions
- URL: http://arxiv.org/abs/2108.11328v5
- Date: Sun, 06 Apr 2025 02:27:46 GMT
- Title: Predicting Census Survey Response Rates With Parsimonious Additive Models and Structured Interactions
- Authors: Shibal Ibrahim, Peter Radchenko, Emanuel Ben-David, Rahul Mazumder,
- Abstract summary: We consider the problem of predicting survey response rates using a family of flexible and interpretable nonparametric models.<n>The study is motivated by the US Census Bureau's well-known ROAM application.
- Score: 12.818275315985971
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we consider the problem of predicting survey response rates using a family of flexible and interpretable nonparametric models. The study is motivated by the US Census Bureau's well-known ROAM application, which uses a linear regression model trained on the US Census Planning Database data to identify hard-to-survey areas. A crowdsourcing competition (Erdman and Bates, 2016) organized more than ten years ago revealed that machine learning methods based on ensembles of regression trees led to the best performance in predicting survey response rates; however, the corresponding models could not be adopted for the intended application due to their black-box nature. We consider nonparametric additive models with a small number of main and pairwise interaction effects using $\ell_0$-based penalization. From a methodological viewpoint, we study our estimator's computational and statistical aspects and discuss variants incorporating strong hierarchical interactions. Our algorithms (open-sourced on GitHub) extend the computational frontiers of existing algorithms for sparse additive models to be able to handle datasets relevant to the application we consider. We discuss and interpret findings from our model on the US Census Planning Database. In addition to being useful from an interpretability standpoint, our models lead to predictions comparable to popular black-box machine learning methods based on gradient boosting and feedforward neural networks - suggesting that it is possible to have models that have the best of both worlds: good model accuracy and interpretability.
Related papers
- Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
We investigate how model size, training data scale, and inference-time compute jointly influence generative retrieval performance.
Our experiments show that n-gram-based methods demonstrate strong alignment with both training and inference scaling laws.
We find that LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval.
arXiv Detail & Related papers (2025-03-24T17:59:03Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference ( SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - A step towards the integration of machine learning and classic model-based survey methods [0.0]
The usage of machine learning methods in traditional surveys is still very limited.<n>We propose a predictor supported by these algorithms, which can be used to predict any population or subpopulation characteristics.
arXiv Detail & Related papers (2024-02-12T09:43:17Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
Main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - A prediction and behavioural analysis of machine learning methods for
modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z) - A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information for the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z) - Mixed Effects Neural ODE: A Variational Approximation for Analyzing the
Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z) - Black-box Bayesian inference for economic agent-based models [0.0]
We investigate the efficacy of two classes of black-box approximate Bayesian inference methods.
We demonstrate that neural network based black-box methods provide state of the art parameter inference for economic simulation models.
arXiv Detail & Related papers (2022-02-01T18:16:12Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Learning Opinion Dynamics From Social Traces [25.161493874783584]
We propose an inference mechanism for fitting a generative, agent-like model of opinion dynamics to real-world social traces.
We showcase our proposal by translating a classical agent-based model of opinion dynamics into its generative counterpart.
We apply our model to real-world data from Reddit to explore the long-standing question about the impact of backfire effect.
arXiv Detail & Related papers (2020-06-02T14:48:17Z) - Amortized Bayesian Inference for Models of Cognition [0.1529342790344802]
Recent advances in simulation-based inference using specialized neural network architectures circumvent many previous problems of approximate Bayesian computation.
We provide a general introduction to amortized Bayesian parameter estimation and model comparison.
arXiv Detail & Related papers (2020-05-08T08:12:15Z) - Amortized Bayesian model comparison with evidential deep learning [0.12314765641075436]
We propose a novel method for performing Bayesian model comparison using specialized deep learning architectures.
Our method is purely simulation-based and circumvents the step of explicitly fitting all alternative models under consideration to each observed dataset.
We show that our method achieves excellent results in terms of accuracy, calibration, and efficiency across the examples considered in this work.
arXiv Detail & Related papers (2020-04-22T15:15:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.