Learning from Convenience Samples: A Case Study on Fine-Tuning LLMs for Survey Non-response in the German Longitudinal Election Study
- URL: http://arxiv.org/abs/2509.25063v1
- Date: Mon, 29 Sep 2025 17:12:18 GMT
- Title: Learning from Convenience Samples: A Case Study on Fine-Tuning LLMs for Survey Non-response in the German Longitudinal Election Study
- Authors: Tobias Holtdirk, Dennis Assenmacher, Arnim Bleier, Claudia Wagner
- Abstract summary: We fine-tune large language models to impute self-reported vote choice under both random and systematic non-response. Fine-tuned LLMs can recover both individual-level predictions and population-level distributions more accurately than zero-shot prompting. This suggests fine-tuned LLMs offer a promising strategy for researchers working with non-probability samples or systematic missingness.
- Score: 0.6104510780984732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Survey researchers face two key challenges: the rising costs of probability samples and missing data (e.g., non-response or attrition), which can undermine inference and increase the use of convenience samples. Recent work explores using large language models (LLMs) to simulate respondents via persona-based prompts, often without labeled data. We study a more practical setting where partial survey responses exist: we fine-tune LLMs on available data to impute self-reported vote choice under both random and systematic nonresponse, using the German Longitudinal Election Study. We compare zero-shot prompting and supervised fine-tuning against tabular classifiers (e.g., CatBoost) and test how different convenience samples (e.g., students) used for fine-tuning affect generalization. Our results show that when data are missing completely at random, fine-tuned LLMs match tabular classifiers but outperform zero-shot approaches. When only biased convenience samples are available, fine-tuning small (3B to 8B) open-source LLMs can recover both individual-level predictions and population-level distributions more accurately than zero-shot and often better than tabular methods. This suggests fine-tuned LLMs offer a promising strategy for researchers working with non-probability samples or systematic missingness, and may enable new survey designs requiring only easily accessible subpopulations.
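The setup in the abstract can be sketched in a few lines: simulate missing-completely-at-random (MCAR) non-response on a vote-choice variable, impute the missing answers with a tabular baseline, and score both individual-level accuracy and population-level distributional fit. This is an illustrative sketch only, not the paper's code: the data are synthetic (the paper uses the GLES), and a simple 1-nearest-neighbour classifier stands in for CatBoost and the fine-tuned LLMs.

```python
# Illustrative sketch (not the paper's code): MCAR non-response on a
# synthetic vote-choice variable, imputed with a 1-NN tabular baseline.
import math
import random

random.seed(0)
PARTIES = ["A", "B", "C"]

def draw_respondent():
    # Two covariates (think: age, left-right placement) drive vote choice
    # through a softmax over party-specific logits.
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    z = [math.exp(v) for v in (x1, x2, -(x1 + x2) / 2)]
    probs = [v / sum(z) for v in z]
    vote = random.choices(PARTIES, weights=probs)[0]
    return (x1, x2), vote

data = [draw_respondent() for _ in range(2000)]

# MCAR: every respondent withholds their answer with the same probability.
mask = [random.random() < 0.3 for _ in data]
observed = [d for d, m in zip(data, mask) if not m]
held_out = [d for d, m in zip(data, mask) if m]

def impute(x):
    # Predict the vote of the nearest observed respondent (O(n) per query).
    nearest = min(observed,
                  key=lambda d: (d[0][0] - x[0]) ** 2 + (d[0][1] - x[1]) ** 2)
    return nearest[1]

preds = [impute(x) for x, _ in held_out]

# Individual-level fit: accuracy on the non-respondents ...
acc = sum(p == y for p, (_, y) in zip(preds, held_out)) / len(held_out)

# ... and population-level fit: total variation distance between the
# true and imputed vote-share distributions.
def shares(votes):
    return [votes.count(p) / len(votes) for p in PARTIES]

true_share = shares([y for _, y in held_out])
pred_share = shares(preds)
tvd = 0.5 * sum(abs(t - p) for t, p in zip(true_share, pred_share))
print(f"accuracy={acc:.2f}  TVD={tvd:.3f}")
```

Under systematic (rather than random) missingness, the mask would depend on the covariates, e.g. withholding answers only from a "student-like" region of covariate space, which is exactly the convenience-sample setting the paper studies.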
Related papers
- Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling [59.133428586090226]
Large language models (LLMs) can often accurately describe probability distributions using natural language. This mismatch limits their use in tasks requiring reliable randomness, such as Monte Carlo methods, agent-based simulations, and randomized decision-making. We introduce Verbalized Rejection Sampling (VRS), a natural-language adaptation of classical rejection sampling.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - Adaptive political surveys and GPT-4: Tackling the cold start problem with simulated user interactions [5.902306366006418]
Adaptive questionnaires dynamically select the next question for a survey participant based on their previous answers. Due to digitalisation, they have become a viable alternative to traditional surveys in application areas such as political science. One limitation is their dependency on data to train the model for question selection. We investigate if synthetic data can be used to pre-train the statistical model of an adaptive political survey.
arXiv Detail & Related papers (2025-03-12T12:02:36Z) - Llms, Virtual Users, and Bias: Predicting Any Survey Question Without Human Data [0.0]
We use Large Language Models (LLMs) to create virtual populations that answer survey questions. We evaluate several LLMs, including GPT-4o, GPT-3.5, Claude 3.5-Sonnet, and versions of the Llama and Mistral models, comparing their performance to that of a traditional Random Forests algorithm.
arXiv Detail & Related papers (2025-03-11T16:27:20Z) - Can Large Language Models Simulate Human Responses? A Case Study of Stated Preference Experiments in the Context of Heating-related Choices [2.2582258282563763]
Stated preference (SP) surveys are a key method to research how individuals make trade-offs in hypothetical, including futuristic, scenarios. They tend to be costly, time-consuming, and can be affected by respondent fatigue and ethical constraints. This study investigates the use of large language models (LLMs) to simulate consumer choices in energy-related surveys.
arXiv Detail & Related papers (2025-03-07T10:37:31Z) - Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm [50.492124556982674]
This paper introduces a novel choice-based sample selection framework. It shifts the focus from evaluating individual sample quality to comparing the contribution value of different samples. We validate our approach on a larger medical dataset, highlighting its practical applicability in real-world applications.
arXiv Detail & Related papers (2025-03-04T07:32:41Z) - Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions [4.020002996724124]
Large language models (LLMs) predict survey responses in advance during the early stages of survey design. We propose directly fine-tuning LLMs to predict response distributions by leveraging unique structural characteristics of survey data. We show that fine-tuning on SubPOP greatly improves the match between LLM predictions and human responses across various subpopulations.
arXiv Detail & Related papers (2025-02-24T00:31:33Z) - Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
arXiv Detail & Related papers (2025-02-10T21:59:27Z) - Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification [6.273933281069326]
Generative large language models (LLMs) are increasingly used for data augmentation tasks.
We compare sample selection strategies from the few-shot learning literature and investigate their effects in LLM-based textual augmentation.
Results indicate that while some "informed" selection strategies increase model performance, this happens only seldom and with marginal gains.
arXiv Detail & Related papers (2024-10-14T17:30:08Z) - Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [63.32585910975191]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset. We show that our approach consistently boosts DPO by a considerable margin. Our method not only maximizes the utility of preference data but also mitigates the issue of unlearning, demonstrating its broad effectiveness beyond mere data expansion.
arXiv Detail & Related papers (2024-10-10T16:01:51Z) - Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z) - Learning with Noisy Labels over Imbalanced Subpopulations [13.477553187049462]
Learning with noisy labels (LNL) has attracted significant attention from the research community.
We propose a novel LNL method to simultaneously deal with noisy labels and imbalanced subpopulations.
We introduce a feature-based metric that takes the sample correlation into account for estimating samples' clean probabilities.
arXiv Detail & Related papers (2022-11-16T07:25:24Z) - One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.