Transforming Social Science Research with Transfer Learning: Social Science Survey Data Integration with AI
- URL: http://arxiv.org/abs/2501.06577v1
- Date: Sat, 11 Jan 2025 16:01:44 GMT
- Title: Transforming Social Science Research with Transfer Learning: Social Science Survey Data Integration with AI
- Authors: Ali Amini,
- Abstract summary: Large-N nationally representative surveys, which have profoundly shaped American politics scholarship, represent related but distinct domains.<n>Our study introduces a novel application of transfer learning (TL) to address these gaps.<n>Models pre-trained on the Cooperative Election Study dataset are fine-tuned for use in the American National Election Studies dataset.
- Score: 0.4944564023471818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-N nationally representative surveys, which have profoundly shaped American politics scholarship, represent related but distinct domains -a key condition for transfer learning applications. These surveys are related through their shared demographic, party identification, and ideological variables, yet differ in that individual surveys often lack specific policy preference questions that researchers require. Our study introduces a novel application of transfer learning (TL) to address these gaps, marking the first systematic use of TL paradigms in the context of survey data. Specifically, models pre-trained on the Cooperative Election Study (CES) dataset are fine-tuned for use in the American National Election Studies (ANES) dataset to predict policy questions based on demographic variables. Even with a naive architecture, our transfer learning approach achieves approximately 92 percentage accuracy in predicting missing variables across surveys, demonstrating the robust potential of this method. Beyond this specific application, our paper argues that transfer learning is a promising framework for maximizing the utility of existing survey data. We contend that artificial intelligence, particularly transfer learning, opens new frontiers in social science methodology by enabling systematic knowledge transfer between well-administered surveys that share common variables but differ in their outcomes of interest.
Related papers
- A Survey of AIOps in the Era of Large Language Models [60.59720351854515]
We analyzed 183 research papers published between January 2020 and December 2024 to answer four key research questions (RQs)<n>We discuss the state-of-the-art advancements and trends, identify gaps in existing research, and propose promising directions for future exploration.
arXiv Detail & Related papers (2025-06-23T02:40:16Z) - Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment [49.81946749379338]
This work seeks to analyze the capacity of Transformers-based systems to learn demographic biases present in the data.<n>We propose a privacy-enhancing framework to reduce gender information from the learning pipeline as a way to mitigate biased behaviors in the final tools.
arXiv Detail & Related papers (2025-06-13T15:29:43Z) - Adaptive political surveys and GPT-4: Tackling the cold start problem with simulated user interactions [5.902306366006418]
Adaptive questionnaires dynamically select the next question for a survey participant based on their previous answers.
Due to digitalisation, they have become a viable alternative to traditional surveys in application areas such as political science.
One limitation is their dependency on data to train the model for question selection.
We investigate if synthetic data can be used to pre-train the statistical model of an adaptive political survey.
arXiv Detail & Related papers (2025-03-12T12:02:36Z) - Llms, Virtual Users, and Bias: Predicting Any Survey Question Without Human Data [0.0]
We use Large Language Models (LLMs) to create virtual populations that answer survey questions.<n>We evaluate several LLMs-including GPT-4o, GPT-3.5, Claude 3.5-Sonnet, and versions of the Llama and Mistral models-comparing their performance to that of a traditional Random Forests algorithm.
arXiv Detail & Related papers (2025-03-11T16:27:20Z) - Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions.
As a testbed, we use country-level results from two global cultural surveys.
We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
arXiv Detail & Related papers (2025-02-10T21:59:27Z) - Guided Persona-based AI Surveys: Can we replicate personal mobility preferences at scale using LLMs? [1.7819574476785418]
This study explores the potential of Large Language Models (LLMs) to generate artificial surveys.<n>By leveraging LLMs for synthetic data creation, we aim to address the limitations of traditional survey methods.<n>A novel approach incorporating "Personas" is introduced and compared to five other synthetic survey methods.
arXiv Detail & Related papers (2025-01-20T15:11:03Z) - Large Language Models for Market Research: A Data-augmentation Approach [3.3199591445531453]
Large Language Models (LLMs) have transformed artificial intelligence by excelling in complex natural language processing tasks.<n>Recent studies highlight a significant gap between LLM-generated and human data, with biases introduced when substituting between the two.<n>We propose a novel statistical data augmentation approach that efficiently integrates LLM-generated data with real data in conjoint analysis.
arXiv Detail & Related papers (2024-12-26T22:06:29Z) - Agentic Society: Merging skeleton from real world and texture from Large Language Model [4.740886789811429]
This paper explores a novel framework that leverages census data and large language models to generate virtual populations.
We show that our method produces personas with variability essential for simulating diverse human behaviors in social science experiments.
But the evaluation result shows that only weak sign of statistical truthfulness can be produced due to limited capability of current LLMs.
arXiv Detail & Related papers (2024-09-02T08:28:19Z) - United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections [42.72938925647165]
"Synthetic samples" based on large language models (LLMs) have been argued to serve as efficient alternatives to surveys of humans.<n>"Synthetic samples" might exhibit bias due to training data and fine-tuning processes being unrepresentative of diverse contexts.<n>This study investigates if and under which conditions LLM-generated synthetic samples can be used for public opinion prediction.
arXiv Detail & Related papers (2024-08-29T16:01:06Z) - Transfer Learning for Spatial Autoregressive Models with Application to U.S. Presidential Election Prediction [10.825562180226424]
We propose a novel transfer learning framework within the SAR model, called as tranSAR.
Our framework enhances estimation and prediction by leveraging information from similar source data.
We demonstrate our method's effectiveness in predicting outcomes in U.S. presidential swing states, where it outperforms traditional methods.
arXiv Detail & Related papers (2024-05-20T03:14:15Z) - LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement [79.31084387589968]
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks.
We propose LLM2LLM, a data augmentation strategy that uses a teacher LLM to enhance a small seed dataset.
We achieve improvements up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime.
arXiv Detail & Related papers (2024-03-22T08:57:07Z) - Spurious Correlations in Machine Learning: A Survey [27.949532561102206]
Machine learning systems are sensitive to spurious correlations between non-essential features of the inputs and labels.
These features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions.
We provide a review of this issue, along with a taxonomy of current state-of-the-art methods for addressing spurious correlations in machine learning models.
arXiv Detail & Related papers (2024-02-20T04:49:34Z) - ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs [65.9625653425636]
Large Language models (LLMs) exhibit harmful social biases.
This work introduces a novel approach utilizing ChatGPT to generate synthetic training data.
arXiv Detail & Related papers (2024-02-19T01:28:48Z) - Crowdsourced Adaptive Surveys [0.0]
This paper introduces a crowdsourced adaptive survey methodology (CSAS)<n>The method converts open-ended text provided by participants into survey items and applies a multi-armed bandit algorithm to determine which questions should be prioritized in the survey.<n>I conclude by highlighting CSAS's potential to bridge conceptual gaps between researchers and participants in survey research.
arXiv Detail & Related papers (2024-01-16T04:05:25Z) - Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes [57.62036621319563]
We introduce CLLM, which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime.
We demonstrate the superior performance of CLLM in the low-data regime compared to conventional generators.
arXiv Detail & Related papers (2023-12-19T12:34:46Z) - Federated Learning for Generalization, Robustness, Fairness: A Survey
and Benchmark [55.898771405172155]
Federated learning has emerged as a promising paradigm for privacy-preserving collaboration among different parties.
We provide a systematic overview of the important and recent developments of research on federated learning.
arXiv Detail & Related papers (2023-11-12T06:32:30Z) - A Recent Survey of Heterogeneous Transfer Learning [15.830786437956144]
heterogeneous transfer learning has become a vital strategy in various tasks.
We offer an extensive review of over 60 HTL methods, covering both data-based and model-based approaches.
We explore applications in natural language processing, computer vision, multimodal learning, and biomedicine.
arXiv Detail & Related papers (2023-10-12T16:19:58Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Predicting Survey Response with Quotation-based Modeling: A Case Study
on Favorability towards the United States [0.0]
We propose a pioneering approach for predicting survey responses by examining quotations using machine learning.
We leverage a vast corpus of quotations from individuals across different nationalities to extract their level of favorability.
We employ a combination of natural language processing techniques and machine learning algorithms to construct a predictive model for survey responses.
arXiv Detail & Related papers (2023-05-23T14:11:01Z) - Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z) - Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia
for Human Personality Profiling [74.83957286553924]
We infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework, called "PERS"
Our experimental results demonstrate the PERS's ability to learn from multi-view data for personality profiling by efficiently leveraging on the significantly different data arriving from diverse social multimedia sources.
arXiv Detail & Related papers (2021-06-20T10:48:49Z) - Domain Generalization: A Survey [146.68420112164577]
Domain generalization (DG) aims to achieve OOD generalization by only using source domain data for model learning.
For the first time, a comprehensive literature review is provided to summarize the ten-year development in DG.
arXiv Detail & Related papers (2021-03-03T16:12:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.