Assessing Generalization for Subpopulation Representative Modeling via In-Context Learning
- URL: http://arxiv.org/abs/2402.07368v1
- Date: Mon, 12 Feb 2024 01:55:51 GMT
- Title: Assessing Generalization for Subpopulation Representative Modeling via In-Context Learning
- Authors: Gabriel Simmons and Vladislav Savinov
- Abstract summary: This study evaluates the ability of Large Language Model (LLM)-based Subpopulation Representative Models (SRMs) to generalize from empirical data.
We explore generalization across response variables and demographic subgroups.
- Score: 5.439020425819001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study evaluates the ability of Large Language Model (LLM)-based
Subpopulation Representative Models (SRMs) to generalize from empirical data,
utilizing in-context learning with data from the 2016 and 2020 American
National Election Studies. We explore generalization across response variables
and demographic subgroups. While conditioning with empirical data improves
performance on the whole, the benefit of in-context learning varies
considerably across demographics, sometimes hurting performance for one
demographic while helping performance for others. The inequitable benefits of
in-context learning for SRM present a challenge for practitioners implementing
SRMs, and for decision-makers who might come to rely on them. Our work
highlights a need for fine-grained benchmarks captured from diverse
subpopulations that test not only fidelity but generalization.
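The abstract describes conditioning an LLM on empirical survey records via in-context learning and then asking how faithfully the resulting responses reflect each demographic subgroup. The sketch below is not the authors' pipeline; it is a minimal illustration, under assumed names (a hypothetical `query_llm` stand-in, toy ANES-style record fields, and a simple demographic profile), of how such a prompt could be assembled and how per-subgroup fidelity could be scored with a total variation distance between predicted and empirical answer distributions.

```python
from collections import Counter

# Minimal sketch of LLM-based subpopulation representative modeling (SRM) via
# in-context learning. `query_llm` is a hypothetical stand-in for a real model
# call, and the record fields (age, gender, party, answer) are toy ANES-style
# attributes chosen for illustration only.

def build_prompt(context_records, target_profile, question):
    """Format empirical respondents as in-context examples, then ask the model
    to answer as a new respondent with the target demographic profile."""
    lines = [f"Question: {question}", "Observed respondents:"]
    for r in context_records:
        lines.append(f"- {r['age']}-year-old {r['gender']} ({r['party']}): {r['answer']}")
    lines.append(f"- {target_profile['age']}-year-old {target_profile['gender']} "
                 f"({target_profile['party']}):")
    return "\n".join(lines)

def query_llm(prompt):
    """Hypothetical stand-in; a real SRM would sample a completion from an LLM."""
    return "Approve"

def total_variation(p, q, options):
    """Distance between two answer distributions over the same option set."""
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)

def subgroup_fidelity(subgroup_records, context_records, question, options, n_samples=50):
    """Sample model answers for one demographic subgroup and compare the induced
    distribution with the subgroup's empirical answer distribution."""
    profile = {k: subgroup_records[0][k] for k in ("age", "gender", "party")}
    prompt = build_prompt(context_records, profile, question)
    model_counts = Counter(query_llm(prompt) for _ in range(n_samples))
    empirical_counts = Counter(r["answer"] for r in subgroup_records)
    p = {o: model_counts[o] / n_samples for o in options}
    q = {o: empirical_counts[o] / len(subgroup_records) for o in options}
    return total_variation(p, q, options)
```

Generalization, in the paper's sense, would then be probed by conditioning on records from one response variable or subgroup and scoring fidelity on a held-out one, which is where the abstract reports the benefits becoming uneven across demographics.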
Related papers
- Demographic Attributes Prediction from Speech Using WavLM Embeddings [25.00298717665857]
This paper introduces a general classifier based on WavLM features to infer demographic characteristics such as age, gender, native language, education, and country from speech.
The proposed framework identifies key acoustic and linguistic features associated with demographic attributes, achieving a Mean Absolute Error (MAE) of 4.94 for age prediction and over 99.81% accuracy for gender classification.
arXiv Detail & Related papers (2025-02-17T16:43:47Z)
- Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions.
As a testbed, we use country-level results from two global cultural surveys.
We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions (a hedged sketch of this objective appears after this list).
arXiv Detail & Related papers (2025-02-10T21:59:27Z)
- Who Does the Giant Number Pile Like Best: Analyzing Fairness in Hiring Contexts [5.111540255111445]
Race-based differences appear in approximately 10% of generated summaries, while gender-based differences occur in only 1%.
Retrieval models demonstrate comparable sensitivity to non-demographic changes, suggesting that fairness issues may stem from general brittleness.
arXiv Detail & Related papers (2025-01-08T07:28:10Z)
- A Controlled Study on Long Context Extension and Generalization in LLMs [85.4758128256142]
Broad textual understanding and in-context learning require language models that utilize full document contexts.
Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts.
We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data.
arXiv Detail & Related papers (2024-09-18T17:53:17Z)
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models (LVLMs).
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to address social bias in language models.
We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z)
- ROBBIE: Robust Bias Evaluation of Large Generative Language Models [27.864027322486375]
Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes.
We compare 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs.
We conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements.
arXiv Detail & Related papers (2023-11-29T23:03:04Z)
- All Should Be Equal in the Eyes of Language Models: Counterfactually Aware Fair Text Generation [16.016546693767403]
We propose a framework that dynamically compares the model's understanding of diverse demographics to generate more equitable sentences.
CAFIE produces fairer text and strikes the best balance between fairness and language modeling capability.
arXiv Detail & Related papers (2023-11-09T15:39:40Z)
- Improving Generalization of Alignment with Human Preferences through Group Invariant Learning [56.19242260613749]
Reinforcement Learning from Human Feedback (RLHF) enables the generation of responses more aligned with human preferences.
Previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples.
We propose a novel approach that can learn a consistent policy via RL across various data groups or domains.
arXiv Detail & Related papers (2023-10-18T13:54:15Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization [27.437077941786768]
Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks.
We evaluate two pretrained V&L models under different settings by conducting cross-dataset evaluations.
We find that these models tend to learn to solve the benchmark, rather than learning the high-level skills required by the VQA task.
arXiv Detail & Related papers (2022-05-24T16:44:45Z)
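As a companion to the entry above on specializing LLMs to simulate survey response distributions, the following is a hedged sketch of what a first-token probability objective could look like: restrict the next-token logits to the first token of each answer option, renormalize over those options, and penalize divergence from the empirical distribution. The function name, tensor shapes, and toy values are assumptions for illustration, not the cited paper's code.

```python
import torch
import torch.nn.functional as F

def first_token_distribution_loss(next_token_logits, option_token_ids, survey_probs):
    """Sketch of a first-token objective: keep only the logits of each answer
    option's first token, renormalize over those options, and compute the KL
    divergence from the empirical survey response distribution."""
    option_logits = next_token_logits[option_token_ids]   # shape: (num_options,)
    log_predicted = F.log_softmax(option_logits, dim=-1)
    target = torch.as_tensor(survey_probs, dtype=log_predicted.dtype)
    # F.kl_div takes log-probabilities as input and probabilities as target.
    return F.kl_div(log_predicted, target, reduction="sum")

# Toy usage: a 10-token vocabulary, 4 answer options, and an assumed
# country-level response distribution over those options.
loss = first_token_distribution_loss(torch.randn(10),
                                     torch.tensor([2, 5, 7, 9]),
                                     [0.4, 0.3, 0.2, 0.1])
```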
This list is automatically generated from the titles and abstracts of the papers in this site.