Towards "Differential AI Psychology" and in-context Value-driven Statement Alignment with Moral Foundations Theory
- URL: http://arxiv.org/abs/2408.11415v1
- Date: Wed, 21 Aug 2024 08:20:41 GMT
- Title: Towards "Differential AI Psychology" and in-context Value-driven Statement Alignment with Moral Foundations Theory
- Authors: Simon Münker
- Abstract summary: This work investigates the alignment between personalized language models and survey participants on a Moral Foundations questionnaire.
We adapt text-to-text models to different political personas and repeatedly administer the questionnaire to generate a synthetic population of persona and model combinations.
Our findings indicate that adapted models struggle to represent the survey-captured assessment of political ideologies.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contemporary research in the social sciences increasingly uses state-of-the-art statistical language models to annotate or generate content. While these models achieve benchmark-leading performance on common language tasks and show exemplary task-independent emergent abilities, their transfer to novel out-of-domain tasks remains insufficiently explored. The implications of the statistical black-box approach - stochastic parrots - are prominently criticized in the language model research community; however, their significance for novel generative tasks is not. This work investigates the alignment between personalized language models and survey participants on a Moral Foundations Theory questionnaire. We adapt text-to-text models to different political personas and repeatedly administer the questionnaire to generate a synthetic population of persona and model combinations. Analyzing the intra-group variance and cross-alignment shows significant differences across models and personas. Our findings indicate that adapted models struggle to represent the survey-captured assessment of political ideologies. Thus, using language models to mimic social interactions requires measurable improvements in in-context optimization or parameter manipulation to align with psychological and sociological stereotypes. Without quantifiable alignment, generating politically nuanced content remains unfeasible. To enhance these representations, we propose a testable framework for generating agents based on moral value statements for future research.
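To make the pipeline above concrete, here is a minimal sketch of persona-conditioned, repeated questionnaire administration followed by intra-group variance and cross-alignment checks. It assumes a generic text-to-text model behind a hypothetical `ask_model` call; the persona labels, questionnaire items, and scoring are illustrative placeholders, not the paper's actual materials.

```python
# Minimal sketch of the persona-survey pipeline, assuming a generic
# chat-completion-style model. PERSONAS, MFQ_ITEMS, and ask_model are
# hypothetical stand-ins; the paper's exact prompts and scoring are not shown.
import random
import statistics

PERSONAS = ["progressive", "conservative", "libertarian"]  # illustrative only
MFQ_ITEMS = [
    "Whether or not someone suffered emotionally.",
    "Whether or not someone acted unfairly.",
    "Whether or not someone showed a lack of respect for authority.",
]  # sample Moral Foundations Questionnaire relevance items
LIKERT = range(0, 6)  # the MFQ uses a 0-5 relevance scale
N_RUNS = 30  # repeated administrations per persona/model pair


def ask_model(persona: str, item: str) -> int:
    """Return a 0-5 relevance rating for one MFQ item under a persona prompt.

    Stand-in for a real text-to-text model call, e.g. a completion with a
    system prompt like f"You are a {persona} voter. Rate this item ..."
    """
    return random.choice(LIKERT)  # replace with parsed model output


def survey(persona: str) -> list[float]:
    """One full questionnaire pass per run: the mean rating per administration."""
    return [
        statistics.mean(ask_model(persona, item) for item in MFQ_ITEMS)
        for _ in range(N_RUNS)
    ]


# Intra-group variance: how stable is one persona across repeated surveys?
scores = {p: survey(p) for p in PERSONAS}
for p, runs in scores.items():
    print(f"{p:>12}: mean={statistics.mean(runs):.2f} var={statistics.variance(runs):.3f}")

# Cross-alignment: distance between persona means. In the paper's setting,
# persona means would also be compared against human survey responses.
for a in PERSONAS:
    for b in PERSONAS:
        if a < b:
            gap = abs(statistics.mean(scores[a]) - statistics.mean(scores[b]))
            print(f"{a} vs {b}: |mean difference| = {gap:.2f}")
```

In a real run, `ask_model` would parse a Likert rating out of the model's text response, and the per-persona means would be compared against published human MFQ scores rather than only against each other.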
Related papers
- Do language models accommodate their users? A study of linguistic convergence [15.958711524171362]
We find that models strongly converge to the conversation's style, often significantly overfitting relative to the human baseline. We observe consistent shifts in convergence across modeling settings, with instruction-tuned and larger models converging less than their pretrained counterparts.
arXiv Detail & Related papers (2025-08-05T09:55:40Z)
- Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models [93.1043186636177]
We explore the hypothesis that people use a combination of distributed and symbolic representations to construct bespoke mental models tailored to novel situations. We propose a computational implementation of this idea, a "Model Synthesis Architecture" (MSA). We evaluate our MSA as a model of human judgments on a novel reasoning dataset.
arXiv Detail & Related papers (2025-07-16T18:01:03Z) - Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings [7.284860523651357]
We assess the misalignment between Large Language Models (LLMs)-simulated and actual human behaviors in multiple-choice survey settings.<n>We apply this framework to a popular language model for simulating people's opinions in various public surveys.<n>This raises questions about the alignment of this language model with the tested populations.
arXiv Detail & Related papers (2025-06-17T22:04:55Z) - Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models.<n>We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients.<n>We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z) - Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction [5.774786149181393]
We analyze how demographic attributes and prompt variations influence latent opinion mappings in large language models (LLMs)<n>We find that LLM-generated data fails to replicate the variance observed in real-world human responses.<n>In the political space, persona-to-party mappings exhibit limited differentiation, resulting in synthetic data that lacks the nuanced distribution of opinions found in survey data.
arXiv Detail & Related papers (2025-02-22T16:25:33Z) - From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models [17.04716417556556]
This review revisits foundational concepts such as the distributional hypothesis and contextual similarity.
We examine both static and contextualized embeddings, underscoring advancements in models such as ELMo, BERT, and GPT.
The discussion extends to sentence and document embeddings, covering aggregation methods and generative topic models.
Advanced topics such as model compression, interpretability, numerical encoding, and bias mitigation are analyzed, addressing both technical challenges and ethical implications.
arXiv Detail & Related papers (2024-11-06T15:40:02Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - Computational Models to Study Language Processing in the Human Brain: A Survey [47.81066391664416]
This paper reviews efforts in using computational models for brain research, highlighting emerging trends.
Our analysis reveals that no single model outperforms others on all datasets.
arXiv Detail & Related papers (2024-03-20T08:01:22Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that can see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - Feature Interactions Reveal Linguistic Structure in Language Models [2.0178765779788495]
We study feature interactions in the context of feature attribution methods for post-hoc interpretability.
We work out a grey box methodology, in which we train models to perfection on a formal language classification task.
We show that under specific configurations, some methods are indeed able to uncover the grammatical rules acquired by a model.
arXiv Detail & Related papers (2023-06-21T11:24:41Z) - Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural
Language Generation [68.9440575276396]
This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation.
First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization.
Second, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models.
Third, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.
arXiv Detail & Related papers (2023-05-01T17:36:06Z) - Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z) - Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z) - Out of One, Many: Using Language Models to Simulate Human Samples [3.278541277919869]
We show that the "algorithmic bias" within one such tool -- the GPT-3 language model -- is both fine-grained and demographically correlated.
We create "silicon samples" by conditioning the model on thousands of socio-demographic backstories from real human participants.
arXiv Detail & Related papers (2022-09-14T19:53:32Z) - Schr\"odinger's Tree -- On Syntax and Neural Language Models [10.296219074343785]
Language models have emerged as NLP's workhorse, displaying increasingly fluent generation capabilities.
We observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form.
We outline the implications of the different types of research questions exhibited in studies on syntax.
arXiv Detail & Related papers (2021-10-17T18:25:23Z) - Ethical-Advice Taker: Do Language Models Understand Natural Language
Interventions? [62.74872383104381]
We investigate the effectiveness of natural language interventions for reading-comprehension systems.
We propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior.
arXiv Detail & Related papers (2021-06-02T20:57:58Z) - Evaluating the Interpretability of Generative Models by Interactive
Reconstruction [30.441247705313575]
We introduce a task to quantify the human-interpretability of generative model representations.
We find performance on this task much more reliably differentiates entangled and disentangled models than baseline approaches.
arXiv Detail & Related papers (2021-02-02T02:38:14Z)
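Several entries above (e.g., the hypothesis-testing paper) quantify LLM-human misalignment statistically. The following is a generic, self-contained illustration of one such test; the counts are invented, and the chi-square goodness-of-fit test is a common choice for comparing answer distributions, not necessarily the cited paper's exact procedure.

```python
# Generic illustration of testing LLM-human misalignment in a multiple-choice
# setting. All numbers are hypothetical.
from scipy.stats import chisquare

# Observed answer counts from repeated LLM samples of one survey question
# (options A-D), and human benchmark proportions for the same question.
llm_counts = [48, 22, 18, 12]            # hypothetical: 100 LLM samples
human_props = [0.30, 0.30, 0.25, 0.15]   # hypothetical survey marginals

n = sum(llm_counts)
expected = [p * n for p in human_props]

# H0: the LLM's answer distribution matches the human distribution.
stat, p_value = chisquare(f_obs=llm_counts, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: LLM answers are misaligned with the human sample.")
else:
    print("No evidence of misalignment at the 5% level.")
```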
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.