Personality Alignment of Large Language Models
- URL: http://arxiv.org/abs/2408.11779v1
- Date: Wed, 21 Aug 2024 17:09:00 GMT
- Title: Personality Alignment of Large Language Models
- Authors: Minjun Zhu, Linyi Yang, Yue Zhang
- Abstract summary: Current methods for aligning large language models (LLMs) typically aim to reflect general human values and behaviors.
We introduce the concept of Personality Alignment.
This approach tailors LLMs' responses and decisions to match the specific preferences of individual users or closely related groups.
- Score: 26.071445846818914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current methods for aligning large language models (LLMs) typically aim to reflect general human values and behaviors, but they often fail to capture the unique characteristics and preferences of individual users. To address this gap, we introduce the concept of Personality Alignment. This approach tailors LLMs' responses and decisions to match the specific preferences of individual users or closely related groups. Inspired by psychometrics, we created the Personality Alignment with Personality Inventories (PAPI) dataset, which includes data from 300,000 real subjects, each providing behavioral preferences based on the Big Five Personality Factors. This dataset allows us to quantitatively evaluate the extent to which LLMs can align with each subject's behavioral patterns. Recognizing the challenges of personality alignment, such as limited personal data, diverse preferences, and scalability requirements, we developed an activation intervention optimization method. This method enhances LLMs' ability to efficiently align with individual behavioral preferences using minimal data and computational resources. Remarkably, our method, PAS, achieves superior performance while requiring only 1/5 of the optimization time compared to DPO, offering practical value for personality alignment. Our work paves the way for future AI systems to make decisions and reason in truly personalized ways, enhancing the relevance and meaning of AI interactions for each user and advancing human-centered artificial intelligence. The code has been released at \url{https://github.com/zhu-minjun/PAlign}.
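To make the activation-intervention idea above concrete, here is a minimal steering-vector sketch in Python. It is not the paper's PAS implementation: the model (GPT-2), the intervention layer, the toy trait statements, and the steering strength are all illustrative assumptions. The sketch derives a direction from contrastive trait statements and adds it to one layer's hidden states during generation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small open model used purely for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6  # which transformer block to intervene on (arbitrary choice)

def hidden_at_layer(text):
    """Mean hidden state of `text` at the chosen layer."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so block LAYER is index LAYER + 1
    return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)

# Contrast a handful of trait-consistent vs. trait-inconsistent statements
# (stand-ins for a subject's inventory answers) to obtain a steering direction.
pos = ["I love meeting new people and starting conversations.",
       "Parties and group activities give me energy."]
neg = ["I prefer to spend my evenings alone with a book.",
       "Large social gatherings exhaust me."]
direction = (torch.stack([hidden_at_layer(t) for t in pos]).mean(0)
             - torch.stack([hidden_at_layer(t) for t in neg]).mean(0))
direction = direction / direction.norm()

ALPHA = 4.0  # steering strength; a real method would tune this per user

def steer_hook(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # add the scaled direction and keep the rest of the tuple intact.
    hidden = output[0] + ALPHA * direction.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer_hook)
prompt = "My ideal weekend is"
ids = tok(prompt, return_tensors="pt")
gen = model.generate(**ids, max_new_tokens=30, do_sample=False,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(gen[0], skip_special_tokens=True))
handle.remove()
```

In practice, methods in this space search for the layers and intervention strengths that best match a subject's inventory responses; the fixed layer and scale here are placeholders.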
Related papers
- BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data [28.900987544062257]
We introduce BIG5-CHAT, a large-scale dataset containing 100,000 dialogues designed to ground models in how humans express their personality in text.
Our methods outperform prompting-based approaches on personality assessments such as BFI and IPIP-NEO, with trait correlations more closely matching human data.
Our experiments reveal that models trained to exhibit higher conscientiousness, higher agreeableness, lower extraversion, and lower neuroticism display better performance on reasoning tasks.
arXiv Detail & Related papers (2024-10-21T20:32:27Z) - MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from large-scale text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z) - Neuron-based Personality Trait Induction in Large Language Models [115.08894603023712]
Large language models (LLMs) have become increasingly proficient at simulating various personality traits.
We present a neuron-based approach for personality trait induction in LLMs.
arXiv Detail & Related papers (2024-10-16T07:47:45Z) - Aligning LLMs with Individual Preferences via Interaction [51.72200436159636]
We train large language models (LLMs) that can "interact to align".
We develop a preference dataset containing 3K+ multi-turn conversations organized in tree structures.
For evaluation, we establish the ALOE benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations.
arXiv Detail & Related papers (2024-10-04T17:48:29Z) - PersonalLLM: Tailoring LLMs to Individual Preferences [11.717169516971856]
We present a public benchmark, PersonalLLM, focusing on adapting LLMs to provide maximal benefits for a particular user.
We curate open-ended prompts paired with many high-quality answers over which users would be expected to display heterogeneous latent preferences.
Our dataset and generated personalities offer an innovative testbed for developing personalization algorithms.
arXiv Detail & Related papers (2024-09-30T13:55:42Z) - Machine Mindset: An MBTI Exploration of Large Language Models [28.2342069623478]
We present a novel approach for integrating Myers-Briggs Type Indicator (MBTI) personality traits into large language models (LLMs).
Our method, "Machine Mindset," involves a two-phase fine-tuning and Direct Preference Optimization (DPO) to embed MBTI traits into LLMs.
arXiv Detail & Related papers (2023-12-20T12:59:31Z) - PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z) - Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging [148.77027765872006]
We study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem.
LLMs are aligned to multiple preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
We show that we can achieve personalized alignment by decomposing preferences into multiple dimensions; a generic parameter-merging sketch illustrating this idea appears after this list.
arXiv Detail & Related papers (2023-10-17T20:22:13Z) - AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model [69.12623428463573]
AlignDiff is a novel framework that quantifies human preferences, including their abstractness, and uses them to guide diffusion planning.
It can accurately match user-customized behaviors and efficiently switch from one to another.
We demonstrate its superior performance on preference matching, switching, and covering compared to other baselines.
arXiv Detail & Related papers (2023-10-03T13:53:08Z)
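As referenced in the Personalized Soups entry above, the following is a generic sketch of post-hoc parameter merging across per-preference experts. It is not that paper's RLPHF/MORL procedure: the `merge_state_dicts` helper, the preference dimensions ("concise", "friendly"), and the simple weighted-delta merge rule are all illustrative assumptions, with tiny linear models standing in for LLM policies.

```python
import torch

def merge_state_dicts(base, experts, weights):
    """Weighted post-hoc merge of per-preference expert parameters.

    `base` is the shared model's state_dict, `experts` maps a preference
    dimension (e.g. "concise", "friendly") to a state_dict fine-tuned for
    that dimension, and `weights` gives one user's mix over those dimensions.
    """
    merged = {}
    for name, base_param in base.items():
        delta = torch.zeros_like(base_param, dtype=torch.float32)
        for dim, w in weights.items():
            # Accumulate each expert's deviation from the base, scaled by the
            # user's weight on that preference dimension.
            delta += w * (experts[dim][name].float() - base_param.float())
        merged[name] = (base_param.float() + delta).to(base_param.dtype)
    return merged

# Toy usage: two "experts" that would, in a real setup, be checkpoints
# fine-tuned for different preference dimensions.
torch.manual_seed(0)
base_model = torch.nn.Linear(4, 4)
expert_a = torch.nn.Linear(4, 4)   # imagine: tuned for concise answers
expert_b = torch.nn.Linear(4, 4)   # imagine: tuned for friendly answers

user_mix = {"concise": 0.7, "friendly": 0.3}   # one user's preference profile
merged_sd = merge_state_dicts(
    base_model.state_dict(),
    {"concise": expert_a.state_dict(), "friendly": expert_b.state_dict()},
    user_mix,
)
personalized = torch.nn.Linear(4, 4)
personalized.load_state_dict(merged_sd)
```

A weighted merge like this lets a single base model be personalized per user by mixing expert deltas at load time rather than retraining for every preference profile.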