ControlLM: Crafting Diverse Personalities for Language Models
- URL: http://arxiv.org/abs/2402.10151v1
- Date: Thu, 15 Feb 2024 17:58:29 GMT
- Title: ControlLM: Crafting Diverse Personalities for Language Models
- Authors: Yixuan Weng, Shizhu He, Kang Liu, Shengping Liu, Jun Zhao
- Abstract summary: We introduce ControlLM, which leverages differential activation patterns, derived from contrasting behavioral prompts in the model's latent space, to influence the model's personality traits at inference.
First, we demonstrate ControlLM's capacity to elicit diverse persona behaviors without any training, while precision control allows personality traits to closely match average human values.
We showcase improved reasoning and question answering through selective amplification of beneficial attributes like conscientiousness and friendliness.
- Score: 32.411304295746746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As language models continue to scale in size and capability, they display an array of emergent behaviors, both beneficial and concerning. This heightens the need to control model behavior. We aim to control the personality traits of language models at inference time so that they exhibit various character features, on top of which the requirements of different types of tasks can be met. Personality is a higher-level, more abstract behavioral representation for language models. We introduce ControlLM, which leverages differential activation patterns, derived from contrasting behavioral prompts in the model's latent space, to influence the model's personality traits at inference. This approach allows for precise, real-time adjustment of model behavior. First, we demonstrate ControlLM's capacity to elicit diverse persona behaviors without any training, while precision control allows personality traits to closely match average human values. Subsequently, we showcase improved reasoning and question answering through selective amplification of beneficial attributes such as conscientiousness and friendliness. We hope that this work will inspire research on controlling human-like behaviors of language models and provide insights for future research. Our code is publicly available at: https://github.com/wengsyx/ControlLM.
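The mechanism the abstract describes, deriving a steering direction from contrasting behavioral prompts and adding it to intermediate activations at inference, can be sketched roughly as follows. This is a minimal illustration under assumed choices (GPT-2 as a stand-in backbone, layer 6, strength `ALPHA`, ad-hoc trait prompts), not the authors' released implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder backbone; ControlLM targets larger models, but the idea is the same.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

LAYER = 6    # assumption: which transformer block to steer
ALPHA = 4.0  # assumption: steering strength (negative suppresses the trait)

def mean_hidden(prompt: str) -> torch.Tensor:
    """Mean activation of `prompt` at the output of block LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[0] is the embeddings, so block LAYER's output is LAYER + 1.
    return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)

# Contrasting behavioral prompts (illustrative wording).
pos = "I am extremely conscientious, careful, and diligent."
neg = "I am careless, sloppy, and unreliable."
direction = mean_hidden(pos) - mean_hidden(neg)  # differential activation pattern

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # shift them along the trait direction at every forward pass.
    return (output[0] + ALPHA * direction,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
ids = tok("My plan for the project is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=40)[0]))
handle.remove()  # restore the unsteered model
```

Scaling `ALPHA` trades trait strength against fluency; per the abstract, the released system adds precision control so trait scores can be calibrated to average human values.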
Related papers
- Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior [2.4095382017500464]
One way to personalize and steer generations from large language models (LLM) is to assign a persona.
This paper investigates how personas affect diverse aspects of model behavior.
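As a generic illustration of the setup being studied, a persona is typically assigned up front as a system message; the model and persona wording below are placeholder assumptions, not this paper's protocol:

```python
from transformers import pipeline

# Placeholder instruct model; any model with a chat template behaves similarly.
chat = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

messages = [
    # The persona is assigned via the system message.
    {"role": "system", "content": "You are a cautious, detail-oriented analyst."},
    {"role": "user", "content": "Should we ship this release today?"},
]
reply = chat(messages, max_new_tokens=120)[0]["generated_text"][-1]
print(reply["content"])  # generations now reflect the assigned persona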
arXiv Detail & Related papers (2024-07-02T09:36:54Z)
- LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model [58.887561071010985]
Personality detection aims to detect one's personality traits underlying their social media posts.
Most existing methods learn post features directly by fine-tuning pre-trained language models.
We propose a personality detection model enhanced with large language model (LLM) based text augmentation.
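A rough sketch of the general recipe, using an LLM to generate auxiliary text about each post and feeding the augmented posts to a simple trait classifier; the prompt wording, models, and toy labels are all assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder LLM

def augment(post: str) -> str:
    # Ask the LLM to elaborate on the post; the elaboration becomes
    # extra evidence for the downstream trait classifier.
    prompt = f"Post: {post}\nWhat this says about the author:"
    out = generator(prompt, max_new_tokens=40)[0]["generated_text"]
    return post + " " + out[len(prompt):]

posts = ["I double-check every detail before submitting.",
         "Deadlines are just suggestions, honestly."]
labels = [1, 0]  # toy labels: 1 = high conscientiousness

X = TfidfVectorizer().fit_transform([augment(p) for p in posts])
clf = LogisticRegression().fit(X, labels)
```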
arXiv Detail & Related papers (2024-03-12T12:10:18Z)
- Roles of Scaling and Instruction Tuning in Language Perception: Model vs. Human Attention [58.817405319722596]
This work compares the self-attention of several large language models (LLMs) in different sizes to assess the effect of scaling and instruction tuning on language perception.
Results show that scaling enhances human resemblance and improves effective attention by reducing reliance on trivial patterns, while instruction tuning does not.
We also find that current LLMs consistently attend more like non-native than native speakers, suggesting sub-optimal language perception across all models.
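A minimal sketch of this kind of comparison, correlating the attention each token receives with human reading measures; the model is a stand-in and the gaze values are random placeholders for real eye-tracking data:

```python
import torch
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModel.from_pretrained("gpt2", output_attentions=True).eval()

ids = tok("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    att = model(**ids).attentions  # per layer: (batch, heads, seq, seq)

# Attention each token receives, averaged over layers and heads.
received = torch.stack(att).mean(dim=(0, 2))[0].sum(dim=0)

# Stand-in for per-token human gaze durations (assumed, not real data).
human_gaze = torch.rand(received.shape[0])
print(spearmanr(received.tolist(), human_gaze.tolist()))
```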
arXiv Detail & Related papers (2023-10-29T17:16:40Z)
- Editing Personality for Large Language Models [73.59001811199823]
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs).
We construct PersonalityEdit, a new benchmark dataset to address this task.
arXiv Detail & Related papers (2023-10-03T16:02:36Z)
- AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model [69.12623428463573]
AlignDiff is a novel framework that quantifies human preferences, covers their abstractness, and guides diffusion planning.
It can accurately match user-customized behaviors and efficiently switch from one to another.
We demonstrate its superior performance on preference matching, switching, and covering compared to other baselines.
arXiv Detail & Related papers (2023-10-03T13:53:08Z)
- Turning large language models into cognitive models [0.0]
We show that large language models can be turned into cognitive models.
These models offer accurate representations of human behavior, even outperforming traditional cognitive models in two decision-making domains.
Taken together, these results suggest that large, pre-trained models can be adapted to become generalist cognitive models.
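A toy sketch of the adaptation idea, fitting a simple readout on frozen LM representations to predict human choices; the backbone, task descriptions, and choice data are illustrative assumptions:

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder backbone
model = AutoModel.from_pretrained("gpt2").eval()

def embed(text: str):
    """Mean-pooled frozen representation of a task description."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        return model(**ids).last_hidden_state.mean(dim=1).squeeze(0).numpy()

# Toy bandit descriptions with assumed human choices (1 = chose option A).
trials = ["Option A paid 10 twice; option B paid 12 once.",
          "Option A paid 2 twice; option B paid 9 once."]
choices = [1, 0]

head = LogisticRegression().fit([embed(t) for t in trials], choices)
print(head.predict([embed("Option A paid 11 twice; option B paid 3 once.")]))
```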
arXiv Detail & Related papers (2023-06-06T18:00:01Z)
- Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models [58.990214815032495]
Large-scale pre-trained language models have achieved great success on natural language generation tasks.
Bayesian controllable language models (BCLMs) have been shown to be efficient at controllable language generation.
We propose a "Gemini Discriminator" for controllable language generation which alleviates the mismatch problem with a small computational cost.
arXiv Detail & Related papers (2022-06-11T12:52:32Z)
- Estimating the Personality of White-Box Language Models [0.589889361990138]
Large-scale language models, trained on large corpora of text, are used in a wide range of applications.
Existing research shows that these models can and do capture human biases.
Many of these biases, especially those that could potentially cause harm, are being well-investigated.
However, studies that infer and change human personality traits inherited by these models have been scarce or non-existent.
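One common recipe in this space is to administer questionnaire items to a white-box model and read off its answer probabilities directly; a toy sketch under assumed prompt wording and a single illustrative item:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder white-box model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# One illustrative Big-Five-style item; real inventories use many items.
item = 'Statement: "I am always prepared." Do you agree? Answer yes or no:'
ids = tok(item, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]

yes_id = tok(" yes", add_special_tokens=False).input_ids[0]
no_id = tok(" no", add_special_tokens=False).input_ids[0]
p = torch.softmax(logits[[yes_id, no_id]], dim=0)
print(f"P(agree) = {p[0]:.2f}")  # aggregate such scores per trait
```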
arXiv Detail & Related papers (2022-04-25T23:53:53Z)
- What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
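As a generic illustration of such zero-information perturbations (the likelihood-based scoring and toy question are assumptions, not the paper's exact setup): if accuracy barely drops when the question is blanked out, the model is exploiting answer-only cues.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder MCQA scorer
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def pick(question, options):
    """Choose the option whose full sequence the LM finds most likely."""
    scores = []
    for opt in options:
        ids = tok(f"Q: {question}\nA: {opt}", return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean token NLL
        scores.append(-loss.item())
    return int(torch.tensor(scores).argmax())

q, opts = "What color is the sky on a clear day?", ["blue", "green", "red"]
print(pick(q, opts))   # normal input
print(pick("", opts))  # zero-information perturbation: question removed
```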