Beyond Discrete Personas: Personality Modeling Through Journal Intensive Conversations
- URL: http://arxiv.org/abs/2412.11250v1
- Date: Sun, 15 Dec 2024 17:16:08 GMT
- Title: Beyond Discrete Personas: Personality Modeling Through Journal Intensive Conversations
- Authors: Sayantan Pal, Souvik Das, Rohini K. Srihari
- Abstract summary: We introduce a novel dataset with around 400,000 dialogues and a framework for generating personalized conversations using long-form journal entries from Reddit.
Our approach clusters journal entries for each author and filters them by selecting the most representative cluster, ensuring that the retained entries best reflect the author's personality.
We further refine the data by capturing the Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, and neuroticism).
Using Llama 3 70B, we generate high-quality, personality-rich dialogues grounded in these journal entries.
- Score: 6.404122934568859
- Abstract: Large Language Models (LLMs) have significantly improved personalized conversational capabilities. However, existing datasets like Persona Chat, Synthetic Persona Chat, and Blended Skill Talk rely on static, predefined personas. This approach often results in dialogues that fail to capture the fluid and evolving nature of human personality. To overcome these limitations, we introduce a novel dataset with around 400,000 dialogues and a framework for generating personalized conversations using long-form journal entries from Reddit. Our approach clusters journal entries for each author and filters them by selecting the most representative cluster, ensuring that the retained entries best reflect the author's personality. We further refine the data by capturing the Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, and neuroticism), ensuring that dialogues authentically reflect an individual's personality. Using Llama 3 70B, we generate high-quality, personality-rich dialogues grounded in these journal entries. Fine-tuning models on this dataset leads to an 11% improvement in capturing personality traits on average, outperforming existing approaches in generating more coherent and personality-driven dialogues.
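The per-author filtering step described in the abstract (cluster an author's journal entries, then keep the most representative cluster) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which features, clustering algorithm, or definition of "most representative" the paper uses. The sketch assumes bag-of-words cosine similarity in place of a real text embedding, a greedy threshold-based clustering, and "largest cluster" as the representative one. The names `bag_of_words`, `cluster_entries`, `representative_entries`, and the `threshold` parameter are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts (assumption: a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_entries(entries, threshold=0.3):
    """Greedy single-pass clustering (an assumed technique, not the paper's).

    Each entry joins the first cluster whose seed entry is at least
    `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `entries`.
    """
    vectors = [bag_of_words(e) for e in entries]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def representative_entries(entries, threshold=0.3):
    """Keep only the entries in the largest cluster (assumed to be the
    'most representative' one) for a single author."""
    clusters = cluster_entries(entries, threshold)
    if not clusters:
        return []
    largest = max(clusters, key=len)
    return [entries[i] for i in largest]


if __name__ == "__main__":
    # Toy journal entries for one hypothetical author.
    journal = [
        "I went hiking in the mountains today",
        "Hiking in the mountains again today",
        "Mountains and hiking today with friends",
        "Filed my taxes",
    ]
    for entry in representative_entries(journal):
        print(entry)
```

With the toy entries above, the three hiking entries end up in one cluster and the off-topic entry is filtered out. In practice, a sentence-embedding model and a standard clustering algorithm would likely replace the bag-of-words similarity and the greedy pass used here.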
Related papers
- Dialogue Language Model with Large-Scale Persona Data Engineering [10.160626284195434]
PPDS is an open-domain persona dialogue system that employs extensive generative pre-training on a persona dialogue dataset to enhance persona consistency.
We present a persona extraction model designed to autonomously and precisely generate vast persona dialogue datasets.
We also unveil a pioneering persona augmentation technique to address the invalid persona bias inherent in the constructed dataset.
arXiv Detail & Related papers (2024-12-12T07:49:06Z)
- Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations.
Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time.
This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
arXiv Detail & Related papers (2024-08-19T09:57:28Z)
- Dynamic Generation of Personalities with Large Language Models [20.07145733116127]
We introduce Dynamic Personality Generation (DPG), a dynamic personality generation method based on Hypernetworks.
We embed the Big Five personality theory into GPT-4 to form a personality assessment machine.
We then use this personality assessment machine to evaluate dialogues in script data, resulting in a personality-dialogue dataset.
arXiv Detail & Related papers (2024-04-10T15:17:17Z)
- PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits [5.447308344436046]
PersonalityChat is a synthetic conversational dataset based upon the popular PersonaChat dataset.
We show that the personality trait labels can be used for trait-based personalization of generative dialogue models.
arXiv Detail & Related papers (2024-01-14T20:35:33Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- MPCHAT: Towards Multimodal Persona-Grounded Conversation [54.800425322314105]
We extend persona-based dialogue to the multimodal domain and make two main contributions.
First, we present the first multimodal persona-based dialogue dataset named MPCHAT.
Second, we empirically show that incorporating multimodal persona, as measured by three proposed multimodal persona-grounded dialogue tasks, leads to statistically significant performance improvements.
arXiv Detail & Related papers (2023-05-27T06:46:42Z)
- Enhancing Personalized Dialogue Generation with Contrastive Latent Variables: Combining Sparse and Dense Persona [16.90863217077699]
Existing personalized dialogue agents model persona profiles from three resources: sparse or dense persona descriptions and dialogue histories.
We combine the advantages of the three resources to obtain a richer and more accurate persona.
Experimental results on Chinese and English datasets demonstrate our model's superiority in personalization.
arXiv Detail & Related papers (2023-05-19T07:24:27Z)
- Less is More: Learning to Refine Dialogue History for Personalized Dialogue Generation [57.73547958927826]
We propose to refine the user dialogue history on a large scale, based on which we can handle more dialogue history and obtain more accurate persona information.
Specifically, we design an MSP model which consists of three personal information refiners and a personalized response generator.
arXiv Detail & Related papers (2022-04-18T02:02:56Z)
- DLVGen: A Dual Latent Variable Approach to Personalized Dialogue Generation [28.721411816698563]
We propose a Dual Latent Variable Generator (DLVGen) capable of generating personalized dialogue.
Unlike prior work, DLVGen models the latent distribution over potential responses as well as the latent distribution over the agent's potential persona.
Empirical results show that DLVGen is capable of generating diverse responses which accurately incorporate the agent's persona.
arXiv Detail & Related papers (2021-11-22T17:21:21Z)
- Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset, Vyaktitv.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features, such as income and cultural orientation, among several others, for all the participants.
arXiv Detail & Related papers (2020-08-31T17:44:28Z)
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness [62.55060760615656]
Recent models tackling consistency often train with additional Natural Language Inference (NLI) labels or attach trained extra modules to the generative agent for maintaining consistency.
Inspired by social cognition and pragmatics, we endow existing dialogue agents with public self-consciousness on the fly through an imaginary listener.
Our approach, based on the Rational Speech Acts framework, can make dialogue agents refrain from uttering contradictions.
arXiv Detail & Related papers (2020-04-13T08:16:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.