Do LLMs Understand Why We Write Diaries? A Method for Purpose Extraction and Clustering
- URL: http://arxiv.org/abs/2506.00985v1
- Date: Sun, 01 Jun 2025 12:38:01 GMT
- Title: Do LLMs Understand Why We Write Diaries? A Method for Purpose Extraction and Clustering
- Authors: Valeriya Goloviznina, Alexander Sergeev, Mikhail Melnichenko, Evgeny Kotelnikov
- Abstract summary: This study introduces a novel method based on Large Language Models (LLMs) to identify and cluster the various purposes of diary writing. Our approach is applied to Soviet-era diaries (1922-1929) from the Prozhito digital archive.
- Score: 41.94295877935867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diary analysis presents challenges, particularly in extracting meaningful information from large corpora, where traditional methods often fail to deliver satisfactory results. This study introduces a novel method based on Large Language Models (LLMs) to identify and cluster the various purposes of diary writing. By "purposes," we refer to the intentions behind diary writing, such as documenting life events, self-reflection, or practicing language skills. Our approach is applied to Soviet-era diaries (1922-1929) from the Prozhito digital archive, a rich collection of personal narratives. We evaluate different proprietary and open-source LLMs, finding that GPT-4o and o1-mini achieve the best performance, while a template-based baseline is significantly less effective. Additionally, we analyze the retrieved purposes based on gender, age of the authors, and the year of writing. Furthermore, we examine the types of errors made by the models, providing a deeper understanding of their limitations and potential areas for improvement in future research.
Related papers
- Help Me Write a Story: Evaluating LLMs' Ability to Generate Writing Feedback [57.200668979963694]
We present a novel test set of 1,300 stories that we corrupted to intentionally introduce writing issues. We study the performance of commonly used LLMs in this task with both automatic and human evaluation metrics.
arXiv Detail & Related papers (2025-07-21T18:56:50Z) - Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes [9.471374217162843]
We propose Retell, a simple, accessible topic modeling approach for literature. We prompt resource-efficient, generative language models (LMs) to tell what passages show.
arXiv Detail & Related papers (2025-05-29T06:59:21Z) - A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z) - Personalization of Large Language Models: A Survey [131.00650432814268]
Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Most existing works on personalized LLMs have focused either entirely on (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. We introduce a taxonomy for personalized LLM usage and summarize the key differences and challenges.
arXiv Detail & Related papers (2024-10-29T04:01:11Z) - Undesirable Memorization in Large Language Models: A Survey [5.659933808910005]
Memorization refers to a model's tendency to store and reproduce phrases from its training data. This paper provides a taxonomy of the literature on LLM memorization, exploring it across three dimensions: granularity, retrievability, and desirability. We conclude our survey by identifying potential research topics for the near future, including methods to balance privacy and performance.
arXiv Detail & Related papers (2024-10-03T16:34:46Z) - Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z) - Digital Forgetting in Large Language Models: A Survey of Unlearning Methods [3.6070136675401656]
This survey focuses on forgetting in large language models (LLMs).
We first provide background on LLMs, including their components, the types of LLMs, and their usual training pipeline.
Second, we describe the motivations, types, and desired properties of digital forgetting.
Third, we introduce the approaches to digital forgetting in LLMs, among which unlearning methodologies stand out as the state of the art.
arXiv Detail & Related papers (2024-04-02T16:01:18Z) - Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs [8.526956860672698]
Large Language Models (LLMs) have gained immense attention due to their notable emergent capabilities.
This study investigates the potential of LLMs as reliable assessors of factual consistency in summaries generated by text-generation models.
arXiv Detail & Related papers (2023-11-01T17:42:45Z) - "Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters [97.11173801187816]
Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content.
This paper critically examines gender biases in LLM-generated reference letters.
arXiv Detail & Related papers (2023-10-13T16:12:57Z) - Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs.
We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal.
Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
arXiv Detail & Related papers (2023-05-22T16:00:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.