Related papers: Measuring Social Integration Through Participation: Categorizing Organizations and Leisure Activities in the Displaced Karelians Interview Archive using LLMs

URL: http://arxiv.org/abs/2602.15436v1
Date: Tue, 17 Feb 2026 08:59:13 GMT
Title: Measuring Social Integration Through Participation: Categorizing Organizations and Leisure Activities in the Displaced Karelians Interview Archive using LLMs
Authors: Joonatan Laato, Veera Schroderus, Jenna Kanerva, Jenni Kauppi, Virpi Lummaa, Filip Ginter
Abstract summary: We develop a categorization framework that captures key aspects of participation. Using a simple voting approach across multiple model runs, we find that an open-weight LLM can closely match expert judgments.
Score: 2.373317705249957
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Digitized historical archives make it possible to study everyday social life on a large scale, but the information extracted directly from text often does not directly allow one to answer the research questions posed by historians or sociologists in a quantitative manner. We address this problem in a large collection of Finnish World War II Karelian evacuee family interviews. Prior work extracted more than 350K mentions of leisure time activities and organizational memberships from these interviews, yielding 71K unique activity and organization names -- far too many to analyze directly. We develop a categorization framework that captures key aspects of participation (the kind of activity/organization, how social it typically is, how regularly it happens, and how physically demanding it is). We annotate a gold-standard set to allow for a reliable evaluation, and then test whether large language models can apply the same schema at scale. Using a simple voting approach across multiple model runs, we find that an open-weight LLM can closely match expert judgments. Finally, we apply the method to label the 350K entities, producing a structured resource for downstream studies of social integration and related outcomes.
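The abstract does not spell out how the votes across model runs are combined. As a rough illustration only, the sketch below assumes plain majority voting over the labels that several runs assign to each entity. The function names, the tie-breaking rule (first label seen wins), and the example labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first.

    Assumption: plain majority voting. The paper's actual scheme may differ.
    """
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    top = max(counts.values())
    # Walk the labels in their original order so ties resolve deterministically.
    for label in labels:
        if counts[label] == top:
            return label


def vote_per_entity(runs):
    """Combine several runs into one label per entity.

    `runs` is a list of dicts, one per model run, mapping an entity name
    to the label that run assigned (hypothetical data layout).
    """
    collected = {}
    for run in runs:
        for entity, label in run.items():
            collected.setdefault(entity, []).append(label)
    return {entity: majority_vote(labels) for entity, labels in collected.items()}


if __name__ == "__main__":
    # Made-up labels for three runs over two entities.
    runs = [
        {"choir": "music", "skiing": "sports"},
        {"choir": "music", "skiing": "outdoor"},
        {"choir": "culture", "skiing": "sports"},
    ]
    print(vote_per_entity(runs))
```

With an odd number of runs and a small label set, ties are rare, which is one reason a scheme like this is often run three or five times per item.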

Related papers

Grok in the Wild: Characterizing the Roles and Uses of Large Language Models on Social Media [5.844783557050257]
xAI's large language model, Grok, is called by millions of people each week on the social media platform X. At the platform level, we find that Grok responds to 62% of requests, that the majority (51%) are in English, and that engagement is low. We also inductively build a taxonomy of 10 roles that LLMs play in mediating social interactions and use these roles to analyze 41,735 interactions with Grok on X.
arXiv Detail & Related papers (2026-02-11T19:06:22Z)
PBBQ: A Persian Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [0.3518016233072557]
We introduce PBBQ, a benchmark dataset designed to evaluate social biases in Persian language models. The PBBQ dataset contains over 37,000 carefully curated questions. Our findings reveal that current LLMs exhibit significant social biases across Persian culture.
arXiv Detail & Related papers (2025-10-22T14:12:00Z)
Measuring Scalar Constructs in Social Science with LLMs [48.92998035333579]
We evaluate approaches to measuring scalar constructs in large language models. We find that pairwise comparisons produce better measurements than simply prompting the LLM to directly output the scores. Finetuning smaller models with as few as 1,000 training pairs can match or exceed the performance of prompted LLMs.
arXiv Detail & Related papers (2025-09-03T08:19:13Z)
Are Lexicon-Based Tools Still the Gold Standard for Valence Analysis in Low-Resource Flemish? [0.0]
Traditional lexicon-based tools such as LIWC and Pattern have long served as foundational instruments in this domain. We first conducted a study involving approximately 25,000 textual responses from 102 Dutch-speaking participants. We assessed the performance of three Dutch-specific LLMs in predicting these valence scores, and compared their outputs to those generated by LIWC and Pattern. This study underscores the imperative for developing culturally and linguistically tailored models/tools that can adeptly handle the complexities of natural language use.
arXiv Detail & Related papers (2025-06-04T16:31:37Z)
A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
arXiv Detail & Related papers (2025-05-20T09:13:22Z)
How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks [6.487500253901779]
Large language models (LLMs) play roles in multi-user, multi-turn social agent tasks. We propose a novel benchmark, How Social Is It (we call it HSII below), designed to assess LLMs' social capabilities. HSII comprises four stages: format parsing, target selection, target switching conversation, and stable conversation, which collectively evaluate the communication and task completion capabilities of LLMs.
arXiv Detail & Related papers (2025-04-04T08:59:01Z)
Large Language Models: A Survey [66.39828929831017]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks. LLMs' general-purpose language understanding and generation abilities are acquired by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
L-Eval: Instituting Standardized Evaluation for Long Context Language Models [91.05820785008527]
We propose L-Eval to institute a more standardized evaluation for long context language models (LCLMs). We build a new evaluation suite containing 20 sub-tasks, 508 long documents, and over 2,000 human-labeled query-response pairs. Results show that popular n-gram matching metrics generally cannot correlate well with human judgment.
arXiv Detail & Related papers (2023-07-20T17:59:41Z)
Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs [65.9077733300329]
Large Language Models (LLMs) have the potential to greatly enhance the analysis of public affairs documents. LLMs can be of great use for processing domain-specific documents, such as those in the domain of public affairs.
arXiv Detail & Related papers (2023-06-05T13:35:01Z)
Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks. We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.