InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation
- URL: http://arxiv.org/abs/2602.20294v1
- Date: Mon, 23 Feb 2026 19:21:10 GMT
- Title: InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation
- Authors: Yu Li, Pranav Narayanan Venkit, Yada Pruksachatkun, Chien-Sheng Wu
- Abstract summary: Simulating real personalities with large language models requires grounding generation in authentic personal data. We propose an interview-grounded evaluation framework for personality simulation at a large scale. We extract over 671,000 question-answer pairs from 23,000 verified interview transcripts across 1,000 public personalities.
- Score: 32.09483697866529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simulating real personalities with large language models requires grounding generation in authentic personal data. Existing evaluation approaches rely on demographic surveys, personality questionnaires, or short AI-led interviews as proxies, but lack direct assessment against what individuals actually said. We address this gap with an interview-grounded evaluation framework for personality simulation at a large scale. We extract over 671,000 question-answer pairs from 23,000 verified interview transcripts across 1,000 public personalities, each with an average of 11.5 hours of interview content. We propose a multi-dimensional evaluation framework with four complementary metrics measuring content similarity, factual consistency, personality alignment, and factual knowledge retention. Through systematic comparison, we demonstrate that methods grounded in real interview data substantially outperform those relying solely on biographical profiles or the model's parametric knowledge. We further reveal a trade-off in how interview data is best utilized: retrieval-augmented methods excel at capturing personality style and response quality, while chronology-based methods better preserve factual consistency and knowledge retention. Our evaluation framework enables principled method selection based on application requirements, and our empirical findings provide actionable insights for advancing personality simulation research.
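As a rough illustration of the retrieval-augmented vs. chronology-based trade-off the abstract describes (this is not the authors' released code), the sketch below shows two ways of assembling interview context for a persona prompt. The `QAPair` layout and the `embed` hook are assumptions for illustration.

```python
# Illustrative sketch, assuming QA pairs extracted from interview transcripts
# and some sentence-embedding function `embed`; neither is the paper's API.
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str
    date: str  # ISO date of the source interview

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv or 1.0)

def retrieval_context(query: str, pairs: list[QAPair], embed, k: int = 5) -> str:
    """Pick the k QA pairs most similar to the query (style/quality-oriented)."""
    qv = embed(query)
    ranked = sorted(pairs, key=lambda p: cosine(embed(p.question), qv), reverse=True)
    return "\n\n".join(f"Q: {p.question}\nA: {p.answer}" for p in ranked[:k])

def chronological_context(pairs: list[QAPair], k: int = 5) -> str:
    """Take the k most recent QA pairs in interview order (consistency-oriented)."""
    recent = sorted(pairs, key=lambda p: p.date)[-k:]
    return "\n\n".join(f"Q: {p.question}\nA: {p.answer}" for p in recent)
```

Per the abstract's findings, one would favor `retrieval_context` when personality style matters most and `chronological_context` when factual consistency and knowledge retention matter most.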
Related papers
- Individual Turing Test: A Case Study of LLM-based Simulation Using Longitudinal Personal Data [54.145424717168794]
Large Language Models (LLMs) have demonstrated remarkable human-like capabilities, yet their ability to replicate a specific individual remains under-explored.
This paper presents a case study to investigate LLM-based individual simulation with a volunteer-contributed archive of private messaging history spanning over ten years.
We propose the "Individual Turing Test" to evaluate whether acquaintances of the volunteer can correctly identify which response in a multi-candidate pool most plausibly comes from the volunteer.
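A toy scoring routine for this kind of identification protocol is sketched below; the trial format is an assumption, not the paper's released setup.

```python
# Hypothetical scoring for an "Individual Turing Test"-style study: each judge
# sees a pool with one genuine response among simulated ones; we report
# identification accuracy against the per-trial chance baseline.
def identification_accuracy(trials):
    """trials: iterable of (judge_choice_index, true_index, pool_size)."""
    trials = list(trials)
    correct = sum(choice == true for choice, true, _ in trials)
    accuracy = correct / len(trials)
    chance = sum(1 / size for _, _, size in trials) / len(trials)
    return accuracy, chance

acc, chance = identification_accuracy([(0, 0, 4), (2, 1, 4), (1, 1, 4)])
print(f"accuracy={acc:.2f} vs chance={chance:.2f}")
```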
arXiv Detail & Related papers (2026-03-01T21:46:27Z)
- SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery [55.50580661343875]
We introduce SparkMe, a multi-agent interviewer that performs deliberative planning via simulated conversation rollouts to select questions with high expected utility.
We evaluate SparkMe through controlled experiments with LLM-based interviewees, showing that it achieves higher interview utility.
We further validate SparkMe in a user study with 70 participants across 7 professions on the impact of AI on their professions.
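A minimal sketch of rollout-based question selection in this spirit follows; `simulate_answer` and `utility` are assumed hooks, not SparkMe's API.

```python
# Monte-Carlo question selection: for each candidate question, simulate a few
# continuations with an LLM-backed interviewee and keep the question with the
# highest mean utility. Hooks are hypothetical stand-ins.
def expected_utility(question, state, simulate_answer, utility, rollouts=3):
    """Estimate a question's utility from simulated answers."""
    scores = [utility(state, question, simulate_answer(state, question))
              for _ in range(rollouts)]
    return sum(scores) / len(scores)

def select_question(candidates, state, simulate_answer, utility, rollouts=3):
    return max(candidates,
               key=lambda q: expected_utility(q, state, simulate_answer,
                                              utility, rollouts))
```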
arXiv Detail & Related papers (2026-02-24T17:33:02Z)
- Modular AI-Powered Interviewer with Dynamic Question Generation and Expertise Profiling [0.7349727826230863]
This study presents an AI-powered interviewer that dynamically generates questions that are contextually appropriate and expertise-aligned.
The interviewer is built on a locally hosted large language model (LLM) that generates coherent dialogue while preserving data privacy.
The proposed interviewer is a scalable, privacy-conscious solution that advances AI-assisted qualitative data collection.
arXiv Detail & Related papers (2025-11-21T18:25:26Z)
- Using Large Language Models to Develop Requirements Elicitation Skills [1.1473376666000734]
We propose conditioning a large language model to play the role of the client during a chat-based interview.
We find that both approaches provide sufficient information for participants to construct technically sound solutions.
arXiv Detail & Related papers (2025-03-10T19:27:38Z)
- CritiQ: Mining Data Quality Criteria from Human Preferences [91.44025907584931]
We introduce CritiQ, a novel data selection method that automatically mines criteria from human preferences for data quality.
CritiQ Flow employs a manager agent to evolve quality criteria and worker agents to make pairwise judgments.
We demonstrate the effectiveness of our method in the code, math, and logic domains.
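One plausible shape for such a manager/worker loop is sketched below; the `judge` call (an LLM) and the data shapes are assumptions for illustration, not CritiQ's released interface.

```python
# Worker agents score data pairs under each candidate criterion; the manager
# keeps the criteria whose pairwise verdicts agree best with human labels.
def criterion_accuracy(criterion, labeled_pairs, judge):
    """labeled_pairs: iterable of (text_a, text_b, human_prefers_a: bool)."""
    labeled_pairs = list(labeled_pairs)
    hits = sum(judge(criterion, a, b) == prefers_a
               for a, b, prefers_a in labeled_pairs)
    return hits / len(labeled_pairs)

def evolve_criteria(criteria, labeled_pairs, judge, keep=5):
    """Manager step: rank criteria by agreement with human preferences."""
    ranked = sorted(criteria,
                    key=lambda c: criterion_accuracy(c, labeled_pairs, judge),
                    reverse=True)
    return ranked[:keep]
```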
arXiv Detail & Related papers (2025-02-26T16:33:41Z)
- Personality Structured Interview for Large Language Model Simulation in Personality Research [8.208325358490807]
We explore the potential of the theory-informed Personality Structured Interview as a tool for simulating human responses in personality research.
We have provided a growing set of 357 structured interview transcripts from a representative sample, each containing an individual's response to 32 open-ended questions.
Results from three experiments demonstrate that well-designed structured interviews could improve human-like heterogeneity in LLM-simulated personality data.
arXiv Detail & Related papers (2025-02-17T18:31:57Z)
- AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers [40.80290002598963]
This study explores the potential of replacing human interviewers with large language models (LLMs) to conduct scalable conversational interviews.
We conducted a small-scale, in-depth study with university students who were randomly assigned to a conversational interview by either AI or human interviewers.
Various quantitative and qualitative measures assessed interviewer adherence to guidelines, response quality, participant engagement, and overall interview efficacy.
arXiv Detail & Related papers (2024-09-16T16:03:08Z)
- QuRating: Selecting High-Quality Data for Training Language Models [64.83332850645074]
We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
In this paper, we investigate four qualities: writing style, required expertise, facts & trivia, and educational value.
We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B training corpus with quality ratings for each of the four criteria.
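A common way to learn scalar ratings from pairwise judgments is a Bradley-Terry-style objective, sketched below as one plausible reading of this setup; the tiny linear scorer over precomputed embeddings is an assumption, not the paper's architecture.

```python
# Learn a scalar quality score so that P(A beats B) = sigmoid(score_A - score_B),
# trained with binary cross-entropy on pairwise preference labels.
import torch

scorer = torch.nn.Linear(768, 1)          # maps a text embedding to a scalar rating
opt = torch.optim.Adam(scorer.parameters(), lr=1e-4)

def bt_loss(emb_a, emb_b, a_wins):
    """Bradley-Terry loss on the score difference between two texts."""
    diff = scorer(emb_a) - scorer(emb_b)
    return torch.nn.functional.binary_cross_entropy_with_logits(
        diff.squeeze(-1), a_wins.float())

# one illustrative step on random stand-in embeddings
emb_a, emb_b = torch.randn(32, 768), torch.randn(32, 768)
a_wins = torch.randint(0, 2, (32,))
loss = bt_loss(emb_a, emb_b, a_wins)
loss.backward(); opt.step(); opt.zero_grad()
```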
arXiv Detail & Related papers (2024-02-15T06:36:07Z)
- PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models [72.57329554067195]
ProxyQA is an innovative framework dedicated to assessing long-text generation.
It comprises in-depth human-curated meta-questions spanning various domains, each accompanied by specific proxy-questions with pre-annotated answers.
It assesses the generated content's quality through the evaluator's accuracy in addressing the proxy-questions.
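The scoring idea reduces to a simple loop, sketched below; `evaluator_answer` (an LLM call) and the exact-match criterion are assumptions for illustration.

```python
# ProxyQA-style scoring: generated long-form text is judged by how many
# pre-annotated proxy questions an evaluator can answer correctly from it.
def proxyqa_score(generated_text, proxy_questions, evaluator_answer):
    """proxy_questions: list of (question, gold_answer) pairs."""
    correct = 0
    for question, gold in proxy_questions:
        pred = evaluator_answer(context=generated_text, question=question)
        correct += pred.strip().lower() == gold.strip().lower()
    return correct / len(proxy_questions)
```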
arXiv Detail & Related papers (2024-01-26T18:12:25Z)
- Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset, Vyaktitv.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features for all participants, such as income and cultural orientation, among several others.
arXiv Detail & Related papers (2020-08-31T17:44:28Z)
- Leveraging Multimodal Behavioral Analytics for Automated Job Interview Performance Assessment and Feedback [0.5872014229110213]
Behavioral cues play a significant part in human communication and cognitive perception.
We propose a multimodal analytical framework that analyzes the candidate in an interview scenario.
We use these multimodal data sources to construct a composite representation, which is used for training machine learning classifiers to predict the class labels.
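An early-fusion version of such a composite representation is sketched below; the feature extractors, dimensions, and labels are stand-ins, not the paper's pipeline.

```python
# Concatenate per-modality feature vectors into one composite representation,
# then train an off-the-shelf classifier on annotated labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def composite(audio_feats, video_feats, text_feats):
    """Early fusion: concatenate per-modality features for one candidate clip."""
    return np.concatenate([audio_feats, video_feats, text_feats])

# X: one composite vector per interview clip; y: stand-in performance labels
X = np.stack([composite(np.random.rand(8), np.random.rand(8), np.random.rand(8))
              for _ in range(40)])
y = np.random.randint(0, 2, size=40)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", clf.score(X, y))
```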
arXiv Detail & Related papers (2020-06-14T14:20:42Z)