AI-Assisted Conversational Interviewing: Effects on Data Quality and User Experience
- URL: http://arxiv.org/abs/2504.13908v2
- Date: Fri, 08 Aug 2025 17:43:37 GMT
- Title: AI-Assisted Conversational Interviewing: Effects on Data Quality and User Experience
- Authors: Soubhik Barari, Jarret Angbazo, Natalie Wang, Leah M. Christian, Elizabeth Dean, Zoe Slowinski, Brandon Sepulvado
- Abstract summary: This study introduces a framework for AI-assisted conversational interviewing. We conducted a web survey experiment in which 1,800 participants were randomly assigned to AI 'chatbots' that probe respondents for elaboration and interactively code open-ended responses. Our findings highlight the feasibility of using AI methods such as chatbots to enhance open-ended data collection in web surveys.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standardized surveys scale efficiently but sacrifice depth, while conversational interviews improve response quality at the cost of scalability and consistency. This study bridges the gap between these methods by introducing a framework for AI-assisted conversational interviewing. To evaluate this framework, we conducted a web survey experiment where 1,800 participants were randomly assigned to AI 'chatbots' that use large language models (LLMs) to dynamically probe respondents for elaboration and interactively code open-ended responses to fixed questions developed by human researchers. We assessed the AI chatbot's performance in terms of coding accuracy, response quality, and respondent experience. Our findings reveal that AI chatbots perform moderately well in live coding even without survey-specific fine-tuning, despite slightly inflated false-positive errors due to respondent acquiescence bias. Open-ended responses were more detailed and informative, but this came at a slight cost to respondent experience. Our findings highlight the feasibility of using LLM-powered chatbots to enhance open-ended data collection in web surveys.
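The abstract describes the chatbot's two roles, dynamic probing and live coding, only at a high level. Below is a minimal sketch of one plausible probe-and-code loop, assuming a generic chat-completion API; `call_llm`, `CATEGORIES`, and `interview_turn` are hypothetical names, and the prompts are illustrative rather than the study's actual instruments.

```python
# Hypothetical probe-and-code loop: try to code the respondent's answer
# against researcher-defined categories; if it cannot be coded, probe
# once more for elaboration. Not the paper's implementation.

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API call."""
    raise NotImplementedError

CATEGORIES = ["employment", "health", "family", "other"]  # example codes

def interview_turn(question: str, answer: str, max_probes: int = 2) -> dict:
    transcript = [f"Q: {question}", f"A: {answer}"]
    for _ in range(max_probes):
        code = call_llm(
            f"Code this response into exactly one of {CATEGORIES}, "
            "or reply UNCLEAR.\n" + "\n".join(transcript)
        ).strip()
        if code in CATEGORIES:
            return {"code": code, "transcript": transcript}
        probe = call_llm(
            "Write one short, neutral follow-up question asking the "
            "respondent to elaborate.\n" + "\n".join(transcript)
        )
        transcript.append(f"Q: {probe}")
        transcript.append(f"A: {input(probe + ' ')}")  # live respondent reply
    return {"code": "UNCODED", "transcript": transcript}
```

Capping the number of probes is one way to trade response depth against the respondent-experience cost the study reports.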
Related papers
- Modular AI-Powered Interviewer with Dynamic Question Generation and Expertise Profiling [0.7349727826230863]
This study presents an AI-powered interviewer that dynamically generates questions that are contextually appropriate and expertise-aligned. The interviewer is built on a locally hosted large language model (LLM) that generates coherent dialogue while preserving data privacy. The proposed interviewer is a scalable, privacy-conscious solution that advances AI-assisted qualitative data collection.
arXiv Detail & Related papers (2025-11-21T18:25:26Z) - A Knowledge Graph and a Tripartite Evaluation Framework Make Retrieval-Augmented Generation Scalable and Transparent [0.0]
This study presents a Retrieval-Augmented Generation (RAG) system that harnesses a knowledge graph and vector-search retrieval to deliver context-rich responses. A central innovation of this work is the introduction of RAG Evaluation (RAG-Eval), a novel chain-of-thought tripartite evaluation framework. RAG-Eval reliably detects factual gaps and query mismatches, thereby fostering trust in high-demand, data-centric environments.
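The summary does not say how the knowledge-graph and vector-search channels are combined; one common pattern is to query both and merge the results into the generation context. Everything below (`vector_search`, `graph_neighbors`, `hybrid_retrieve`) is a hypothetical sketch, not the paper's implementation.

```python
# Hypothetical hybrid retrieval: merge vector-search passages with
# knowledge-graph facts about entities mentioned in the query.

def vector_search(query: str, k: int = 5) -> list[str]:
    """Placeholder: top-k passages from a vector index."""
    raise NotImplementedError

def graph_neighbors(entity: str) -> list[str]:
    """Placeholder: facts linked to an entity in the knowledge graph."""
    raise NotImplementedError

def hybrid_retrieve(query: str, entities: list[str]) -> list[str]:
    passages = vector_search(query)
    facts = [fact for e in entities for fact in graph_neighbors(e)]
    seen, context = set(), []
    for chunk in passages + facts:  # deduplicate, preserving order
        if chunk not in seen:
            seen.add(chunk)
            context.append(chunk)
    return context  # feed this context to the generator prompt
```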
arXiv Detail & Related papers (2025-09-23T16:29:22Z) - Mic Drop or Data Flop? Evaluating the Fitness for Purpose of AI Voice Interviewers for Data Collection within Quantitative & Qualitative Research Contexts [7.938565669618949]
Transformer-based Large Language Models (LLMs) have paved the way for "AI interviewers" that can administer voice-based surveys with respondents in real time. We evaluate the capabilities of AI interviewers as well as current Interactive Voice Response (IVR) systems across two dimensions.
arXiv Detail & Related papers (2025-09-01T22:44:57Z) - AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer [1.8929175690169533]
We built and tested an AI system that conducts quantitative surveys using large language models (LLMs), automatic speech recognition (ASR), and speech synthesis technologies. The system was specifically designed for quantitative research and strictly adhered to research best practices such as question order randomization, answer order randomization, and exact wording. Our results suggest that shorter instruments and more responsive AI interviewers may contribute to improvements across all three metrics studied.
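The best practices named here are mechanical to implement; the snippet below is an illustrative sketch (not the system's actual code) of independently randomizing question order and each question's answer order.

```python
import random

def randomized_instrument(questions: list[dict], seed=None) -> list[dict]:
    """Shuffle question order and each question's answer options independently."""
    rng = random.Random(seed)  # seeded for reproducible assignment
    shuffled = rng.sample(questions, len(questions))  # question order
    return [
        {**q, "options": rng.sample(q["options"], len(q["options"]))}
        for q in shuffled  # answer order within each question
    ]
```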
arXiv Detail & Related papers (2025-07-23T17:30:14Z) - TestAgent: An Adaptive and Intelligent Expert for Human Assessment [62.060118490577366]
We propose TestAgent, a large language model (LLM)-powered agent designed to enhance adaptive testing through interactive engagement. TestAgent supports personalized question selection, captures test-takers' responses and anomalies, and provides precise outcomes through dynamic, conversational interactions.
arXiv Detail & Related papers (2025-06-03T16:07:54Z) - ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
Large Language Models (LLMs) are widely used in Conversational AI systems to generate responses to user inquiries. We propose a guided hallucination-based method to efficiently generate a diverse set of out-of-scope questions from a given document corpus.
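One plausible reading of "guided hallucination-based" generation is to prompt an LLM for questions that sound on-topic for a document but cannot be answered from it; the sketch below makes that assumption, and `call_llm` is a placeholder rather than ELOQ's actual pipeline.

```python
# Hypothetical guided out-of-scope question generation.

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API call."""
    raise NotImplementedError

def out_of_scope_questions(document: str, n: int = 3) -> list[str]:
    prompt = (
        f"Read this document:\n{document}\n\n"
        f"Write {n} questions that sound related to the document's topic "
        "but are NOT answerable from its content. One question per line."
    )
    return [q.strip() for q in call_llm(prompt).splitlines() if q.strip()]
```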
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers [40.80290002598963]
This study explores the potential of replacing human interviewers with large language models (LLMs) to conduct scalable conversational interviews. We conducted a small-scale, in-depth study with university students who were randomly assigned to a conversational interview by either AI or human interviewers. Various quantitative and qualitative measures assessed interviewer adherence to guidelines, response quality, participant engagement, and overall interview efficacy.
arXiv Detail & Related papers (2024-09-16T16:03:08Z) - PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
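A generation re-scoring step in PICK's spirit might look like the following; the scorer names and the linear weighting are assumptions for illustration, not the paper's formulation.

```python
# Hypothetical re-scoring: rank sampled candidates by a weighted mix of
# faithfulness to the grounding knowledge and relevance to the history.

def faithfulness(candidate: str, knowledge: str) -> float:
    raise NotImplementedError  # placeholder scorer

def relevance(candidate: str, history: list[str]) -> float:
    raise NotImplementedError  # placeholder scorer

def rescore(candidates: list[str], knowledge: str,
            history: list[str], alpha: float = 0.5) -> str:
    return max(
        candidates,
        key=lambda c: alpha * faithfulness(c, knowledge)
                      + (1 - alpha) * relevance(c, history),
    )
```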
arXiv Detail & Related papers (2023-09-19T08:27:09Z) - ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
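A multi-agent debate over candidate responses could be wired up roughly as below; the personas, round count, and majority vote are illustrative assumptions, not ChatEval's published configuration.

```python
# Hypothetical multi-agent referee debate with a final majority vote.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for any chat-completion API

PERSONAS = ["strict grader", "helpful generalist", "domain expert"]

def debate_and_vote(question: str, answer_a: str, answer_b: str,
                    rounds: int = 2) -> str:
    debate: list[str] = []
    for _ in range(rounds):
        for persona in PERSONAS:
            turn = call_llm(
                f"You are a {persona}. Question: {question}\n"
                f"Answer A: {answer_a}\nAnswer B: {answer_b}\n"
                "Debate so far:\n" + "\n".join(debate) +
                "\nGive a brief argument for A or B."
            )
            debate.append(f"{persona}: {turn}")
    votes = [
        call_llm("Given this debate:\n" + "\n".join(debate) +
                 "\nReply with exactly 'A' or 'B'.").strip()
        for _ in PERSONAS
    ]
    return max(set(votes), key=votes.count)  # majority wins
```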
arXiv Detail & Related papers (2023-08-14T15:13:04Z) - FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset, which is widely used for conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z) - Collaboration with Conversational AI Assistants for UX Evaluation: Questions and How to Ask them (Voice vs. Text) [18.884080068561843]
We conducted a Wizard-of-Oz design probe study with 20 participants who interacted with simulated AI assistants via text or voice.
We found that participants asked for five categories of information: user actions, user mental model, help from the AI assistant, product and task information, and user demographics.
The text assistant was perceived as significantly more efficient, but both were rated equally in satisfaction and trust.
arXiv Detail & Related papers (2023-03-07T03:59:14Z) - Connecting Humanities and Social Sciences: Applying Language and Speech Technology to Online Panel Surveys [2.0646127669654835]
We explore the application of language and speech technology to open-ended questions in a Dutch panel survey.
In an experimental wave, respondents could choose to answer open questions via speech or keyboard.
We report the errors the ASR system produces and investigate the impact of these errors on downstream analyses.
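Word error rate (WER) is the standard way to report such ASR errors; a minimal reference implementation via edit distance is shown below (the paper's exact evaluation may differ).

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i reference words into the first j
    # hypothesis words (substitutions, deletions, insertions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```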
arXiv Detail & Related papers (2023-02-21T10:52:15Z) - Conversational QA Dataset Generation with Answer Revision [2.5838973036257458]
We introduce a novel framework that extracts question-worthy phrases from a passage and then generates corresponding questions considering previous conversations.
Our framework revises the extracted answers after generating questions so that answers exactly match paired questions.
arXiv Detail & Related papers (2022-09-23T04:05:38Z) - What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys [63.51903260461746]
We propose a novel task for knowledge-driven follow-up question generation in conversational surveys.
We constructed a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge.
We then propose a two-staged knowledge-driven model for the task, which generates informative and coherent follow-up questions.
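The two-stage decomposition reads naturally as "select the grounding knowledge, then generate conditioned on it"; the skeleton below only illustrates that control flow, with placeholder names not taken from the paper.

```python
# Hypothetical two-stage follow-up question generation.

def select_knowledge(history: list[str], knowledge: list[str]) -> str:
    """Stage 1 (placeholder): pick the knowledge item to ground the follow-up."""
    raise NotImplementedError

def generate_question(history: list[str], grounding: str) -> str:
    """Stage 2 (placeholder): generate a follow-up conditioned on the grounding."""
    raise NotImplementedError

def follow_up(history: list[str], knowledge: list[str]) -> str:
    return generate_question(history, select_knowledge(history, knowledge))
```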
arXiv Detail & Related papers (2022-05-23T00:57:33Z) - Training Conversational Agents with Generative Conversational Networks [74.9941330874663]
We use Generative Conversational Networks to automatically generate data and train social conversational agents.
We evaluate our approach on TopicalChat with automatic metrics and human evaluators, showing that with 10% of seed data it performs close to the baseline that uses 100% of the data.
arXiv Detail & Related papers (2021-10-15T21:46:39Z) - Partner Matters! An Empirical Study on Fusing Personas for Personalized Response Selection in Retrieval-Based Chatbots [51.091235903442715]
This paper explores the impact of utilizing personas that describe either the self or the partner speaker on the task of response selection.
Four persona fusion strategies are designed, which assume personas interact with contexts or responses in different ways.
Empirical studies on the Persona-Chat dataset show that the partner personas can improve the accuracy of response selection.
arXiv Detail & Related papers (2021-05-19T10:32:30Z) - Predicting respondent difficulty in web surveys: A machine-learning approach based on mouse movement features [3.6944296923226316]
This paper explores the predictive value of mouse-tracking data for identifying respondents' difficulty.
We use data from a survey on respondents' employment history and demographic information.
We develop a personalization method that adjusts for respondents' baseline mouse behavior and evaluate its performance.
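The summary does not list the exact features or model; the sketch below uses common trajectory summaries and an off-the-shelf random forest purely for illustration, with the per-respondent baseline adjustment reduced to a comment.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def trajectory_features(xs, ys, ts) -> list[float]:
    """Toy features: path length, mean speed, sharp turns, hover time."""
    dx, dy, dt = np.diff(xs), np.diff(ys), np.diff(ts)
    step = np.hypot(dx, dy)
    speed = step / np.clip(dt, 1e-9, None)
    turns = np.sum(np.abs(np.diff(np.arctan2(dy, dx))) > np.pi / 2)
    return [step.sum(), speed.mean(), float(turns), dt[step < 1.0].sum()]

# Personalization could subtract each respondent's mean feature vector
# (their baseline mouse behavior) before fitting. With feature matrix X
# and difficulty labels y (hypothetical data):
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
#                                                     random_state=0)
# clf = RandomForestClassifier(n_estimators=200, random_state=0)
# clf.fit(X_train, y_train)
# print(accuracy_score(y_test, clf.predict(X_test)))
```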
arXiv Detail & Related papers (2020-11-05T10:54:33Z) - Towards Data Distillation for End-to-end Spoken Conversational Question
Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering task (SCQA).
SCQA aims at enabling QA systems to model complex dialogue flows given speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z) - Exploiting Unsupervised Data for Emotion Recognition in Conversations [76.01690906995286]
Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations.
The available supervised data for the ERC task is limited.
We propose a novel approach to leverage unsupervised conversation data.
arXiv Detail & Related papers (2020-10-02T13:28:47Z)