Mic Drop or Data Flop? Evaluating the Fitness for Purpose of AI Voice Interviewers for Data Collection within Quantitative & Qualitative Research Contexts
- URL: http://arxiv.org/abs/2509.01814v1
- Date: Mon, 01 Sep 2025 22:44:57 GMT
- Title: Mic Drop or Data Flop? Evaluating the Fitness for Purpose of AI Voice Interviewers for Data Collection within Quantitative & Qualitative Research Contexts
- Authors: Shreyas Tirumala, Nishant Jain, Danny D. Leybzon, Trent D. Buskirk,
- Abstract summary: Transformer-based Large Language Models (LLMs) have paved the way for "AI interviewers" that can administer voice-based surveys with respondents in real-time. We evaluate the capabilities of AI interviewers as well as current Interactive Voice Response (IVR) systems across two dimensions.
- Score: 7.938565669618949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based Large Language Models (LLMs) have paved the way for "AI interviewers" that can administer voice-based surveys with respondents in real-time. This position paper reviews emerging evidence to understand when such AI interviewing systems are fit for purpose for collecting data within quantitative and qualitative research contexts. We evaluate the capabilities of AI interviewers as well as current Interactive Voice Response (IVR) systems across two dimensions: input/output performance (i.e., speech recognition, answer recording, emotion handling) and verbal reasoning (i.e., ability to probe, clarify, and handle branching logic). Field studies suggest that AI interviewers already exceed IVR capabilities for both quantitative and qualitative data collection, but real-time transcription error rates, limited emotion detection abilities, and uneven follow-up quality indicate that the utility, use and adoption of current AI interviewer technology may be context-dependent for qualitative data collection efforts.
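The abstract evaluates interviewers along two dimensions, each with three criteria. As a purely illustrative sketch (none of these names or scores come from the paper), that rubric could be captured as a small data structure:

```python
# Hypothetical rubric mirroring the two evaluation dimensions named in the
# abstract: input/output performance and verbal reasoning. Criterion names
# and scores are illustrative placeholders, not values from the paper.
from dataclasses import dataclass, field


@dataclass
class InterviewerEvaluation:
    # Input/output performance: speech recognition, answer recording,
    # emotion handling (each scored 0.0-1.0).
    io_scores: dict = field(default_factory=dict)
    # Verbal reasoning: probing, clarification, branching logic.
    reasoning_scores: dict = field(default_factory=dict)

    def dimension_means(self) -> dict:
        """Average score per dimension (empty dimensions average to 0)."""
        return {
            "input_output":
                sum(self.io_scores.values()) / max(len(self.io_scores), 1),
            "verbal_reasoning":
                sum(self.reasoning_scores.values()) / max(len(self.reasoning_scores), 1),
        }


ai_interviewer = InterviewerEvaluation(
    io_scores={"speech_recognition": 0.8, "answer_recording": 0.9, "emotion_handling": 0.5},
    reasoning_scores={"probing": 0.6, "clarification": 0.7, "branching_logic": 0.9},
)
means = ai_interviewer.dimension_means()
```

Comparing such per-dimension means between an AI interviewer and an IVR baseline would mirror the paper's framing of context-dependent fitness for purpose.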
Related papers
- Modular AI-Powered Interviewer with Dynamic Question Generation and Expertise Profiling [0.7349727826230863]
This study presents an AI-powered interviewer that dynamically generates questions that are contextually appropriate and expertise-aligned. The interviewer is built on a locally hosted large language model (LLM) that generates coherent dialogue while preserving data privacy. The proposed interviewer is a scalable, privacy-conscious solution that advances AI-assisted qualitative data collection.
arXiv Detail & Related papers (2025-11-21T18:25:26Z)
- What's the next frontier for Data-centric AI? Data Savvy Agents [71.76058707995398]
We argue that data-savvy capabilities should be a top priority in the design of agentic systems. We propose four key capabilities to realize this vision: proactive data acquisition, sophisticated data processing, interactive test data synthesis, and continual adaptation.
arXiv Detail & Related papers (2025-11-02T17:09:29Z)
- MimiTalk: Revolutionizing Qualitative Research with Dual-Agent AI [1.8219577154655007]
We present MimiTalk, a dual-agent constitutional AI framework designed for scalable and ethical conversational data collection in social science research. We conducted three studies: Study 1 evaluated usability with 20 participants; Study 2 compared 121 AI interviews to 1,271 human interviews from the MediaSum dataset using NLP metrics and propensity score matching; Study 3 involved 10 interdisciplinary researchers conducting both human and AI interviews, followed by blind thematic analysis. Results indicate that MimiTalk reduces interview anxiety, maintains conversational coherence, and outperforms human interviews in information richness, coherence, and stability.
arXiv Detail & Related papers (2025-09-27T16:02:50Z)
- AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer [1.8929175690169533]
We built and tested an AI system to conduct quantitative surveys based on large language models (LLMs), automatic speech recognition (ASR), and speech synthesis technologies. The system was specifically designed for quantitative research and strictly adhered to research best practices such as question order randomization, answer order randomization, and exact wording. Our results suggest that shorter instruments and more responsive AI interviewers may contribute to improvements across all three metrics studied.
arXiv Detail & Related papers (2025-07-23T17:30:14Z)
- Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale [0.0]
This paper presents an AI-driven telephone survey system integrating text-to-speech (TTS), a large language model (LLM), and speech-to-text (STT). We tested the system across two populations: a pilot study in the United States (n = 75) and a large-scale deployment in Peru (n = 2,739). Our findings demonstrate that while the AI system's probing for qualitative depth was more limited than human interviewers', overall data quality approached human-led standards for structured items.
arXiv Detail & Related papers (2025-02-27T14:31:42Z)
- AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers [40.80290002598963]
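Several of the systems above describe the same basic architecture: a TTS voice asks a question, STT transcribes the reply, and an LLM decides the next turn. A minimal sketch of that loop, with stub functions standing in for the real speech and language-model services (none of these names come from any of the papers listed):

```python
# Minimal sketch of a TTS -> respondent -> STT -> LLM interview loop.
# All components are stubs; a real system would call actual STT/TTS
# services and an LLM for adaptive probing.

def transcribe(audio: bytes) -> str:
    """Stub STT: a real system would call a speech-to-text service."""
    return audio.decode("utf-8", errors="ignore")


def next_question(script: list, turn: int, last_answer: str) -> "str | None":
    """Stub LLM step: return the next scripted question, or None when done.
    A real system would also probe or clarify based on last_answer."""
    return script[turn] if turn < len(script) else None


def synthesize(text: str) -> bytes:
    """Stub TTS: a real system would return synthesized speech audio."""
    return text.encode("utf-8")


def run_interview(script: list, respondent_audio: list) -> list:
    """Drive one scripted interview, returning (question, answer) pairs."""
    transcript = []
    answer = ""
    for turn, audio in enumerate(respondent_audio):
        question = next_question(script, turn, answer)
        if question is None:
            break
        _ = synthesize(question)    # play the question to the respondent
        answer = transcribe(audio)  # capture and transcribe the reply
        transcript.append((question, answer))
    return transcript
```

The quantitative-survey constraints mentioned above (exact wording, question order randomization) would live inside `next_question`, while the qualitative-probing limitations the position paper discusses arise from how well that same step can improvise follow-ups.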
This study explores the potential of replacing human interviewers with large language models (LLMs) to conduct scalable conversational interviews. We conducted a small-scale, in-depth study with university students who were randomly assigned to a conversational interview by either an AI or a human interviewer. Various quantitative and qualitative measures assessed interviewer adherence to guidelines, response quality, participant engagement, and overall interview efficacy.
arXiv Detail & Related papers (2024-09-16T16:03:08Z)
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension [95.8442896569132]
We introduce AIR-Bench, the first benchmark to evaluate the ability of Large Audio-Language Models (LALMs) to understand various types of audio signals and interact with humans in the textual format.
Results demonstrate a high level of consistency between GPT-4-based evaluation and human evaluation.
arXiv Detail & Related papers (2024-02-12T15:41:22Z)
- AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models [74.10293412011455]
We propose AutoConv for synthetic conversation generation.
Specifically, we formulate the conversation generation problem as a language modeling task.
We finetune an LLM with a few human conversations to capture the characteristics of the information-seeking process.
arXiv Detail & Related papers (2023-08-12T08:52:40Z)
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- InterviewBot: Real-Time End-to-End Dialogue System to Interview Students for College Admission [18.630848902825406]
InterviewBot integrates conversation history and customized topics into a coherent embedding space.
7,361 audio recordings of human-to-human interviews are automatically transcribed, where 440 are manually corrected for finetuning and evaluation.
InterviewBot is tested both statistically by comparing its responses to the interview data and dynamically by inviting professional interviewers and various students to interact with it in real-time.
arXiv Detail & Related papers (2023-03-27T09:46:56Z)
- End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build a system that handles conversational questions based on audio recordings, and to explore the plausibility of providing additional cues from different modalities to aid systems in information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z)
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering task (SCQA), which aims at enabling QA systems to model complex dialogue flows given the speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.