Keeping Users Engaged During Repeated Administration of the Same Questionnaire: Using Large Language Models to Reliably Diversify Questions
- URL: http://arxiv.org/abs/2311.12707v2
- Date: Sat, 6 Jul 2024 22:38:36 GMT
- Title: Keeping Users Engaged During Repeated Administration of the Same Questionnaire: Using Large Language Models to Reliably Diversify Questions
- Authors: Hye Sun Yun, Mehdi Arjmand, Phillip Sherlock, Michael K. Paasche-Orlow, James W. Griffith, Timothy Bickmore
- Abstract summary: Repeated use of virtual agent-administered questionnaires can induce respondent fatigue.
We propose using large language models (LLMs) to generate diverse questionnaire versions.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Standardized, validated questionnaires are vital tools in research and healthcare, offering dependable self-report data. Prior work has revealed that virtual agent-administered questionnaires are almost equivalent to self-administered ones in an electronic form. Despite being an engaging method, repeated use of virtual agent-administered questionnaires in longitudinal or pre-post studies can induce respondent fatigue, impacting data quality via response biases and decreased response rates. We propose using large language models (LLMs) to generate diverse questionnaire versions while retaining good psychometric properties. In a longitudinal study, participants interacted with our agent system and responded daily for two weeks to one of the following questionnaires: a standardized depression questionnaire, question variants generated by LLMs, or question variants accompanied by LLM-generated small talk. The responses were compared to a validated depression questionnaire. Psychometric testing revealed consistent covariation between the external criterion and focal measure administered across the three conditions, demonstrating the reliability and validity of the LLM-generated variants. Participants found that the variants were significantly less repetitive than repeated administrations of the same standardized questionnaire. Our findings highlight the potential of LLM-generated variants to invigorate agent-administered questionnaires and foster engagement and interest, without compromising their validity.
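The core generation step, prompting an LLM to paraphrase a standardized item while preserving its construct and response scale, can be sketched as follows. This is a minimal illustration assuming the OpenAI chat completions API; the model name, prompt wording, and `generate_variant` helper are assumptions, not the authors' released code.

```python
# Minimal sketch of LLM-based questionnaire item diversification.
# Assumes the OpenAI Python client (>= 1.0) and an OPENAI_API_KEY in the
# environment; the prompt and model choice are illustrative only.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Rewrite the following depression-questionnaire item so that it is "
    "worded differently but measures exactly the same construct. "
    "Keep the reading level and the response scale unchanged.\n\n"
    "Item: {item}"
)

def generate_variant(item: str, model: str = "gpt-4o-mini") -> str:
    """Return one paraphrased variant of a standardized questionnaire item."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(item=item)}],
        temperature=0.9,  # higher temperature encourages lexical diversity
    )
    return response.choices[0].message.content.strip()

# Example: produce a fresh variant for each daily administration.
print(generate_variant("Little interest or pleasure in doing things"))
```

In the study, such variants were treated as interchangeable with the original items only after psychometric testing showed consistent covariation with a validated external criterion.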
Related papers
- AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs [53.6200736559742]
AGENT-CQ consists of two stages: a generation stage and an evaluation stage.
CrowdLLM simulates human crowdsourcing judgments to assess generated questions and answers.
Experiments on the ClariQ dataset demonstrate CrowdLLM's effectiveness in evaluating question and answer quality.
arXiv Detail & Related papers (2024-10-25T17:06:27Z)
- RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions [52.33835101586687]
Conversational AI agents use Retrieval Augmented Generation (RAG) to provide verifiable document-grounded responses to user inquiries.
This paper presents a novel synthetic data generation method to efficiently create a diverse set of context-grounded confusing questions from a given document corpus.
arXiv Detail & Related papers (2024-10-18T16:11:29Z)
- Multi-LLM QA with Embodied Exploration [55.581423861790945]
We investigate the use of Multi-Embodied LLM Explorers (MELE) for question-answering in an unknown environment.
Multiple LLM-based agents independently explore and then answer queries about a household environment.
We analyze different aggregation methods to generate a single, final answer for each query.
arXiv Detail & Related papers (2024-06-16T12:46:40Z)
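The answer-aggregation step MELE analyzes can be illustrated with its simplest instance, plurality voting over the agents' answers. A minimal sketch; the paper compares several aggregation methods, and this code is not from it.

```python
# Plurality-vote aggregation over answers from independent LLM explorers;
# a minimal sketch of one aggregation strategy, not the paper's code.
from collections import Counter

def aggregate_answers(answers: list[str]) -> str:
    """Return the most common answer after light normalization."""
    normalized = [a.strip().lower() for a in answers]
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner

# Three hypothetical agents answer a household query after exploring.
print(aggregate_answers(["The kitchen", "the kitchen", "the living room"]))
# -> "the kitchen"
```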
- LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
We employ a Large Language Model (LLM) to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains.
The obtained answers are coded as features, which are used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C).
arXiv Detail & Related papers (2024-06-09T09:03:11Z)
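The questionnaire-answers-as-features idea can be illustrated with a generic supervised pipeline. A minimal sketch on synthetic data; the feature coding, model choice, and numbers below are assumptions, not the paper's dataset or pipeline.

```python
# Sketch of predicting a standardized measure (e.g., a PHQ-8 total score)
# from coded questionnaire answers; the data is synthetic and the model
# choice is illustrative only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(200, 16))            # 16 coded items, scored 0-3
y = 0.4 * X.sum(axis=1) + rng.normal(0, 1, 200)   # synthetic PHQ-8-like target

scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print(f"mean cross-validated R^2: {scores.mean():.2f}")
```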
- Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations [70.6395572287422]
The self-alignment method not only refuses to answer unknown questions but also explains why they are unanswerable.
We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself, aligning its responses to unknown questions as desired.
arXiv Detail & Related papers (2024-02-23T02:24:36Z)
- You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments [37.03210795084276]
We examine whether the current format of prompting Large Language Models elicits responses in a consistent and robust manner.
Our experiments on 17 different LLMs reveal that even simple perturbations significantly degrade a model's question-answering ability.
Our results suggest that the currently widespread practice of prompting is insufficient to accurately and reliably capture model perceptions.
arXiv Detail & Related papers (2023-11-16T09:50:53Z)
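The kind of robustness check this paper runs can be sketched as a perturbation harness: administer the same multiple-choice item with shuffled options and measure agreement. `ask_model` is a hypothetical stand-in for any LLM call, and the paper's actual perturbations and metrics are broader.

```python
# Perturbation-robustness sketch: re-ask one multiple-choice item with the
# options shuffled and report agreement with the modal answer.
# `ask_model` is a hypothetical stub; wire in any LLM client.
import random

def ask_model(question: str, options: list[str]) -> str:
    raise NotImplementedError("call your LLM with the question and options")

def consistency(question: str, options: list[str], trials: int = 10) -> float:
    """Fraction of trials that agree with the modal answer under shuffling."""
    answers = [
        ask_model(question, random.sample(options, k=len(options)))
        for _ in range(trials)
    ]
    modal = max(set(answers), key=answers.count)
    return answers.count(modal) / trials
```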
- Federated Prompting and Chain-of-Thought Reasoning for Improving LLMs Answering [13.735277588793997]
We investigate how to enhance answer precision for frequently asked questions posed by distributed users of cloud-based Large Language Models (LLMs).
Our study focuses on a typical situation where users ask similar queries involving identical mathematical reasoning steps and problem-solving procedures.
We propose to improve the distributed synonymous questions using Self-Consistency (SC) and Chain-of-Thought (CoT) techniques.
arXiv Detail & Related papers (2023-04-27T01:48:03Z)
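Self-Consistency itself reduces to sampling several chain-of-thought completions and keeping the modal final answer. A generic sketch; `sample_cot_answer` is a hypothetical stand-in, and the paper's federated, distributed-user setting is not shown here.

```python
# Generic Self-Consistency (SC) sketch: sample several chain-of-thought
# completions at nonzero temperature and majority-vote the final answers.
# `sample_cot_answer` is a hypothetical stub for an LLM call.
from collections import Counter

def sample_cot_answer(question: str) -> str:
    raise NotImplementedError("prompt an LLM with CoT, temperature > 0")

def self_consistent_answer(question: str, samples: int = 5) -> str:
    """Majority vote over independently sampled reasoning chains."""
    answers = [sample_cot_answer(question) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]
```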
- When Respondents Don't Care Anymore: Identifying the Onset of Careless Responding [0.0]
We propose a novel method for identifying the onset of careless responding for each participant.
It is based on combined measurements of multiple dimensions in which carelessness may manifest.
It is highly flexible, based on machine learning, and provides statistical guarantees on its performance.
arXiv Detail & Related papers (2023-03-13T15:10:30Z)
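The paper combines measurements along several carelessness dimensions with machine learning; one classical single-dimension indicator from this literature, the longstring index, can be computed in a few lines. A minimal sketch, not the paper's method.

```python
# Longstring index: length of the longest run of identical consecutive
# responses, a classical single indicator of careless "flat-lining".
# The paper's detector combines multiple such dimensions with ML and adds
# statistical guarantees; none of that is shown here.
from itertools import groupby

def longstring(responses: list[int]) -> int:
    """Return the length of the longest run of identical consecutive responses."""
    return max(sum(1 for _ in run) for _, run in groupby(responses))

print(longstring([2, 3, 3, 3, 3, 1, 4, 4]))  # -> 4
```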
- Open vs Closed-ended questions in attitudinal surveys -- comparing, combining, and interpreting using natural language processing [3.867363075280544]
Topic Modeling could significantly reduce the time needed to extract information from open-ended responses.
Our research uses Topic Modeling to extract information from open-ended questions and compares its performance with closed-ended responses.
arXiv Detail & Related papers (2022-05-03T06:01:03Z)
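Extracting topics from open-ended responses is a standard unsupervised step; a minimal sketch with scikit-learn's LDA on toy responses (the corpus, topic count, and preprocessing are illustrative, not the paper's setup):

```python
# Topic-modeling sketch for open-ended survey responses using LDA;
# the corpus and topic count are toy values for illustration.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

responses = [
    "buses are too infrequent in the evening",
    "I would cycle more if there were safer bike lanes",
    "evening bus service needs more frequency",
    "bike lanes feel unsafe at intersections",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(responses)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Show the top words per topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-3:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```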
- MS-Ranker: Accumulating Evidence from Potentially Correct Candidates for Answer Selection [59.95429407899612]
We propose a novel reinforcement-learning-based multi-step ranking model, named MS-Ranker.
We explicitly consider the potential correctness of candidates and update the evidence with a gating mechanism.
Our model significantly outperforms existing methods that do not rely on external resources.
arXiv Detail & Related papers (2020-10-10T10:36:58Z)