Exploring Qualitative Research Using LLMs
- URL: http://arxiv.org/abs/2306.13298v1
- Date: Fri, 23 Jun 2023 05:21:36 GMT
- Title: Exploring Qualitative Research Using LLMs
- Authors: Muneera Bano, Didar Zowghi, Jon Whittle
- Abstract summary: This study aimed to compare and contrast the comprehension capabilities of humans and AI-driven large language models.
We conducted an experiment with a small sample of Alexa app reviews, initially classified by a human analyst.
LLMs were then asked to classify these reviews and provide the reasoning behind each classification.
- Score: 8.545798128849091
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of AI-driven large language models (LLMs) has stirred discussions
about their role in qualitative research. Some view them as tools to enrich
human understanding, while others perceive them as threats to the core values
of the discipline. This study aimed to compare and contrast the comprehension
capabilities of humans and LLMs. We conducted an experiment with a small sample
of Alexa app reviews, initially classified by a human analyst. LLMs were then
asked to classify these reviews and provide the reasoning behind each
classification. We compared the results with human classification and
reasoning. The research indicated a significant alignment between human and
ChatGPT 3.5 classifications in one third of cases, and a slightly lower
alignment with GPT-4 in over a quarter of cases. The two AI models showed
higher alignment with each other, observed in more than half of the instances. However, a
consensus across all three methods was seen only in about one fifth of the
classifications. In comparing human and LLM reasoning, it appears that
human analysts lean heavily on their individual experiences. As expected, LLMs,
on the other hand, base their reasoning on the specific word choices found in
app reviews and the functional components of the app itself. Our results
highlight the potential for effective human-LLM collaboration, suggesting a
synergistic rather than competitive relationship. Researchers must continuously
evaluate LLMs' role in their work, thereby fostering a future where AI and
humans jointly enrich qualitative research.
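The paper does not include code, but the protocol the abstract describes is mechanical enough to sketch: prompt an LLM to label each review and justify the label, then tally pairwise agreement with the human analyst's labels and the three-way consensus. The sketch below is a minimal illustration under assumed materials; the label set, prompt wording, and sample reviews are invented stand-ins rather than the authors' instruments, and the API call follows the current `openai` Python SDK.

```python
# Illustrative sketch only: the labels, prompt, and reviews are invented
# stand-ins for the study's actual materials.
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["positive", "negative", "neutral"]  # assumed label set

def classify_review(review: str, model: str) -> tuple[str, str]:
    """Ask the model for a label plus a one-sentence justification."""
    prompt = (
        f"Classify this Alexa app review as one of {LABELS}.\n"
        f'Review: "{review}"\n'
        "Answer as: <label>: <one-sentence reasoning>"
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep runs comparable across models
    ).choices[0].message.content
    label, _, reasoning = reply.partition(":")
    return label.strip().lower(), reasoning.strip()

def agreement(a: list[str], b: list[str]) -> float:
    """Fraction of items on which two raters assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical sample; the human labels would come from the analyst's coding.
reviews = ["Alexa keeps mishearing my timer requests.", "Setup was painless."]
human = ["negative", "positive"]

gpt35 = [classify_review(r, "gpt-3.5-turbo")[0] for r in reviews]
gpt4 = [classify_review(r, "gpt-4")[0] for r in reviews]

print("human vs GPT-3.5:", agreement(human, gpt35))
print("human vs GPT-4:  ", agreement(human, gpt4))
print("GPT-3.5 vs GPT-4:", agreement(gpt35, gpt4))
# Three-way consensus, mirroring the paper's "all three agree" statistic:
consensus = sum(h == a == b for h, a, b in zip(human, gpt35, gpt4)) / len(human)
print("three-way consensus:", consensus)
```

Collecting the returned reasoning strings alongside the labels would support the paper's second comparison, between the analyst's experience-driven justifications and the models' word-choice-driven ones.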
Related papers
- Are Large Language Models Good Essay Graders? [4.134395287621344]
We evaluate Large Language Models (LLMs) in assessing essay quality, focusing on their alignment with human grading.
We compare the numeric grades provided by the LLMs to human rater-provided scores on the ASAP dataset; a sketch of one such comparison follows this entry.
ChatGPT tends to be harsher and less aligned with human evaluations than Llama.
arXiv Detail & Related papers (2024-09-19T23:20:49Z)
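The summary above does not name the agreement statistic the authors use; for ASAP-style integer essay scores, quadratic weighted kappa is the customary measure, so this minimal sketch (with invented scores, not ASAP data) shows one plausible way to check LLM grades against human raters.

```python
# Illustrative only: the scores are invented, and the paper's actual
# agreement metric is not stated in the summary above.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human_scores = [2, 3, 4, 4, 1, 3, 2, 4]  # hypothetical human rater grades
llm_scores = [2, 4, 4, 3, 1, 3, 3, 4]    # hypothetical LLM grades

# Quadratic weighting penalizes large disagreements more than small ones,
# which suits ordinal essay scores.
qwk = cohen_kappa_score(human_scores, llm_scores, weights="quadratic")
r, _ = pearsonr(human_scores, llm_scores)
print(f"quadratic weighted kappa: {qwk:.3f}")
print(f"Pearson correlation:      {r:.3f}")
```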
- Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction.
We find that contextual characteristics significantly affect human reliance behavior.
Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas [14.650234624251716]
Large language models (LLMs) are increasingly being used in human-centered social scientific tasks.
These tasks are highly subjective and dependent on human factors, such as one's environment, attitudes, beliefs, and lived experiences.
We examine the role of prompting LLMs with human-like personas and ask the models to answer as if they were a specific human.
arXiv Detail & Related papers (2024-06-20T16:24:07Z)
- Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity [1.7947441434255664]
Large-scale generative Language Models (LLMs) can simulate free responses to interview questions like those traditionally analyzed using qualitative research methods.
Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative methods.
arXiv Detail & Related papers (2023-09-06T15:00:44Z)
- Exploring the psychology of LLMs' Moral and Legal Reasoning [0.0]
Large language models (LLMs) exhibit expert-level performance in tasks across a wide range of different domains.
The ethical issues raised by LLMs and the need to align future versions make it important to know how state-of-the-art models reason about moral and legal issues.
We replicate eight studies from the experimental literature with instances of Google's Gemini Pro, Anthropic's Claude 2.1, OpenAI's GPT-4, and Meta's Llama 2 Chat 70b.
We find that alignment with human responses shifts from one experiment to another, and that models differ amongst themselves in their overall alignment.
arXiv Detail & Related papers (2023-08-02T16:36:58Z)
- Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks.
Despite their notable performance, these models are prone to limitations such as misunderstanding human instructions, generating potentially biased content, or producing factually incorrect information.
This survey presents a comprehensive overview of alignment technologies for these models.
arXiv Detail & Related papers (2023-07-24T17:44:58Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
- Can Large Language Models Be an Alternative to Human Evaluations? [80.81532239566992]
Large language models (LLMs) have demonstrated exceptional performance on unseen tasks when only the task instructions are provided.
We show that the result of LLM evaluation is consistent with the results obtained by expert human evaluation.
arXiv Detail & Related papers (2023-05-03T07:28:50Z)
- Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.