Exploring Qualitative Research Using LLMs
- URL: http://arxiv.org/abs/2306.13298v1
- Date: Fri, 23 Jun 2023 05:21:36 GMT
- Title: Exploring Qualitative Research Using LLMs
- Authors: Muneera Bano, Didar Zowghi, Jon Whittle
- Abstract summary: This study aimed to compare and contrast the comprehension capabilities of humans and AI-driven large language models (LLMs).
We conducted an experiment with a small sample of Alexa app reviews, initially classified by a human analyst.
LLMs were then asked to classify these reviews and provide the reasoning behind each classification.
- Score: 8.545798128849091
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of AI-driven large language models (LLMs) has stirred discussions
about their role in qualitative research. Some view these models as tools to enrich
human understanding, while others perceive them as threats to the core values
of the discipline. This study aimed to compare and contrast the comprehension
capabilities of humans and LLMs. We conducted an experiment with a small sample
of Alexa app reviews, initially classified by a human analyst. LLMs were then
asked to classify these reviews and provide the reasoning behind each
classification. We compared the results with the human classification and
reasoning. The research indicated a significant alignment between human and
ChatGPT 3.5 classifications in one third of cases, and a slightly lower
alignment with GPT-4 in over a quarter of cases. The two AI models showed
higher alignment with each other, observed in more than half of the instances.
However, a consensus across all three methods was seen in only about one fifth
of the classifications. In the comparison of human and LLM reasoning, it
appears that human analysts lean heavily on their individual experiences. As
expected, LLMs, on the other hand, base their reasoning on the specific word
choices found in app reviews and on the functional components of the app
itself. Our results highlight the potential for effective human-LLM
collaboration, suggesting a synergistic rather than competitive relationship.
Researchers must continuously evaluate LLMs' role in their work, thereby
fostering a future where AI and humans jointly enrich qualitative research.
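The agreement rates reported above (pairwise alignment and three-way consensus) can be computed straightforwardly once each annotator's labels are in a common list. A minimal sketch, using hypothetical labels rather than the study's actual data:

```python
# Sketch of pairwise-agreement and three-way-consensus computation.
# The label values below are invented for illustration only.

def agreement(a, b):
    """Fraction of items on which two annotators assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def consensus(*annotators):
    """Fraction of items on which ALL annotators assign the same label."""
    return sum(len(set(labels)) == 1 for labels in zip(*annotators)) / len(annotators[0])

# Toy classifications of six app reviews (hypothetical data).
human = ["usability", "bug", "feature", "privacy", "bug", "feature"]
gpt35 = ["usability", "bug", "privacy", "privacy", "feature", "bug"]
gpt4  = ["usability", "praise", "privacy", "privacy", "feature", "bug"]

print(f"human vs GPT-3.5: {agreement(human, gpt35):.2f}")
print(f"GPT-3.5 vs GPT-4: {agreement(gpt35, gpt4):.2f}")
print(f"three-way consensus: {consensus(human, gpt35, gpt4):.2f}")
```

Raw percent agreement is the simplest alignment measure; chance-corrected statistics such as Cohen's kappa are the usual next step when sample sizes permit.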
Related papers
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z) - Explicit and Implicit Large Language Model Personas Generate Opinions but Fail to Replicate Deeper Perceptions and Biases [14.650234624251716]
Large language models (LLMs) are increasingly being used in human-centered social scientific tasks.
These tasks are highly subjective and dependent on human factors, such as one's environment, attitudes, beliefs, and lived experiences.
We examine the role of prompting LLMs with human-like personas and ask the models to answer as if they were a specific human.
arXiv Detail & Related papers (2024-06-20T16:24:07Z) - Human-Instruction-Free LLM Self-Alignment with Limited Samples [64.69906311787055]
We propose an algorithm that can self-align large language models (LLMs) iteratively without active human involvement.
Unlike existing works, our algorithm relies on neither human-crafted instructions nor labeled rewards, significantly reducing human involvement.
We show that our method can unlock the LLMs' self-generalization ability to perform alignment with near-zero human supervision.
arXiv Detail & Related papers (2024-01-06T14:00:12Z) - Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity [1.7947441434255664]
Large-scale generative Language Models (LLMs) can simulate free responses to interview questions like those traditionally analyzed using qualitative research methods.
Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative methods.
arXiv Detail & Related papers (2023-09-06T15:00:44Z) - Exploring the psychology of LLMs' Moral and Legal Reasoning [0.0]
Large language models (LLMs) exhibit expert-level performance in tasks across a wide range of different domains.
Ethical issues raised by LLMs and the need to align future versions make it important to know how state-of-the-art models reason about moral and legal issues.
We replicate eight studies from the experimental literature with instances of Google's Gemini Pro, Anthropic's Claude 2.1, OpenAI's GPT-4, and Meta's Llama 2 Chat 70b.
We find that alignment with human responses shifts from one experiment to another, and that models differ amongst themselves as to their overall
arXiv Detail & Related papers (2023-08-02T16:36:58Z) - Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks.
Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect information.
This survey presents a comprehensive overview of these alignment technologies, including the following aspects.
arXiv Detail & Related papers (2023-07-24T17:44:58Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [66.31055885857062]
This study aims to determine the reliability of applying personality assessments to Large Language Models (LLMs).
By shedding light on the personalization of LLMs, our study endeavors to pave the way for future explorations in this field.
arXiv Detail & Related papers (2023-05-31T15:03:28Z) - Can Large Language Models Be an Alternative to Human Evaluations? [80.81532239566992]
Large language models (LLMs) have demonstrated exceptional performance on unseen tasks when only the task instructions are provided.
We show that the result of LLM evaluation is consistent with the results obtained by expert human evaluation.
arXiv Detail & Related papers (2023-05-03T07:28:50Z) - Who's Thinking? A Push for Human-Centered Evaluation of LLMs using the XAI Playbook [30.985555463848264]
We draw parallels between the relatively mature field of XAI and the rapidly evolving research boom around large language models.
We argue that humans' tendencies should rest front and center when evaluating deployed large language models.
arXiv Detail & Related papers (2023-03-10T22:15:49Z) - Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.