Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence
- URL: http://arxiv.org/abs/2309.14379v2
- Date: Sun, 20 Oct 2024 12:00:19 GMT
- Title: Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence
- Authors: Andres Karjus,
- Abstract summary: Large language models (LLMs) have been shown to present an unprecedented opportunity to scale up data analytics in the humanities and social sciences.
We build on mixed methods quantitizing and converting design principles, and feature analysis from linguistics, to transparently integrate human expertise and machine scalability.
The approach is discussed and demonstrated in over a dozen LLM-assisted case studies, covering 9 diverse languages, multiple disciplines and tasks.
- Score: 0.0
- License:
- Abstract: The increasing capacities of large language models (LLMs) have been shown to present an unprecedented opportunity to scale up data analytics in the humanities and social sciences, by automating complex qualitative tasks otherwise typically carried out by human researchers. While numerous benchmarking studies have assessed the analytic prowess of LLMs, there is less focus on operationalizing this capacity for inference and hypothesis testing. Addressing this challenge, a systematic framework is argued for here, building on mixed methods quantitizing and converting design principles, and feature analysis from linguistics, to transparently integrate human expertise and machine scalability. Replicability and statistical robustness are discussed, including how to incorporate machine annotator error rates in subsequent inference. The approach is discussed and demonstrated in over a dozen LLM-assisted case studies, covering 9 diverse languages, multiple disciplines and tasks, including analysis of themes, stances, ideas, and genre compositions; linguistic and semantic annotation, interviews, text mining and event cause inference in noisy historical data, literary social network construction, metadata imputation, and multimodal visual cultural analytics. Using hypothesis-driven topic classification instead of "distant reading" is discussed. The replications among the experiments also illustrate how tasks previously requiring protracted team effort or complex computational pipelines can now be accomplished by an LLM-assisted scholar in a fraction of the time. Importantly, the approach is not intended to replace, but to augment and scale researcher expertise and analytic practices. With these opportunities in sight, qualitative skills and the ability to pose insightful questions have arguably never been more critical.
Related papers
- Large Language Model for Qualitative Research -- A Systematic Mapping Study [3.302912592091359]
Large Language Models (LLMs), powered by advanced generative AI, have emerged as transformative tools.
This study systematically maps the literature on the use of LLMs for qualitative research.
Findings reveal that LLMs are utilized across diverse fields, demonstrating the potential to automate processes.
arXiv Detail & Related papers (2024-11-18T21:28:00Z) - Evaluation of OpenAI o1: Opportunities and Challenges of AGI [112.0812059747033]
o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance.
The model excelled in tasks requiring intricate reasoning and knowledge integration across various fields.
Overall results indicate significant progress towards artificial general intelligence.
arXiv Detail & Related papers (2024-09-27T06:57:00Z) - Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML)
This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z) - Automating Thematic Analysis: How LLMs Analyse Controversial Topics [5.025737475817937]
Large Language Models (LLMs) are promising analytical tools.
This paper explores how LLMs can support thematic analysis of controversial topics.
Our findings highlight intriguing overlaps and variances in thematic categorisation between human and machine agents.
arXiv Detail & Related papers (2024-05-11T05:28:25Z) - QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums [10.684484559041284]
This study introduces QuaLLM, a novel framework to analyze and extract quantitative insights from text data on online forums.
We applied this framework to analyze over one million comments from two Reddit's rideshare worker communities.
arXiv Detail & Related papers (2024-05-08T18:20:03Z) - Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted
Approach for Qualitative Data Analysis [6.592797748561459]
Large Language Models (LLMs) have enabled collaborative human-bot interactions in Software Engineering (SE)
We introduce a new dimension of scalability and accuracy in qualitative research, potentially transforming data interpretation methodologies in SE.
arXiv Detail & Related papers (2024-02-02T13:10:46Z) - Exploring the Potential of Large Language Models in Computational Argumentation [54.85665903448207]
Large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language.
This work aims to embark on an assessment of LLMs, such as ChatGPT, Flan models, and LLaMA2 models, in both zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-11-15T15:12:15Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational
Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z) - AR-LSAT: Investigating Analytical Reasoning of Text [57.1542673852013]
We study the challenge of analytical reasoning of text and introduce a new dataset consisting of questions from the Law School Admission Test from 1991 to 2016.
We analyze what knowledge understanding and reasoning abilities are required to do well on this task.
arXiv Detail & Related papers (2021-04-14T02:53:32Z) - Automatic coding of students' writing via Contrastive Representation
Learning in the Wasserstein space [6.884245063902909]
This work is a step towards building a statistical machine learning (ML) method for supporting qualitative analyses of students' writing.
We show that the ML algorithm approached the inter-rater reliability of human analysis.
arXiv Detail & Related papers (2020-11-26T16:52:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.