QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums
- URL: http://arxiv.org/abs/2405.05345v1
- Date: Wed, 8 May 2024 18:20:03 GMT
- Title: QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums
- Authors: Varun Nagaraj Rao, Eesha Agarwal, Samantha Dalal, Dan Calacci, Andrés Monroy-Hernández,
- Abstract summary: This study introduces QuaLLM, a novel framework to analyze and extract quantitative insights from text data on online forums.
We applied this framework to analyze over one million comments from two of Reddit's rideshare worker communities.
- Score: 10.684484559041284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online discussion forums provide crucial data to understand the concerns of a wide range of real-world communities. However, the typical qualitative and quantitative methods used to analyze those data, such as thematic analysis and topic modeling, are infeasible to scale or require significant human effort to translate outputs to human-readable forms. This study introduces QuaLLM, a novel LLM-based framework to analyze and extract quantitative insights from text data on online forums. The framework consists of a novel prompting methodology and evaluation strategy. We applied this framework to analyze over one million comments from two of Reddit's rideshare worker communities, marking the largest study of its type. We uncover significant worker concerns regarding AI and algorithmic platform decisions, responding to regulatory calls about worker insights. In short, our work sets a new precedent for AI-assisted quantitative data analysis to surface concerns from online forums.
Related papers
- Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
- Paired Completion: Flexible Quantification of Issue-framing at Scale with LLMs [0.41436032949434404]
We develop and rigorously evaluate new detection methods for issue framing and narrative analysis within large text datasets.
We show that issue framing can be reliably and efficiently detected in large corpora with only a few examples of either perspective on a given issue.
arXiv Detail & Related papers (2024-08-19T07:14:15Z)
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label-name-supervised topic modeling.
EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM [16.488296856867937]
We introduce concept induction, a computational process that produces high-level concepts from unstructured text.
We present LLooM, a concept induction algorithm that leverages large language models to iteratively synthesize sampled text.
We find that LLooM's concepts improve upon the prior art of topic models in terms of quality and data coverage.
arXiv Detail & Related papers (2024-04-18T15:26:02Z)
- Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence [0.0]
Large language models (LLMs) have been shown to present an unprecedented opportunity to scale up data analytics in the humanities and social sciences.
We build on mixed methods quantitizing and converting design principles, and feature analysis from linguistics, to transparently integrate human expertise and machine scalability.
The approach is discussed and demonstrated in over a dozen LLM-assisted case studies, covering 9 diverse languages, multiple disciplines and tasks.
arXiv Detail & Related papers (2023-09-24T14:21:50Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We systematically review the literature, including the general methodology of instruction tuning (IT), the construction of IT datasets, the training of IT models, and applications to different modalities and domains.
We also review the potential pitfalls of IT and the criticism against it, along with efforts pointing out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
arXiv Detail & Related papers (2023-08-21T15:35:16Z)
- Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis [0.0]
Machine learning and natural language processing algorithms have been deployed to analyze the vast amount of textual data available online.
In this study, we examine and compare five frequently used topic modeling methods specifically applied to customer reviews.
Our findings reveal that BERTopic consistently yields more meaningful extracted topics and achieves favorable results.
arXiv Detail & Related papers (2023-08-19T08:18:04Z)
- Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model [0.0]
The paper presents results and reflections from an experiment that used the GPT-3.5-Turbo model to emulate some aspects of an inductive Thematic Analysis.
The objective of the paper is not to replace human analysts in qualitative analysis but to learn if some elements of LLM data manipulation can to an extent be of support for qualitative research.
arXiv Detail & Related papers (2023-05-22T13:16:07Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)