SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention
- URL: http://arxiv.org/abs/2502.10937v1
- Date: Sun, 16 Feb 2025 00:19:07 GMT
- Title: SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention
- Authors: Chengshuai Zhao, Zhen Tan, Chau-Wai Wong, Xinyan Zhao, Tianlong Chen, Huan Liu
- Abstract summary: We introduce a novel multi-agent framework that effectively $\underline{\textbf{S}}$imulates $\underline{\textbf{C}}$ontent $\underline{\textbf{A}}$nalysis via $\underline{\textbf{L}}$arge language model (LLM) ag$\underline{\textbf{E}}$nts.
It imitates key phases of content analysis, including text coding, collaborative discussion, and dynamic codebook evolution.
- Score: 50.07342730395946
- License:
- Abstract: Content analysis breaks down complex and unstructured texts into theory-informed numerical categories. Particularly, in social science, this process usually relies on multiple rounds of manual annotation, domain expert discussion, and rule-based refinement. In this paper, we introduce SCALE, a novel multi-agent framework that effectively $\underline{\textbf{S}}$imulates $\underline{\textbf{C}}$ontent $\underline{\textbf{A}}$nalysis via $\underline{\textbf{L}}$arge language model (LLM) ag$\underline{\textbf{E}}$nts. SCALE imitates key phases of content analysis, including text coding, collaborative discussion, and dynamic codebook evolution, capturing the reflective depth and adaptive discussions of human researchers. Furthermore, by integrating diverse modes of human intervention, SCALE is augmented with expert input to further enhance its performance. Extensive evaluations on real-world datasets demonstrate that SCALE achieves human-approximated performance across various complex content analysis tasks, offering an innovative potential for future social science research.
Related papers
- GeAR: Generation Augmented Retrieval [82.20696567697016]
Document retrieval techniques form the foundation for the development of large-scale information systems.
The prevailing methodology is to construct a bi-encoder and compute the semantic similarity.
We propose a new method called $\textbf{Ge}$neration Augmented Retrieval (GeAR) that incorporates well-designed fusion and decoding modules.
arXiv Detail & Related papers (2025-01-06T05:29:00Z) - Automating Bibliometric Analysis with Sentence Transformers and Retrieval-Augmented Generation (RAG): A Pilot Study in Semantic and Contextual Search for Customized Literature Characterization for High-Impact Urban Research [2.1728621449144763]
Bibliometric analysis is essential for understanding research trends, scope, and impact in urban science.
Traditional methods, relying on keyword searches, often fail to uncover valuable insights not explicitly stated in article titles or keywords.
We leverage Generative AI models, specifically transformers and Retrieval-Augmented Generation (RAG), to automate and enhance bibliometric analysis.
arXiv Detail & Related papers (2024-10-08T05:13:27Z) - Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML).
This work introduces a formal framework for this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across various ML domains with consistent notation, which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z) - Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions [62.0123588983514]
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields.
We reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers.
We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources.
arXiv Detail & Related papers (2024-06-09T08:24:17Z) - QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums [10.684484559041284]
This study introduces QuaLLM, a novel framework to analyze and extract quantitative insights from text data on online forums.
We applied this framework to analyze over one million comments from two of Reddit's rideshare worker communities.
We uncover significant worker concerns regarding AI and algorithmic platform decisions, responding to regulatory calls about worker insights.
arXiv Detail & Related papers (2024-05-08T18:20:03Z) - Not Enough Labeled Data? Just Add Semantics: A Data-Efficient Method for Inferring Online Health Texts [0.0]
We employ Abstract Meaning Representation (AMR) graphs as a means to model low-resource Health NLP tasks.
AMRs are well suited to model online health texts as they represent multi-sentence inputs, abstract away from complex terminology, and model long-distance relationships.
Our experiments show that we can improve performance on 6 low-resource health NLP tasks by augmenting text embeddings with semantic graph embeddings.
arXiv Detail & Related papers (2023-09-18T15:37:30Z) - Sequential annotations for naturally-occurring HRI: first insights [0.0]
We explain the methodology we developed for improving the interactions accomplished by an embedded conversational agent.
We are creating a corpus of naturally-occurring interactions that will be made available to the community.
arXiv Detail & Related papers (2023-08-29T08:07:26Z) - Multi-Dimensional Evaluation of Text Summarization with In-Context Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance.
arXiv Detail & Related papers (2023-06-01T23:27:49Z) - ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)