How to use LLMs for Text Analysis
- URL: http://arxiv.org/abs/2307.13106v1
- Date: Mon, 24 Jul 2023 19:54:15 GMT
- Title: How to use LLMs for Text Analysis
- Authors: Petter Törnberg
- Abstract summary: This guide introduces Large Language Models (LLMs) as a highly versatile text analysis method within the social sciences.
As LLMs are easy to use, cheap, fast, and applicable to a broad range of text analysis tasks, many scholars believe that LLMs will transform how we do text analysis.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This guide introduces Large Language Models (LLMs) as a highly versatile text
analysis method within the social sciences. As LLMs are easy to use, cheap,
fast, and applicable to a broad range of text analysis tasks, ranging from text
annotation and classification to sentiment analysis and critical discourse
analysis, many scholars believe that LLMs will transform how we do text
analysis. This how-to guide is aimed at students and researchers with limited
programming experience, and offers a simple introduction to how LLMs can be
used for text analysis in your own research project, as well as advice on best
practices. We will go through each of the steps of analyzing textual data with
LLMs using Python: installing the software, setting up the API, loading the
data, developing an analysis prompt, analyzing the text, and validating the
results. As an illustrative example, we will use the challenging task of
identifying populism in political texts, and show how LLMs move beyond the
existing state-of-the-art.
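As a rough illustration of the workflow the abstract lists, the following is a minimal Python sketch covering the API setup, prompt development, analysis, and validation steps. It assumes the OpenAI Python client and scikit-learn are installed; the model name, prompt wording, and data are hypothetical placeholders, not the paper's actual materials.

```python
# A minimal sketch of the workflow described in the abstract. Assumes the
# OpenAI Python client and scikit-learn (`pip install openai scikit-learn`);
# model name, prompt wording, and data are illustrative placeholders.
from openai import OpenAI
from sklearn.metrics import accuracy_score, cohen_kappa_score

client = OpenAI(api_key="YOUR_API_KEY")  # setting up the API

# Developing an analysis prompt: the paper's illustrative task is
# identifying populism in political texts (this wording is hypothetical).
PROMPT = (
    "You are an expert coder of political texts. Populism frames politics "
    "as a struggle between a virtuous people and a corrupt elite. Label the "
    "following text as 'populist' or 'not populist'. Answer with the label "
    "only.\n\nText: {text}"
)

def classify(text: str) -> str:
    """Analyzing the text: send one document to the LLM, return its label."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; substitute your own model
        temperature=0,        # deterministic output aids reproducibility
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
    )
    return response.choices[0].message.content.strip().lower()

# Loading the data: a toy sample; in practice, e.g. pandas.read_csv(...)
texts = [
    "The corrupt elites have betrayed the hard-working ordinary people!",
    "The committee will review the budget proposal next Tuesday.",
]
llm_labels = [classify(t) for t in texts]

# Validating the results: compare the LLM's annotations against a small
# human-coded gold standard (placeholder labels) using agreement metrics.
gold_labels = ["populist", "not populist"]
print("accuracy:", accuracy_score(gold_labels, llm_labels))
print("Cohen's kappa:", cohen_kappa_score(gold_labels, llm_labels))
```

The final step reflects the abstract's emphasis on validation: LLM annotations are compared against human codings rather than taken at face value.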
Related papers
- ReMoDetect: Reward Models Recognize Aligned LLM's Generations [55.06804460642062]
Large language models (LLMs) generate human-preferable texts.
In this paper, we identify the common characteristics shared by these models.
We propose two training schemes to further improve the detection ability of the reward model.
arXiv Detail & Related papers (2024-05-27T17:38:33Z)
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
- The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks [3.848607479075651]
We investigate the role that current Large Language Models (LLMs) can play in improving callgraph analysis and type inference for Python programs.
Our study reveals that LLMs show promising results in type inference, demonstrating higher accuracy than traditional methods, yet they exhibit limitations in callgraph analysis.
arXiv Detail & Related papers (2024-02-27T16:53:53Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
- When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on mainly consist of the tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Large Language Models for Conducting Advanced Text Analytics Information Systems Research [4.913568041651961]
Large Language Models (LLMs) have emerged as tools that are capable of processing and extracting insights from massive unstructured textual datasets.
We propose a Text Analytics for Information Systems Research (TAISR) framework to assist the Information Systems community in understanding how to operationalize LLMs.
arXiv Detail & Related papers (2023-12-27T19:49:00Z)
- Towards Human-Level Text Coding with LLMs: The Case of Fatherhood Roles in Public Policy Documents [19.65846717628022]
Large language models (LLMs) promise automation with better results and less programming.
In this study, we evaluate LLMs on three original coding tasks involving typical complexities encountered in political science settings.
We find that the best prompting strategy consists of providing the LLMs with a detailed codebook, like the one provided to human coders (a sketch of this strategy appears after this list).
arXiv Detail & Related papers (2023-11-20T15:34:45Z)
- LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis [18.775126929754833]
Thematic analysis (TA) has been widely used for analyzing qualitative data in many disciplines and fields.
Human coders develop and deepen their data interpretation and coding over multiple iterations, making TA labor-intensive and time-consuming.
We propose a human-LLM collaboration framework (i.e., LLM-in-the-loop) to conduct TA with in-context learning (ICL).
arXiv Detail & Related papers (2023-10-23T17:05:59Z)
- Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
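The codebook finding in the Towards Human-Level Text Coding entry above lends itself to a short illustration: a hedged sketch of how a detailed codebook, like the one given to human coders, might be embedded in a prompt. The codebook categories and definitions below are invented placeholders, not that paper's actual coding scheme.

```python
# A hypothetical sketch of codebook-style prompting, as described in the
# "Towards Human-Level Text Coding with LLMs" entry above. The codebook
# content is invented for illustration only.
CODEBOOK = {
    "care": "The text frames fatherhood in terms of caregiving duties.",
    "provision": "The text frames fatherhood in terms of economic provision.",
    "absence": "The text discusses absent or disengaged fathers.",
}

def build_prompt(text: str) -> str:
    """Embed the full codebook, as given to human coders, in the prompt."""
    rules = "\n".join(f"- {code}: {definition}"
                      for code, definition in CODEBOOK.items())
    return (
        "Code the following passage using this codebook:\n"
        f"{rules}\n"
        "Respond with exactly one code name.\n\n"
        f"Passage: {text}"
    )

print(build_prompt("More fathers are taking parental leave this year."))
```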
This list is automatically generated from the titles and abstracts of the papers on this site.