Related papers: Introducing HALC: A general pipeline for finding optimal prompting strategies for automated coding with LLMs in the computational social sciences

Introducing HALC: A general pipeline for finding optimal prompting strategies for automated coding with LLMs in the computational social sciences

URL: http://arxiv.org/abs/2507.21831v1
Date: Tue, 29 Jul 2025 14:10:31 GMT
Title: Introducing HALC: A general pipeline for finding optimal prompting strategies for automated coding with LLMs in the computational social sciences
Authors: Andreas Reich, Claudia Thoms, Tobias Schrimpf,
Abstract summary: We propose HALC$-$a general pipeline that allows for the systematic and reliable construction of optimal prompts for any given coding task and model.<n>Our paper provides insights into the effectiveness of different prompting strategies, crucial influencing factors, and the identification of reliable prompts for each coding task and model.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLMs are seeing widespread use for task automation, including automated coding in the social sciences. However, even though researchers have proposed different prompting strategies, their effectiveness varies across LLMs and tasks. Often trial and error practices are still widespread. We propose HALC$-$a general pipeline that allows for the systematic and reliable construction of optimal prompts for any given coding task and model, permitting the integration of any prompting strategy deemed relevant. To investigate LLM coding and validate our pipeline, we sent a total of 1,512 individual prompts to our local LLMs in over two million requests. We test prompting strategies and LLM task performance based on few expert codings (ground truth). When compared to these expert codings, we find prompts that code reliably for single variables (${\alpha}$climate = .76; ${\alpha}$movement = .78) and across two variables (${\alpha}$climate = .71; ${\alpha}$movement = .74) using the LLM Mistral NeMo. Our prompting strategies are set up in a way that aligns the LLM to our codebook$-$we are not optimizing our codebook for LLM friendliness. Our paper provides insights into the effectiveness of different prompting strategies, crucial influencing factors, and the identification of reliable prompts for each coding task and model.

Related papers

On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization [54.965787768076254]
Large Language Models have been recently exploited as judges for complex natural language processing tasks, such as Q&A.<n>We study the effectiveness of LLMs-as-a-judge for two code-related tasks, namely code generation and code summarization.
arXiv Detail & Related papers (2025-07-22T13:40:26Z)
The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation [17.064672221710307]
Large Language Models (LLMs) can generate useful test cases for given source code.<n>The existing work primarily relies on human-written plain prompts.
arXiv Detail & Related papers (2025-01-02T16:30:05Z)
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval [55.63711219190506]
Large language models (LLMs) often struggle with posing the right search queries. We introduce $underlineLe$arning to $underlineRe$trieve by $underlineT$rying (LeReT) LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%.
arXiv Detail & Related papers (2024-10-30T17:02:54Z)
Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code. We find that code prompting exhibits a high-performance boost for multiple LLMs. Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
A Prompt Learning Framework for Source Code Summarization [19.24919436211323]
This paper proposes an effective prompt learning framework for code summarization called PromptCS.<n>PromptCS trains a prompt agent that can generate continuous prompts to unleash the potential for large language models in code summarization.
arXiv Detail & Related papers (2023-12-26T14:37:55Z)
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code [11.194047962236793]
We present a method for evaluating the correctness and robustness of instruction-tuned large language models (LLMs) for code generation via a new benchmark, Turbulence.<n>Turbulence consists of a large set of natural language $textitquestion templates$, each of which is a programming problem, parameterised so that it can be asked in many different forms.<n>From a single question template, it is possible to ask an LLM a $textitneighbourhood$ of very similar programming questions, and assess the correctness of the result returned for each question.
arXiv Detail & Related papers (2023-12-22T17:29:08Z)
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named Rephrase and Respond' (RaR) which allows Large Language Models to rephrase and expand questions posed by humans. RaR serves as a simple yet effective prompting method for improving performance. We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models [83.98062659664785]
Large language models (LLMs) typically train on short text segments (e.g., 4K tokens) due to the quadratic complexity of their Transformer architectures. This work identifies three major factors contributing to this length generalization failure. We propose LM-Infinite, a simple and effective method for enhancing LLMs' capabilities of handling long contexts.
arXiv Detail & Related papers (2023-08-30T16:47:51Z)
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes [54.13559879916708]
EVAPORATE is a prototype system powered by large language models (LLMs)<n>Code synthesis is cheap, but far less accurate than directly processing each document with the LLM.<n>We propose an extended code implementation, EVAPORATE-CODE+, which achieves better quality than direct extraction.
arXiv Detail & Related papers (2023-04-19T06:00:26Z)
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks. This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.