Decoding Human-LLM Collaboration in Coding: An Empirical Study of Multi-Turn Conversations in the Wild
- URL: http://arxiv.org/abs/2512.10493v2
- Date: Fri, 12 Dec 2025 12:10:18 GMT
- Title: Decoding Human-LLM Collaboration in Coding: An Empirical Study of Multi-Turn Conversations in the Wild
- Authors: Binquan Zhang, Li Zhang, Haoyuan Zhang, Fang Liu, Song Wang, Bo Shen, An Fu, Lin Shi
- Abstract summary: We conduct an empirical analysis on human-LLM coding collaboration using the LMSYS-Chat-1M and WildChat datasets. We find that task types shape interaction patterns, with code quality optimization favoring linear patterns, design-driven tasks leaning toward tree structures, and queries preferring star patterns. We believe this work broadens understanding of human-LLM synergies and supports more effective AI-assisted development.
- Score: 15.241064679369407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are increasingly acting as dynamic conversational interfaces, supporting multi-turn interactions that mimic human-like conversation and facilitate complex tasks like coding. While datasets such as LMSYS-Chat-1M and WildChat capture real-world user-LLM conversations, few studies systematically explore the mechanisms of human-LLM collaboration in coding scenarios. What tortuous paths do users experience during the interaction process? How well do the LLMs follow instructions? Are users satisfied? In this paper, we conduct an empirical analysis of human-LLM coding collaboration using the LMSYS-Chat-1M and WildChat datasets to explore the human-LLM collaboration mechanism, LLMs' instruction-following ability, and human satisfaction. This study yields interesting findings: 1) Task types shape interaction patterns (linear, star, and tree), with code quality optimization favoring linear patterns, design-driven tasks leaning toward tree structures, and queries preferring star patterns; 2) Bug fixing and code refactoring pose greater challenges to LLMs' instruction following, with non-compliance rates notably higher than in information querying; 3) Code quality optimization and requirements-driven development tasks show lower user satisfaction, whereas structured knowledge queries and algorithm designs yield higher levels. These insights offer recommendations for improving LLM interfaces and user satisfaction in coding collaborations, while highlighting avenues for future research on adaptive dialogue systems. We believe this work broadens understanding of human-LLM synergies and supports more effective AI-assisted development.
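The linear, star, and tree patterns mentioned in the abstract can be illustrated with a small sketch. The paper's exact classification criteria are not given here; this assumes each conversation turn records which earlier turn it follows from (for example, when a user edits or branches off a previous message), with `None` marking the opening turn.

```python
from collections import defaultdict

def classify_conversation(parents):
    """Classify a multi-turn conversation as linear, star, or tree.

    parents: dict mapping turn id -> parent turn id (None for the root turn).
    """
    children = defaultdict(list)
    for turn, parent in parents.items():
        if parent is not None:
            children[parent].append(turn)

    root = next(t for t, p in parents.items() if p is None)
    branch_points = [t for t in parents if len(children[t]) > 1]

    if not branch_points:
        return "linear"  # a single chain of follow-ups
    if branch_points == [root] and all(p in (None, root) for p in parents.values()):
        return "star"    # independent queries all fanning out from the start
    return "tree"        # nested branching, e.g. design exploration

# One chain of refinements -> linear; three forks off the first turn -> star
print(classify_conversation({0: None, 1: 0, 2: 1}))        # linear
print(classify_conversation({0: None, 1: 0, 2: 0, 3: 0}))  # star
```

This matches the abstract's intuition: iterative code-quality optimization produces a chain, a batch of unrelated queries produces a star, and design-driven exploration with nested alternatives produces a tree.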
Related papers
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.3696990310269]
Large language models (LLMs) have transformed automated software development by enabling direct translation of natural language descriptions into functional code. We provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs. We analyze the code capability of general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder).
arXiv Detail & Related papers (2025-11-23T17:09:34Z) - AI-Guided Exploration of Large-Scale Codebases [0.0]
Recent advancements in large language models (LLMs) offer new opportunities to enhance code exploration. This work introduces a hybrid approach that integrates reverse engineering with LLM-guided, intent-aware visual exploration.
arXiv Detail & Related papers (2025-08-07T19:15:37Z) - Evaluating the Effectiveness of Large Language Models in Solving Simple Programming Tasks: A User-Centered Study [1.0467092641687232]
This study investigates how different interaction styles with ChatGPT-4o affect user performance on simple programming tasks. I conducted a within-subjects experiment where fifteen high school students completed three problems under three distinct versions of the model.
arXiv Detail & Related papers (2025-07-05T13:52:31Z) - Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search [48.348209577994865]
Large Language Models (LLMs) are increasingly capable but often require significant guidance or extensive interaction history to perform effectively in complex, interactive environments. We introduce a novel LLM agent framework that enhances planning capabilities through in-context learning. Our agent learns to extract task-critical "atomic facts" from its interaction trajectories.
arXiv Detail & Related papers (2025-06-10T18:36:31Z) - Conversational AI as a Coding Assistant: Understanding Programmers' Interactions with and Expectations from Large Language Models for Coding [5.064404027153094]
Conversational AI interfaces powered by large language models (LLMs) are increasingly used as coding assistants. This study investigates programmers' usage patterns, perceptions, and interaction strategies when engaging with LLM-driven coding assistants.
arXiv Detail & Related papers (2025-03-14T15:06:07Z) - Analysis of Student-LLM Interaction in a Software Engineering Project [1.2233362977312945]
We analyze 126 undergraduate students' interaction with an AI assistant during a 13-week semester to understand the benefits of AI for software engineering learning. Our findings suggest that students prefer ChatGPT over CoPilot, and that conversation-based interaction helps improve the quality of the generated code compared to auto-generated code.
arXiv Detail & Related papers (2025-02-03T11:44:00Z) - Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks [0.850206009406913]
Large Language Models (LLMs) are transforming programming practices, offering significant capabilities for code generation activities.
This paper focuses on their use in programming tasks, drawing insights from user studies that assess the impact of LLMs on programming tasks.
arXiv Detail & Related papers (2024-10-01T19:34:46Z) - CoMMIT: Coordinated Multimodal Instruction Tuning [90.1532838391285]
Multimodal large language models (MLLMs) generally involve cooperative learning between a backbone LLM and a feature encoder of non-text input modalities. In this paper, we analyze MLLM instruction tuning from both theoretical and empirical perspectives. We propose a Multimodal Balance Coefficient that enables quantitative measurement of the balance of learning.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games [21.639516389561837]
We introduce WellPlay, a reasoning dataset for multi-agent conversational inference in Murder Mystery Games (MMGs). WellPlay comprises 1,482 inferential questions across 12 games, spanning objectives, reasoning, and relationship understanding. We present PLAYER*, a novel framework for Large Language Model (LLM)-based agents in MMGs.
arXiv Detail & Related papers (2024-04-26T19:07:30Z) - LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs)
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z) - Building Cooperative Embodied Agents Modularly with Large Language Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z) - Low-code LLM: Graphical User Interface over Large Language Models [115.08718239772107]
This paper introduces a novel human-LLM interaction framework, Low-code LLM.
It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses.
We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability.
arXiv Detail & Related papers (2023-04-17T09:27:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.