Related papers: A Study of LLMs' Preferences for Libraries and Programming Languages

A Study of LLMs' Preferences for Libraries and Programming Languages

URL: http://arxiv.org/abs/2503.17181v2
Date: Mon, 21 Jul 2025 12:58:26 GMT
Title: A Study of LLMs' Preferences for Libraries and Programming Languages
Authors: Lukas Twist, Jie M. Zhang, Mark Harman, Don Syme, Joost Noppen, Helen Yannakoudakis, Detlef Nauck,
Abstract summary: This study is the first empirical study of Large Language Models' preferences for libraries and programming languages when generating code.<n>Our results reveal that LLMs exhibit a strong tendency to overuse widely adopted libraries such as NumPy.<n>For high-performance project initialisation tasks where Python is not the optimal language, it remains the dominant choice in 58% of cases, and Rust is not used a single time.
Score: 19.688657440697632
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Large Language Models (LLMs) are increasingly used to generate code, influencing users' choices of libraries and programming languages in critical real-world projects. However, little is known about their systematic biases or preferences toward certain libraries and programming languages, which can significantly impact software development practices. To fill this gap, we perform the first empirical study of LLMs' preferences for libraries and programming languages when generating code, covering eight diverse LLMs. Our results reveal that LLMs exhibit a strong tendency to overuse widely adopted libraries such as NumPy; in up to 48% of cases, this usage is unnecessary and deviates from the ground-truth solutions. LLMs also exhibit a significant preference toward Python as their default language. For high-performance project initialisation tasks where Python is not the optimal language, it remains the dominant choice in 58% of cases, and Rust is not used a single time. These results indicate that LLMs may prioritise familiarity and popularity over suitability and task-specific optimality. This will introduce security vulnerabilities and technical debt, and limit exposure to newly developed, better-suited tools and languages. Understanding and addressing these biases is essential for the responsible integration of LLMs into software development workflows.

Related papers

The Fools are Certain; the Wise are Doubtful: Exploring LLM Confidence in Code Completion [4.215010577170175]
We evaluate the confidence of Large Language Models (LLMs) when generating code by measuring code perplexity.<n>We find that strongly-typed languages exhibit lower perplexity than dynamically typed languages.<n> Perl appears universally high in perplexity, whereas Java appears low.
arXiv Detail & Related papers (2025-08-22T06:51:13Z)
Large Language Model Unlearning for Source Code [65.42425213605114]
PROD is a novel unlearning approach that enables LLMs to forget undesired code content while preserving their code generation capabilities.<n>Our evaluation demonstrates that PROD achieves superior balance between forget quality and model utility compared to existing unlearning approaches.
arXiv Detail & Related papers (2025-06-20T16:27:59Z)
Evaluating Programming Language Confusion [6.462594894731934]
Large Language Models for code (Code LLMs) have gained significant traction in software engineering.<n>These models have demonstrated remarkable capabilities in understanding programming concepts, implementing algorithms, and even bridging different programming languages.<n>Despite these advances, Code LLMs often struggle with programming language confusion--producing code in unintended languages.
arXiv Detail & Related papers (2025-03-17T18:14:15Z)
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval [7.33924106492889]
Existing code generation benchmarks are designed to study Large Language Models' end-to-end performance.<n>We construct PseudoEval, a multilingual code generation benchmark that provides a solution written in pseudocode as input.<n>Our study indicates that problem-solving capability may transfer across programming languages, while language-coding needs more language-specific effort.
arXiv Detail & Related papers (2025-02-26T14:08:17Z)
Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages.<n>For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively.<n>We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z)
Crystal: Illuminating LLM Abilities on Language and Code [58.5467653736537]
We propose a pretraining strategy to enhance the integration of natural language and coding capabilities. The resulting model, Crystal, demonstrates remarkable capabilities in both domains.
arXiv Detail & Related papers (2024-11-06T10:28:46Z)
Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights [9.414198519543564]
We present codellm-devkit (hereafter, CLDK'), an open-source library that significantly simplifies the process of performing program analysis. CLDK offers developers an intuitive and user-friendly interface, making it incredibly easy to provide rich program analysis context to code LLMs.
arXiv Detail & Related papers (2024-10-16T20:05:59Z)
Leveraging Open-Source Large Language Models for Native Language Identification [1.6267479602370543]
Native Language Identification (NLI) has applications in forensics, marketing, and second language acquisition.<n>This study explores the potential of using open-source generative large language models (LLMs) for NLI.
arXiv Detail & Related papers (2024-09-15T08:14:18Z)
CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages. It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total. Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z)
Perplexed: Understanding When Large Language Models are Confused [3.4208414448496027]
This paper introduces perplexed, a library for exploring where a language model is perplexed. We conducted a case study focused on Large Language Models (LLMs) for code generation using an additional tool we built to help with the analysis of code models called codetokenizer. We found that our studied code LLMs had their worst performance on coding structures where the code was not syntactically correct.
arXiv Detail & Related papers (2024-04-09T22:03:39Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code) Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
Learning Transfers over Several Programming Languages [5.350495525141013]
Cross-lingual transfer uses data from a source language to improve model performance on a target language. This paper reports extensive experiments on four tasks using a transformer-based large language model and 11 to 41 programming languages. We find that learning transfers well across several programming languages.
arXiv Detail & Related papers (2023-10-25T19:04:33Z)
Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation. We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs) We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods. In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
The potential of LLMs for coding with low-resource and domain-specific programming languages [0.0]
This study focuses on the econometric scripting language named hansl of the open-source software gretl. Our findings suggest that LLMs can be a useful tool for writing, understanding, improving, and documenting gretl code.
arXiv Detail & Related papers (2023-07-24T17:17:13Z)
LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results. LEVER consistently improves over the base code LLMs(4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
Leveraging Language to Learn Program Abstractions and Search Heuristics [66.28391181268645]
We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis. When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization.
arXiv Detail & Related papers (2021-06-18T15:08:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.