Related papers: On the Effectiveness of Large Language Models in Domain-Specific Code Generation

On the Effectiveness of Large Language Models in Domain-Specific Code Generation

URL: http://arxiv.org/abs/2312.01639v3
Date: Thu, 9 May 2024 02:15:05 GMT
Title: On the Effectiveness of Large Language Models in Domain-Specific Code Generation
Authors: Yalan Lin, Meng Chen, Yuhan Hu, Hongyu Zhang, Chengcheng Wan, Zhao Wei, Yong Xu, Juhong Wang, Xiaodong Gu,
Abstract summary: Large language models (LLMs) such as ChatGPT have shown remarkable capabilities in code generation. In this paper, we conduct an in-depth study of the LLMs in domain-specific code generation. We investigate how to efficiently incorporate API knowledge into the code generation process.
Score: 20.61882220430463
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) such as ChatGPT have shown remarkable capabilities in code generation. Despite the great achievement, they rely on enormous training data to acquire a broad spectrum of open-domain knowledge. Besides, their evaluation revolves around open-domain benchmarks like HumanEval, which primarily consist of programming contests. Therefore, it is hard to fully characterize the intricacies and challenges associated with particular domains (e.g., web, game, and math). In this paper, we conduct an in-depth study of the LLMs in domain-specific code generation. Our results demonstrate that LLMs exhibit sub-optimal performance in generating domain-specific code, due to their limited proficiency in utilizing domain-specific libraries. We further observe that incorporating API knowledge as prompts can empower LLMs to generate more professional code. Based on these findings, we further investigate how to efficiently incorporate API knowledge into the code generation process. We experiment with three strategies for incorporating domain knowledge, namely, external knowledge inquirer, chain-of-thought prompting, and chain-of-thought fine-tuning. We refer to these strategies as a new code generation approach called DomCoder. Experimental results show that all strategies of DomCoder lead to improvement in the effectiveness of domain-specific code generation under certain settings. The results also show that there is still ample room for further improvement, based on which we suggest possible future works.

Related papers

Empowering AI to Generate Better AI Code: Guided Generation of Deep Learning Projects with LLMs [4.616570111453259]
Large language models (LLMs) struggle with generating entire deep learning projects. We propose a novel planning-guided code generation method, DLCodeGen, tailored for generating deep learning projects.
arXiv Detail & Related papers (2025-04-21T13:09:25Z)
Top General Performance = Top Domain Performance? DomainCodeBench: A Multi-domain Code Generation Benchmark [38.14474956762422]
We introduce DomainCodeBench, a benchmark designed to evaluate large language models (LLMs) across 12 software application domains and 15 programming languages. We find that top general-domain models do not consistently excel in specific application domains. We show that augmenting prompts with domain-specific knowledge improves performance by around 38.17%.
arXiv Detail & Related papers (2024-12-24T17:56:08Z)
Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator [33.680619900836376]
We propose a pipeline to solve the domain-specific calculation problems with Knowledge-Intensive Programs Generator. It generates knowledge-intensive programs according to the domain-specific documents. We also find that the code generator is also adaptable to other domains, without training on the new knowledge.
arXiv Detail & Related papers (2024-12-12T13:42:58Z)
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited. We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
Crystal: Illuminating LLM Abilities on Language and Code [58.5467653736537]
We propose a pretraining strategy to enhance the integration of natural language and coding capabilities. The resulting model, Crystal, demonstrates remarkable capabilities in both domains.
arXiv Detail & Related papers (2024-11-06T10:28:46Z)
Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization [7.522493227357079]
Large Language Models (LLMs) are pre-trained on large-scale corpora. LLMs suffer from hallucinations, knowledge cut-offs, and lack of knowledge attributions. We introduce SMART-SLIC, a highly domain-specific LLM framework.
arXiv Detail & Related papers (2024-10-03T17:40:55Z)
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation [48.11754113512047]
This study includes a code generation benchmark dataset DOMAINEVAL, encompassing six popular domains. Our pipeline works in a fully automated manner, enabling a push-bottom construction from code repositories into formatted subjects under study. The contributions of this study include a code generation benchmark dataset DOMAINEVAL, encompassing six popular domains, a fully automated pipeline for constructing code benchmarks, and an identification of the limitations of LLMs in code generation tasks based on their performance on DOMAINEVAL.
arXiv Detail & Related papers (2024-08-23T16:33:58Z)
CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation. We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks. We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
Perplexed: Understanding When Large Language Models are Confused [3.4208414448496027]
This paper introduces perplexed, a library for exploring where a language model is perplexed. We conducted a case study focused on Large Language Models (LLMs) for code generation using an additional tool we built to help with the analysis of code models called codetokenizer. We found that our studied code LLMs had their worst performance on coding structures where the code was not syntactically correct.
arXiv Detail & Related papers (2024-04-09T22:03:39Z)
BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks. Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs. We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z)
ARKS: Active Retrieval in Knowledge Soup for Code Generation [18.22108704150575]
We introduce Active Retrieval in Knowledge Soup (ARKS), an advanced strategy for generalizing large language models for code. We employ an active retrieval strategy that iteratively refines the query and updates the knowledge soup. Experimental results on ChatGPT and CodeLlama demonstrate a substantial improvement in the average execution accuracy of ARKS on LLMs.
arXiv Detail & Related papers (2024-02-19T17:37:28Z)
Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub [79.31134731122462]
We introduce OpenAct benchmark to evaluate the open-domain task-solving capability, built on human expert consultation and repositories in GitHub.<n>We present OpenAgent, a novel LLM-based agent system that can tackle evolving queries in open domains through autonomously integrating specialized tools from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z)
Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE. This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.