A Survey on Pretrained Language Models for Neural Code Intelligence
- URL: http://arxiv.org/abs/2212.10079v1
- Date: Tue, 20 Dec 2022 08:34:56 GMT
- Authors: Yichen Xu and Yanqiao Zhu
- Abstract summary: The field of Neural Code Intelligence (NCI) has emerged as a promising solution to tackle analytical tasks on source code.
NCI aims to improve programming efficiency and minimize human errors within the software industry.
Pretrained language models have become a dominant force in NCI research, consistently delivering state-of-the-art results.
- Score: 4.020523898765404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the complexity of modern software continues to escalate, software
engineering has become an increasingly daunting and error-prone endeavor. In
recent years, the field of Neural Code Intelligence (NCI) has emerged as a
promising solution, leveraging the power of deep learning techniques to tackle
analytical tasks on source code with the goal of improving programming
efficiency and minimizing human errors within the software industry. Pretrained
language models have become a dominant force in NCI research, consistently
delivering state-of-the-art results across a wide range of tasks, including
code summarization, generation, and translation. In this paper, we present a
comprehensive survey of the NCI domain, including a thorough review of
pretraining techniques, tasks, datasets, and model architectures. We hope this
paper will serve as a bridge between the natural language and programming
language communities, offering insights for future research in this rapidly
evolving field.
Related papers
- Large Language Models in Computer Science Education: A Systematic Literature Review [7.240148550817106]
Large language models (LLMs) are becoming increasingly capable across a wide range of Natural Language Processing (NLP) tasks.
Recently, these models have extended their capabilities to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL).
arXiv Detail & Related papers (2024-10-21T17:49:50Z)
- From Prediction to Application: Language Model-based Code Knowledge Tracing with Domain Adaptive Pre-Training and Automatic Feedback System with Pedagogical Prompting for Comprehensive Programming Education [2.932399587069876]
This paper introduces Language Model-based Code Knowledge Tracing (CodeLKT)
CodeLKT is an innovative application of Language model-based Knowledge Tracing (LKT) to programming education.
arXiv Detail & Related papers (2024-08-31T01:36:38Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to bridge the gap between programming languages and natural language.
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- The Future of Scientific Publishing: Automated Article Generation [0.0]
This study introduces a novel software tool leveraging large language model (LLM) prompts, designed to automate the generation of academic articles from Python code.
Python served as a foundational proof of concept; however, the underlying methodology and framework are adaptable to a variety of GitHub repositories.
The development was achieved without reliance on advanced language model agents, ensuring high fidelity in the automated generation of coherent and comprehensive academic content.
arXiv Detail & Related papers (2024-04-11T16:47:02Z)
- A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond [84.95530356322621]
This survey presents a systematic review of the advancements in code intelligence.
It covers over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works.
Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence.
arXiv Detail & Related papers (2024-03-21T08:54:56Z)
- On-the-Fly Syntax Highlighting: Generalisation and Speed-ups [2.208443815105053]
On-the-fly syntax highlighting is the task of rapidly associating visual secondary notation values with each character of a language derivation.
Speed constraints are essential to ensure tool usability, manifesting as responsiveness for end users accessing online source code.
Achieving precise highlighting is critical for enhancing code comprehensibility, and addressing the development costs of such resolvers is imperative given the multitude of programming language versions.
arXiv Detail & Related papers (2024-02-13T19:43:22Z)
- Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit [63.82016263181941]
Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora.
Currently, there is already a thriving research community focusing on code intelligence.
arXiv Detail & Related papers (2023-12-30T17:48:37Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review [9.355153561673855]
This paper focuses on transformer-based large language models (LLMs) trained using Big Code.
LLMs have played a crucial role in facilitating AI-assisted programming applications, including code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection.
It explores the challenges and opportunities associated with incorporating NLP techniques with software naturalness in these applications.
arXiv Detail & Related papers (2023-07-04T21:26:51Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from high-resource languages to low-resource ones.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.