Robustness, Security, Privacy, Explainability, Efficiency, and Usability
of Large Language Models for Code
- URL: http://arxiv.org/abs/2403.07506v1
- Date: Tue, 12 Mar 2024 10:43:26 GMT
- Title: Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code
- Authors: Zhou Yang, Zhensu Sun, Terry Zhuo Yue, Premkumar Devanbu, David Lo
- Abstract summary: Large language models for code (LLM4Code) demonstrate strong performance (e.g., high accuracy) in processing source code.
This paper thoroughly examines 146 relevant studies to identify seven important properties beyond accuracy, including robustness, security, privacy, explainability, efficiency, and usability.
We discuss the current state-of-the-art methods and trends, identify gaps in existing research, and present promising directions for future study.
- Score: 9.343299833972253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models for code (LLM4Code), which demonstrate strong
performance (e.g., high accuracy) in processing source code, have significantly
transformed software engineering. Many studies separately investigate the
non-functional properties of LLM4Code, but there is no systematic review of how
these properties are evaluated and enhanced. This paper fills this gap by
thoroughly examining 146 relevant studies, thereby presenting the first
systematic literature review to identify seven important properties beyond
accuracy, including robustness, security, privacy, explainability, efficiency,
and usability. We discuss the current state-of-the-art methods and trends,
identify gaps in existing research, and present promising directions for future
study.
Related papers
- Active Learning Methods for Efficient Data Utilization and Model Performance Enhancement [5.4044723481768235]
This paper gives a detailed overview of Active Learning (AL), which is a strategy in machine learning that helps models achieve better performance using fewer labeled examples.
It introduces the basic concepts of AL and discusses how it is used in various fields such as computer vision, natural language processing, transfer learning, and real-world applications.
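The core strategy this overview describes, uncertainty sampling, can be sketched with a toy one-dimensional classifier. Everything below (the data, the threshold model, the query budget) is illustrative, not from the paper; the point is the loop: fit on the labeled set, query the example the model is least confident about, repeat.

```python
import random

random.seed(0)

# Toy task: classify x in [0, 1] as 1 if x > 0.6 (the "unknown" true rule).
X = [random.random() for _ in range(200)]
y = [int(x > 0.6) for x in X]

labeled = list(range(5))                       # small initial labeled set
pool = [i for i in range(200) if i not in labeled]

def fit_threshold(idxs):
    """Fit a 1-D classifier: threshold midway between the two classes."""
    ones = [X[i] for i in idxs if y[i] == 1]
    zeros = [X[i] for i in idxs if y[i] == 0]
    return (min(ones) + max(zeros)) / 2 if ones and zeros else 0.5

for _ in range(10):                            # 10 query rounds, 1 label each
    t = fit_threshold(labeled)
    # Least-confidence query: the pooled point nearest the decision boundary.
    query = min(pool, key=lambda i: abs(X[i] - t))
    labeled.append(query)                      # the "oracle" supplies its label
    pool.remove(query)

t = fit_threshold(labeled)
print(round(t, 2))                             # threshold converges toward 0.6
```

With only 15 labels the boundary closes in on the true rule, which is the efficiency argument the paper makes: labels spent near the decision boundary are worth more than labels spent at random.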
arXiv Detail & Related papers (2025-04-21T20:42:13Z)
- Towards an Understanding of Context Utilization in Code Intelligence [37.85380387094615]
Code intelligence aims to improve the effectiveness and efficiency of various code-related tasks.
Recent research suggests that incorporating contextual information beyond the basic original task inputs can substantially enhance model performance.
Despite growing academic interest, there is a lack of systematic analysis of context in code intelligence.
arXiv Detail & Related papers (2025-04-11T17:59:53Z)
- Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs [54.309127753635366]
We present the results of a replication study in which we investigate GPT-4's effectiveness in recommending idiomatic actions.
Our findings underscore the potential of LLMs to achieve tasks where, in the past, implementing recommenders based on complex code analyses was required.
arXiv Detail & Related papers (2025-01-28T15:41:54Z)
- Language Models for Code Optimization: Survey, Challenges and Future Directions [7.928856221466083]
Language models (LMs) built upon deep neural networks (DNNs) have recently demonstrated breakthrough effectiveness in software engineering tasks.
This study aims to provide actionable insights and references for both researchers and practitioners in this rapidly evolving field.
arXiv Detail & Related papers (2025-01-02T14:20:36Z)
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
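The selection mechanism can be sketched as follows. The candidate prompts and per-token probabilities below are invented stand-ins for what a real LM API would return; the sketch only illustrates ranking prompts by the log-likelihood they assign to the question.

```python
import math

question_tokens = ["what", "is", "the", "capital", "of", "france"]

# Hypothetical per-token probabilities P(token | prompt, preceding tokens)
# for the question under each candidate prompt. Values are made up purely
# to illustrate the selection mechanism.
prompt_token_probs = {
    "Answer concisely.": [0.20, 0.40, 0.50, 0.10, 0.60, 0.05],
    "You are a helpful geography tutor.": [0.30, 0.45, 0.55, 0.20, 0.65, 0.15],
    "Respond in French.": [0.10, 0.30, 0.40, 0.05, 0.50, 0.02],
}

def log_likelihood(probs):
    """Sum of per-token log-probabilities = sequence log-likelihood."""
    return sum(math.log(p) for p in probs)

scores = {p: log_likelihood(v) for p, v in prompt_token_probs.items()}
best = max(scores, key=scores.get)
print(best)  # the prompt under which the question is most likely
```

The gauge is attractive precisely because it needs no labeled answers: likelihood is a byproduct of a single forward pass, so prompts can be ranked before any generation happens.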
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++ [0.716879432974126]
We conduct the first empirical study to investigate and compare the performance of Machine Learning (ML) and Deep Learning (DL) models for function-level SV assessment in C/C++.
We show that ML has matching or even better performance than the multi-class DL models for function-level SV assessment, with significantly less training time.
arXiv Detail & Related papers (2024-07-24T07:26:58Z)
- Qualitative Data Analysis in Software Engineering: Techniques and Teaching Insights [10.222207222039048]
Software repositories are rich sources of qualitative artifacts, including source code comments, commit messages, issue descriptions, and documentation.
This chapter shifts the focus towards interpreting these artifacts using various qualitative data analysis techniques.
Various coding methods are discussed along with the strategic design of a coding guide to ensure consistency and accuracy in data interpretation.
arXiv Detail & Related papers (2024-06-12T13:56:55Z)
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data [89.2410799619405]
We introduce the Quantitative Reasoning with Data benchmark to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data.
The benchmark comprises a dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers.
To compare models' quantitative reasoning abilities on data and text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText.
arXiv Detail & Related papers (2024-02-27T16:15:03Z)
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
- Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey [21.01561950216472]
Modern language models (LMs) have been successfully employed in source code generation and understanding.
Despite their great potential, language models for code intelligence (LM4Code) are susceptible to potential pitfalls.
arXiv Detail & Related papers (2023-10-27T05:32:57Z)
- On the Reliability and Explainability of Language Models for Program Generation [15.569926313298337]
We study the capabilities and limitations of automated program generation approaches.
We employ advanced explainable AI approaches to highlight the tokens that significantly contribute to the code transformation.
Our analysis reveals that, in various experimental scenarios, language models can recognize code grammar and structural information, but they exhibit limited robustness to changes in input sequences.
arXiv Detail & Related papers (2023-02-19T14:59:52Z)
- Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods [48.47413103662829]
Natural Language Generation (NLG) has made great progress in recent years due to the development of deep learning techniques such as pre-trained language models.
However, the faithfulness problem that the generated text usually contains unfaithful or non-factual information has become the biggest challenge.
arXiv Detail & Related papers (2022-03-10T08:28:32Z)
- Robust Natural Language Processing: Recent Advances, Challenges, and Future Directions [4.409836695738517]
We present a structured overview of NLP robustness research by summarizing the literature in a systemic way across various dimensions.
We then take a deep-dive into the various dimensions of robustness, across techniques, metrics, embeddings, and benchmarks.
arXiv Detail & Related papers (2022-01-03T17:17:11Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that, although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
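The pairwise relationship modeling this summary refers to is, at its core, self-attention: every token's representation is recomputed as a weighted mix of all tokens, with weights given by pairwise similarity. A minimal scaled dot-product version, over toy "code token" embeddings rather than the paper's actual model, might look like:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X):
    """Scaled dot-product self-attention; X serves as queries, keys, and values."""
    d = len(X[0])
    out = []
    for q in X:
        # Pairwise similarity of this token with every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # New representation: attention-weighted mix of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

# Three toy token embeddings (imagine `def`, `foo`, `(` in a code snippet).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = self_attention(tokens)
print([round(x, 3) for x in result[0]])
```

Real models learn separate query/key/value projections and stack many such layers; the sketch keeps only the pairwise-weighting mechanism the summary highlights.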
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.