Robustness, Security, Privacy, Explainability, Efficiency, and Usability
of Large Language Models for Code
- URL: http://arxiv.org/abs/2403.07506v1
- Date: Tue, 12 Mar 2024 10:43:26 GMT
- Title: Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code
- Authors: Zhou Yang, Zhensu Sun, Terry Zhuo Yue, Premkumar Devanbu, David Lo
- Abstract summary: Large language models for code (LLM4Code) demonstrate strong performance (e.g., high accuracy) in processing source code.
This paper thoroughly examines 146 relevant studies to identify seven important properties beyond accuracy, including robustness, security, privacy, explainability, efficiency, and usability.
We discuss the current state-of-the-art methods and trends, identify gaps in existing research, and present promising directions for future study.
- Score: 9.343299833972253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models for code (LLM4Code), which demonstrate strong
performance (e.g., high accuracy) in processing source code, have significantly
transformed software engineering. Many studies separately investigate the
non-functional properties of LLM4Code, but there is no systematic review of how
these properties are evaluated and enhanced. This paper fills this gap by
thoroughly examining 146 relevant studies, thereby presenting the first
systematic literature review to identify seven important properties beyond
accuracy, including robustness, security, privacy, explainability, efficiency,
and usability. We discuss the current state-of-the-art methods and trends,
identify gaps in existing research, and present promising directions for future
study.
Related papers
- Active Learning Methods for Efficient Data Utilization and Model Performance Enhancement [5.4044723481768235]
This paper gives a detailed overview of Active Learning (AL), which is a strategy in machine learning that helps models achieve better performance using fewer labeled examples.
It introduces the basic concepts of AL and discusses how it is used in various fields such as computer vision, natural language processing, transfer learning, and real-world applications.
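The core strategy this overview describes, uncertainty sampling, can be sketched with a toy one-dimensional classifier. Everything below (the data, the threshold model, the query budget) is illustrative, not from the paper; the point is the loop: fit on the labeled set, query the example the model is least confident about, repeat.

```python
import random

random.seed(0)

# Toy task: classify x in [0, 1] as 1 if x > 0.6 (the "unknown" true rule).
X = [random.random() for _ in range(200)]
y = [int(x > 0.6) for x in X]

labeled = list(range(5))                       # small initial labeled set
pool = [i for i in range(200) if i not in labeled]

def fit_threshold(idxs):
    """Fit a 1-D classifier: threshold midway between the two classes."""
    ones = [X[i] for i in idxs if y[i] == 1]
    zeros = [X[i] for i in idxs if y[i] == 0]
    return (min(ones) + max(zeros)) / 2 if ones and zeros else 0.5

for _ in range(10):                            # 10 query rounds, 1 label each
    t = fit_threshold(labeled)
    # Least-confidence query: the pooled point nearest the decision boundary.
    query = min(pool, key=lambda i: abs(X[i] - t))
    labeled.append(query)                      # the "oracle" supplies its label
    pool.remove(query)

t = fit_threshold(labeled)
print(round(t, 2))                             # threshold converges toward 0.6
```

With only 15 labels the boundary closes in on the true rule, which is the efficiency argument the paper makes: labels spent near the decision boundary are worth more than labels spent at random.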
arXiv Detail & Related papers (2025-04-21T20:42:13Z)
- Towards an Understanding of Context Utilization in Code Intelligence [37.85380387094615]
Code intelligence aims to improve the effectiveness and efficiency of various code-related tasks.
Recent research suggests that incorporating contextual information beyond the basic original task inputs can substantially enhance model performance.
Despite growing academic interest, there is a lack of systematic analysis of context in code intelligence.
arXiv Detail & Related papers (2025-04-11T17:59:53Z)
- Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs [54.309127753635366]
We present the results of a replication study in which we investigate GPT-4's effectiveness in recommending idiomatic actions.
Our findings underscore the potential of LLMs to achieve tasks where, in the past, implementing recommenders based on complex code analyses was required.
arXiv Detail & Related papers (2025-01-28T15:41:54Z)
- Language Models for Code Optimization: Survey, Challenges and Future Directions [7.928856221466083]
Language models (LMs) built upon deep neural networks (DNNs) have recently demonstrated breakthrough effectiveness in software engineering tasks.
This study aims to provide actionable insights and references for both researchers and practitioners in this rapidly evolving field.
arXiv Detail & Related papers (2025-01-02T14:20:36Z)
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
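The selection mechanism can be sketched as follows. The candidate prompts and per-token probabilities below are invented stand-ins for what a real LM API would return; the sketch only illustrates ranking prompts by the log-likelihood they assign to the question.

```python
import math

question_tokens = ["what", "is", "the", "capital", "of", "france"]

# Hypothetical per-token probabilities P(token | prompt, preceding tokens)
# for the question under each candidate prompt. Values are made up purely
# to illustrate the selection mechanism.
prompt_token_probs = {
    "Answer concisely.": [0.20, 0.40, 0.50, 0.10, 0.60, 0.05],
    "You are a helpful geography tutor.": [0.30, 0.45, 0.55, 0.20, 0.65, 0.15],
    "Respond in French.": [0.10, 0.30, 0.40, 0.05, 0.50, 0.02],
}

def log_likelihood(probs):
    """Sum of per-token log-probabilities = sequence log-likelihood."""
    return sum(math.log(p) for p in probs)

scores = {p: log_likelihood(v) for p, v in prompt_token_probs.items()}
best = max(scores, key=scores.get)
print(best)  # the prompt under which the question is most likely
```

The gauge is attractive precisely because it needs no labeled answers: likelihood is a byproduct of a single forward pass, so prompts can be ranked before any generation happens.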
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++ [0.716879432974126]
We conduct the first empirical study to investigate and compare the performance of Machine Learning (ML) and Deep Learning (DL) models for function-level SV assessment in C/C++.
We show that ML has matching or even better performance than the multi-class DL models for function-level SV assessment, with significantly less training time.
arXiv Detail & Related papers (2024-07-24T07:26:58Z)
- Qualitative Data Analysis in Software Engineering: Techniques and Teaching Insights [10.222207222039048]
Software repositories are rich sources of qualitative artifacts, including source code comments, commit messages, issue descriptions, and documentation.
This chapter shifts the focus towards interpreting these artifacts using various qualitative data analysis techniques.
Various coding methods are discussed along with the strategic design of a coding guide to ensure consistency and accuracy in data interpretation.
arXiv Detail & Related papers (2024-06-12T13:56:55Z)
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data [89.2410799619405]
We introduce the Quantitative Reasoning with Data benchmark to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data.
The benchmark comprises a dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers.
To compare models' quantitative reasoning abilities on data and text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText.
arXiv Detail & Related papers (2024-02-27T16:15:03Z)
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
- Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey [21.01561950216472]
Modern language models (LMs) have been successfully employed in source code generation and understanding.
Despite their great potential, language models for code intelligence (LM4Code) are susceptible to potential pitfalls.
arXiv Detail & Related papers (2023-10-27T05:32:57Z)
- On the Reliability and Explainability of Language Models for Program Generation [15.569926313298337]
We study the capabilities and limitations of automated program generation approaches.
We employ advanced explainable AI approaches to highlight the tokens that significantly contribute to the code transformation.
Our analysis reveals that, in various experimental scenarios, language models can recognize code grammar and structural information, but they exhibit limited robustness to changes in input sequences.
arXiv Detail & Related papers (2023-02-19T14:59:52Z)
- Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods [48.47413103662829]
Natural Language Generation (NLG) has made great progress in recent years due to the development of deep learning techniques such as pre-trained language models.
However, the faithfulness problem that the generated text usually contains unfaithful or non-factual information has become the biggest challenge.
arXiv Detail & Related papers (2022-03-10T08:28:32Z)
- Robust Natural Language Processing: Recent Advances, Challenges, and Future Directions [4.409836695738517]
We present a structured overview of NLP robustness research by summarizing the literature in a systemic way across various dimensions.
We then take a deep-dive into the various dimensions of robustness, across techniques, metrics, embeddings, and benchmarks.
arXiv Detail & Related papers (2022-01-03T17:17:11Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that, although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
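The pairwise relationship modeling this summary refers to is, at its core, self-attention: every token's representation is recomputed as a weighted mix of all tokens, with weights given by pairwise similarity. A minimal scaled dot-product version, over toy "code token" embeddings rather than the paper's actual model, might look like:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X):
    """Scaled dot-product self-attention; X serves as queries, keys, and values."""
    d = len(X[0])
    out = []
    for q in X:
        # Pairwise similarity of this token with every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # New representation: attention-weighted mix of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

# Three toy token embeddings (imagine `def`, `foo`, `(` in a code snippet).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = self_attention(tokens)
print([round(x, 3) for x in result[0]])
```

Real models learn separate query/key/value projections and stack many such layers; the sketch keeps only the pairwise-weighting mechanism the summary highlights.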
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.