Neural Code Summarization: How Far Are We?
- URL: http://arxiv.org/abs/2107.07112v1
- Date: Thu, 15 Jul 2021 04:33:59 GMT
- Title: Neural Code Summarization: How Far Are We?
- Authors: Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang,
Dongmei Zhang, Hongbin Sun
- Abstract summary: Deep learning techniques have been exploited to automatically generate summaries for given code snippets.
In this paper, we conduct a systematic and in-depth analysis of five state-of-the-art neural source code summarization models.
- Score: 30.324396716447602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Source code summaries are important for the comprehension and maintenance of
programs. However, there are plenty of programs with missing, outdated, or
mismatched summaries. Recently, deep learning techniques have been exploited to
automatically generate summaries for given code snippets. To achieve a profound
understanding of how far we are from solving this problem, in this paper, we
conduct a systematic and in-depth analysis of five state-of-the-art neural
source code summarization models on three widely used datasets. Our evaluation
results suggest that: (1) The BLEU metric, which is widely used by existing
work for evaluating the performance of the summarization models, has many
variants. Ignoring the differences among the BLEU variants could affect the
validity of the claimed results. Furthermore, we discover an important,
previously unknown bug about BLEU calculation in a commonly-used software
package. (2) Code pre-processing choices can have a large impact on the
summarization performance, therefore they should not be ignored. (3) Some
important characteristics of datasets (corpus size, data splitting method, and
duplication ratio) have a significant impact on model evaluation. Based on the
experimental results, we give some actionable guidelines on more systematic
ways for evaluating code summarization and choosing the best method in
different scenarios. We also suggest possible future research directions. We
believe that our results can be of great help for practitioners and researchers
in this interesting area.
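As a concrete illustration of finding (1), the sketch below (a minimal example, not the paper's evaluation code) scores a single made-up hypothesis/reference pair with NLTK's sentence-level BLEU under different smoothing methods. The abstract does not name the buggy package, so NLTK is used here only as an assumed, widely available implementation, and the example tokens are invented for illustration.

```python
# Minimal sketch (not the paper's evaluation code): the same
# hypothesis/reference pair gets different scores depending on the
# BLEU variant (here, the smoothing method). Example tokens are made up.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["returns", "the", "maximum", "value", "in", "the", "list"]
hypothesis = ["return", "the", "max", "value", "of", "the", "list"]

smooth = SmoothingFunction()
variants = [
    ("no smoothing (method0)", None),
    ("add-epsilon (method1)", smooth.method1),
    ("short-sentence (method4)", smooth.method4),
]
for name, fn in variants:
    # sentence_bleu expects a list of token-list references.
    score = sentence_bleu([reference], hypothesis, smoothing_function=fn)
    print(f"{name}: {score:.4f}")
```

With no 4-gram overlap, the unsmoothed score collapses to 0 while the smoothed variants do not, which is exactly the kind of divergence that makes cross-paper BLEU comparisons unreliable. Finding (3) lists the duplication ratio between training and test splits among the dataset characteristics that affect evaluation. Below is a minimal sketch of such a check, assuming each split is a list of code strings; the whitespace-normalized exact match is an assumed duplication criterion, and the paper's precise definition may differ.

```python
# Minimal sketch of a train/test duplication-ratio check: the fraction of
# test snippets whose code also appears in the training split. The
# whitespace-normalized exact match is an assumption for illustration.
def duplication_ratio(train_codes, test_codes):
    normalize = lambda code: " ".join(code.split())
    train_set = {normalize(c) for c in train_codes}
    if not test_codes:
        return 0.0
    return sum(normalize(c) in train_set for c in test_codes) / len(test_codes)

train = ["def add(a, b):\n    return a + b", "def sub(a, b):\n    return a - b"]
test = ["def add(a, b):\n    return a + b", "def mul(a, b):\n    return a * b"]
print(duplication_ratio(train, test))  # 0.5: one of two test snippets duplicated
```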
Related papers
- Rethinking Negative Pairs in Code Search [56.23857828689406]
We propose a simple yet effective Soft-InfoNCE loss that inserts weight terms into InfoNCE.
We analyze the effects of Soft-InfoNCE on controlling the distribution of learned code representations and on deriving a more precise mutual information estimate.
arXiv Detail & Related papers (2023-10-12T06:32:42Z)
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- On the State of German (Abstractive) Text Summarization [3.1776833268555134]
We assess the landscape of German abstractive text summarization.
We investigate why practically useful solutions for abstractive text summarization are still absent in industry.
arXiv Detail & Related papers (2023-01-17T18:59:20Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- On the Evaluation of Commit Message Generation Models: An Experimental Study [33.19314967188712]
Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance.
Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages.
This paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets.
arXiv Detail & Related papers (2021-07-12T12:38:02Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation [13.804332504576301]
We group the various methods proposed in the literature into three general classes of algorithms (or metalearners).
We show analytically and empirically that optimizing for the prediction of outcomes or causal effects is not the same as optimizing for treatment assignments.
This is the first comparison of the three different metalearners on a real-world application at scale.
arXiv Detail & Related papers (2020-04-24T04:56:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.