Code to Comment "Translation": Data, Metrics, Baselining & Evaluation
- URL: http://arxiv.org/abs/2010.01410v1
- Date: Sat, 3 Oct 2020 18:57:26 GMT
- Title: Code to Comment "Translation": Data, Metrics, Baselining & Evaluation
- Authors: David Gros, Hariharan Sezhiyan, Prem Devanbu, Zhou Yu
- Abstract summary: We analyze several recent code-comment datasets for this task.
We compare them with WMT19, a standard dataset frequently used to train state-of-the-art natural language translators.
We find some interesting differences between the code-comment data and the WMT19 natural language data.
- Score: 49.35567240750619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The relationship of comments to code, and in particular, the task of
generating useful comments given the code, has long been of interest. The
earliest approaches have been based on strong syntactic theories of
comment-structures, and relied on textual templates. More recently, researchers
have applied deep learning methods to this task, and specifically, trainable
generative translation models which are known to work very well for Natural
Language translation (e.g., from German to English). We carefully examine the
underlying assumption here: that the task of generating comments sufficiently
resembles the task of translating between natural languages, and so similar
models and evaluation metrics could be used. We analyze several recent
code-comment datasets for this task: CodeNN, DeepCom, FunCom, and DocString. We
compare them with WMT19, a standard dataset frequently used to train
state-of-the-art natural language translators. We found some interesting
differences between the code-comment data and the WMT19 natural language data.
Next, we describe and conduct some studies to calibrate BLEU (which is
commonly used as a measure of comment quality), using "affinity pairs" of
methods from different projects, in the same project, in the same class, etc. Our study
suggests that the current performance on some datasets might need to be
improved substantially. We also argue that fairly naive information retrieval
(IR) methods do well enough at this task to be considered a reasonable
baseline. Finally, we make some suggestions on how our findings might be used
in future research in this area.
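
To make the abstract's "affinity pairs" calibration concrete, here is a minimal sketch of the idea: compute BLEU between pairs of existing human-written comments drawn from methods with varying affinity (same class, same project, unrelated projects) to see what score ranges such naive pairings already reach. The NLTK-based `pair_bleu` helper and the example comments below are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' exact setup) of calibrating BLEU with
# "affinity pairs": score pairs of human-written comments from methods at
# different levels of affinity. The comments below are invented examples.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1  # avoids zero scores on short comments

def pair_bleu(reference: str, candidate: str) -> float:
    """BLEU between two comments, treating one as the reference."""
    return sentence_bleu([reference.lower().split()],
                         candidate.lower().split(),
                         smoothing_function=smooth)

# Hypothetical pairs: comments on methods from the same class usually
# overlap more than comments on methods from unrelated projects.
print(pair_bleu("returns the user id", "returns the user name"))          # same class
print(pair_bleu("returns the user id", "parses the configuration file"))  # cross-project
```

If same-class pairs already score highly against each other, a similar BLEU from a generation model says less about its quality than the raw number suggests.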
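Similarly, a rough sketch of the kind of "fairly naive" information retrieval baseline the abstract argues for: given a query method, return the comment attached to the most lexically similar method in the training set. The TF-IDF setup and toy data are assumptions for illustration, not the paper's implementation.

```python
# Sketch of a naive IR baseline for comment generation: nearest-neighbor
# retrieval over code text with TF-IDF. Data and field choices are toy
# examples, not the authors' pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_code = [
    "def get_user_id(self): return self.uid",
    "def load_config(path): return xml.parse(path)",
]
train_comments = [
    "returns the user id",
    "parses the xml configuration file",
]

vectorizer = TfidfVectorizer(token_pattern=r"\w+")
train_vecs = vectorizer.fit_transform(train_code)

def retrieve_comment(query_code: str) -> str:
    """Return the comment of the most similar training method."""
    query_vec = vectorizer.transform([query_code])
    scores = cosine_similarity(query_vec, train_vecs)[0]
    return train_comments[scores.argmax()]

print(retrieve_comment("def get_user_uid(self): return self.uid"))
# -> "returns the user id"
```
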
Related papers
- Leveraging Generative AI: Improving Software Metadata Classification with Generated Code-Comment Pairs [0.0]
In software development, code comments play a crucial role in enhancing code comprehension and collaboration.
This research paper addresses the challenge of objectively classifying code comments as "Useful" or "Not Useful".
We propose a novel solution that harnesses contextualized embeddings, particularly BERT, to automate this classification process.
arXiv Detail & Related papers (2023-10-14T12:09:43Z)
- Language Models are Universal Embedders [48.12992614723464]
We show that pre-trained transformer decoders can embed universally when finetuned on limited English data.
Our models achieve competitive performance on different embedding tasks with minimal training data.
These results provide evidence of a promising path towards building powerful unified embedders.
arXiv Detail & Related papers (2023-10-12T11:25:46Z)
- Constructing Multilingual Code Search Dataset Using Neural Machine Translation [48.32329232202801]
We create a multilingual code search dataset in four natural and four programming languages.
Our results show that the model pre-trained with all natural and programming language data performed best in most cases.
arXiv Detail & Related papers (2023-06-27T16:42:36Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code with synthetically created clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- Using Document Similarity Methods to create Parallel Datasets for Code Translation [60.36392618065203]
Translating source code from one programming language to another is a critical, time-consuming task.
We propose to use document similarity methods to create noisy parallel datasets of code.
We show that these models perform comparably to models trained on ground truth for reasonable levels of noise.
arXiv Detail & Related papers (2021-10-11T17:07:58Z)
- CoDesc: A Large Code-Description Parallel Dataset [4.828053113572208]
We present CoDesc -- a large parallel dataset composed of 4.2 million Java methods and natural language descriptions.
With extensive analysis, we identify and remove prevailing noise patterns from the dataset.
We show that the dataset helps improve code search by up to 22% and achieves the new state-of-the-art in code summarization.
arXiv Detail & Related papers (2021-05-29T05:40:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.