Code to Comment "Translation": Data, Metrics, Baselining & Evaluation
- URL: http://arxiv.org/abs/2010.01410v1
- Date: Sat, 3 Oct 2020 18:57:26 GMT
- Title: Code to Comment "Translation": Data, Metrics, Baselining & Evaluation
- Authors: David Gros, Hariharan Sezhiyan, Prem Devanbu, Zhou Yu
- Abstract summary: We analyze several recent code-comment datasets for this task.
We compare them with WMT19, a standard dataset frequently used to train state-of-the-art natural language translators.
We find some interesting differences between the code-comment data and the WMT19 natural language data.
- Score: 49.35567240750619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The relationship of comments to code, and in particular, the task of
generating useful comments given the code, has long been of interest. The
earliest approaches have been based on strong syntactic theories of
comment-structures, and relied on textual templates. More recently, researchers
have applied deep learning methods to this task, and specifically, trainable
generative translation models which are known to work very well for Natural
Language translation (e.g., from German to English). We carefully examine the
underlying assumption here: that the task of generating comments sufficiently
resembles the task of translating between natural languages, and so similar
models and evaluation metrics could be used. We analyze several recent
code-comment datasets for this task: CodeNN, DeepCom, FunCom, and DocString. We
compare them with WMT19, a standard dataset frequently used to train
state-of-the-art natural language translators. We found some interesting
differences between the code-comment data and the WMT19 natural language data.
Next, we describe and conduct some studies to calibrate BLEU (which is
commonly used as a measure of comment quality), using "affinity pairs" of
methods from different projects, in the same project, in the same class, etc. Our study
suggests that the current performance on some datasets might need to be
improved substantially. We also argue that fairly naive information retrieval
(IR) methods do well enough at this task to be considered a reasonable
baseline. Finally, we make some suggestions on how our findings might be used
in future research in this area.
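
To make the abstract's "affinity pairs" calibration concrete, here is a minimal sketch of the idea: compute BLEU between pairs of existing human-written comments drawn from methods with varying affinity (same class, same project, unrelated projects) to see what score ranges such naive pairings already reach. The NLTK-based `pair_bleu` helper and the example comments below are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' exact setup) of calibrating BLEU with
# "affinity pairs": score pairs of human-written comments from methods at
# different levels of affinity. The comments below are invented examples.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1  # avoids zero scores on short comments

def pair_bleu(reference: str, candidate: str) -> float:
    """BLEU between two comments, treating one as the reference."""
    return sentence_bleu([reference.lower().split()],
                         candidate.lower().split(),
                         smoothing_function=smooth)

# Hypothetical pairs: comments on methods from the same class usually
# overlap more than comments on methods from unrelated projects.
print(pair_bleu("returns the user id", "returns the user name"))          # same class
print(pair_bleu("returns the user id", "parses the configuration file"))  # cross-project
```

If same-class pairs already score highly against each other, a similar BLEU from a generation model says less about its quality than the raw number suggests.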
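Similarly, a rough sketch of the kind of "fairly naive" information retrieval baseline the abstract argues for: given a query method, return the comment attached to the most lexically similar method in the training set. The TF-IDF setup and toy data are assumptions for illustration, not the paper's implementation.

```python
# Sketch of a naive IR baseline for comment generation: nearest-neighbor
# retrieval over code text with TF-IDF. Data and field choices are toy
# examples, not the authors' pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_code = [
    "def get_user_id(self): return self.uid",
    "def load_config(path): return xml.parse(path)",
]
train_comments = [
    "returns the user id",
    "parses the xml configuration file",
]

vectorizer = TfidfVectorizer(token_pattern=r"\w+")
train_vecs = vectorizer.fit_transform(train_code)

def retrieve_comment(query_code: str) -> str:
    """Return the comment of the most similar training method."""
    query_vec = vectorizer.transform([query_code])
    scores = cosine_similarity(query_vec, train_vecs)[0]
    return train_comments[scores.argmax()]

print(retrieve_comment("def get_user_uid(self): return self.uid"))
# -> "returns the user id"
```
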
Related papers
- Leveraging Generative AI: Improving Software Metadata Classification with Generated Code-Comment Pairs [0.0]
In software development, code comments play a crucial role in enhancing code comprehension and collaboration.
This research paper addresses the challenge of objectively classifying code comments as "Useful" or "Not Useful".
We propose a novel solution that harnesses contextualized embeddings, particularly BERT, to automate this classification process.
arXiv Detail & Related papers (2023-10-14T12:09:43Z)
- Language Models are Universal Embedders [48.12992614723464]
We show that pre-trained transformer decoders can embed universally when finetuned on limited English data.
Our models achieve competitive performance on different embedding tasks with minimal training data.
These results provide evidence of a promising path towards building powerful unified embedders.
arXiv Detail & Related papers (2023-10-12T11:25:46Z)
- Constructing Multilingual Code Search Dataset Using Neural Machine Translation [48.32329232202801]
We create a multilingual code search dataset in four natural and four programming languages.
Our results show that the model pre-trained with all natural and programming language data performed best in most cases.
arXiv Detail & Related papers (2023-06-27T16:42:36Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code with synthetically created clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- Using Document Similarity Methods to create Parallel Datasets for Code Translation [60.36392618065203]
Translating source code from one programming language to another is a critical, time-consuming task.
We propose to use document similarity methods to create noisy parallel datasets of code.
We show that these models perform comparably to models trained on ground truth for reasonable levels of noise.
arXiv Detail & Related papers (2021-10-11T17:07:58Z)
- CoDesc: A Large Code-Description Parallel Dataset [4.828053113572208]
We present CoDesc -- a large parallel dataset composed of 4.2 million Java methods and natural language descriptions.
With extensive analysis, we identify and remove prevailing noise patterns from the dataset.
We show that the dataset helps improve code search by up to 22% and achieves the new state-of-the-art in code summarization.
arXiv Detail & Related papers (2021-05-29T05:40:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.