An Exploratory Literature Study on Sharing and Energy Use of Language
Models for Source Code
- URL: http://arxiv.org/abs/2307.02443v1
- Date: Wed, 5 Jul 2023 17:13:00 GMT
- Title: An Exploratory Literature Study on Sharing and Energy Use of Language
Models for Source Code
- Authors: Max Hort and Anastasiia Grishina and Leon Moonen
- Abstract summary: This study investigates whether publications that trained language models for software engineering tasks share source code and trained artifacts.
From 494 unique publications, we identified 293 relevant publications that use language models to address code-related tasks.
We find that there are deficiencies in the sharing of information and artifacts for current studies on source code models for software engineering tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models trained on source code can support a variety of
software development tasks, such as code recommendation and program repair.
Training such models on large amounts of data benefits their performance.
However, the size of the data and models results in long training times and
high energy consumption. While publishing source code allows for replicability,
users need to repeat the expensive training process if models are not shared.
The main goal of the study is to investigate whether publications that trained
language models for software engineering (SE) tasks share source code and
trained artifacts. The second goal is to analyze the transparency of training
energy usage. We perform a snowballing-based literature search to find
publications on language models for source code, and analyze their reusability
from a sustainability standpoint.
From 494 unique publications, we identified 293 relevant publications that
use language models to address code-related tasks. Among them, 27% (79 out of
293) make artifacts available for reuse. This can be in the form of tools or
IDE plugins designed for specific tasks or task-agnostic models that can be
fine-tuned for a variety of downstream tasks. Moreover, we collect insights on
the hardware used for model training, as well as training time, which together
determine the energy consumption of the development process. We find that there
are deficiencies in the sharing of information and artifacts for current
studies on source code models for software engineering tasks, with 40% of the
surveyed papers not sharing source code or trained artifacts. We recommend the
sharing of source code as well as trained artifacts, to enable sustainable
reproducibility. Moreover, comprehensive information on training times and
hardware configurations should be shared for transparency on a model's carbon
footprint.
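To illustrate why reported training time and hardware configuration matter, the back-of-the-envelope estimate below follows the common power-times-time approach to energy and carbon accounting. All numeric values (GPU count, power draw, training time, PUE, grid carbon intensity) are hypothetical placeholders, not figures from the surveyed papers.

```python
# Rough training-energy and carbon-footprint estimate from reported
# hardware and training time. All values below are hypothetical.
gpu_count = 8            # accelerators used for training
gpu_power_kw = 0.3       # average power draw per GPU (~300 W)
training_hours = 120.0   # reported wall-clock training time
pue = 1.5                # data-center power usage effectiveness
grid_intensity = 0.4     # kg CO2-eq emitted per kWh consumed

energy_kwh = gpu_count * gpu_power_kw * training_hours * pue
co2_kg = energy_kwh * grid_intensity
print(f"Estimated energy use: {energy_kwh:.0f} kWh")
print(f"Estimated footprint:  {co2_kg:.0f} kg CO2-eq")
```

With papers reporting the inputs above, readers can reproduce such an estimate instead of guessing; without them, a model's footprint is unrecoverable.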
Related papers
- In-Context Code-Text Learning for Bimodal Software Engineering (arXiv, 2024-10-08)
Bimodal software analysis initially appeared to be within reach with the advent of large language models.
We postulate that in-context learning for the code-text bimodality is a promising avenue.
We consider a diverse dataset encompassing 23 software engineering tasks, which we transform into an in-context learning format.
- DeepCodeProbe: Towards Understanding What Models Trained on Code Learn (arXiv, 2024-07-11)
We introduce DeepCodeProbe, a probing approach that examines the syntax and representation learning abilities of ML models.
Our study applies DeepCodeProbe to state-of-the-art models for code clone detection, code summarization, and comment generation.
Findings reveal that while small models capture abstract syntactic representations, their ability to fully grasp programming language syntax is limited.
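As a generic illustration of the probing methodology described above (not DeepCodeProbe's actual implementation), the sketch below trains a linear classifier on frozen embeddings to test whether a syntactic property is linearly decodable; the embeddings here are random stand-ins for real code-model representations.

```python
# Minimal probing-classifier sketch: a linear model trained on frozen
# embeddings tests whether a property is encoded in the representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))  # stand-in for code embeddings
labels = rng.integers(0, 2, size=1000)     # e.g. "contains a loop" yes/no

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# High accuracy would suggest the property is present in the embeddings.
print(f"Probe accuracy: {probe.score(X_test, y_test):.2f}")
```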
- EduNLP: Towards a Unified and Modularized Library for Educational Resources (arXiv, 2024-06-03)
We present a unified, modularized, and extensive library, EduNLP, focusing on educational resource understanding.
In the library, we decouple the whole workflow into four key modules with consistent interfaces: data configuration, processing, model implementation, and model evaluation.
The current version provides 10 typical models from four categories, along with 5 common downstream evaluation tasks in the education domain covering 8 subjects.
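A four-module decomposition with consistent interfaces could look roughly like the sketch below; the names and signatures are invented for illustration and are not EduNLP's actual API.

```python
# Hypothetical sketch of a four-module workflow with consistent interfaces;
# all names are illustrative, not EduNLP's real API.
from dataclasses import dataclass

@dataclass
class DataConfig:                                 # module 1: data configuration
    dataset_path: str
    subject: str

def process(config: DataConfig) -> list[str]:     # module 2: processing
    return [f"item from {config.dataset_path} ({config.subject})"]

def run_model(items: list[str]) -> list[float]:   # module 3: model implementation
    return [float(len(item)) for item in items]

def evaluate(scores: list[float]) -> float:       # module 4: model evaluation
    return sum(scores) / len(scores)

config = DataConfig("math_questions.json", subject="math")
print(evaluate(run_model(process(config))))
```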
- INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers (arXiv, 2023-12-08)
We use a framework to define 15 probing tasks that exercise surface, syntactic, structural and semantic characteristics of source code.
We probe 8 pre-trained source code models, as well as a natural language model (BERT) as our baseline.
We find that models that incorporate some structural information (such as GraphCodeBERT) have a better representation of source code characteristics.
- NLLB-CLIP -- train performant multilingual image retrieval model on a budget (arXiv, 2023-09-04)
We present NLLB-CLIP, a CLIP model with a text encoder from the NLLB model.
We used an automatically created dataset of 106,246 good-quality images with captions in 201 languages.
We show that NLLB-CLIP is comparable in quality to state-of-the-art models and significantly outperforms them on low-resource languages.
- SoTaNa: The Open-Source Software Development Assistant (arXiv, 2023-08-25)
SoTaNa is an open-source software development assistant.
It generates high-quality instruction-based data for the domain of software engineering.
It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
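A minimal sketch of parameter-efficient fine-tuning with LoRA adapters via the Hugging Face peft library follows; the base model name and hyperparameters are placeholders, and this is not SoTaNa's published training code.

```python
# Parameter-efficient fine-tuning sketch using LoRA adapters (peft).
# Model name and hyperparameters are placeholders, not SoTaNa's setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)
# Only the small adapter matrices are trainable; the base model stays frozen.
model.print_trainable_parameters()
```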
- Active Code Learning: Benchmarking Sample-Efficient Training of Code Models (arXiv, 2023-06-02)
In machine learning for software engineering (ML4Code), efficiently training models of code with less human effort has become an emergent problem.
Active learning is a technique that allows developers to train a model with reduced labeled data while still reaching the desired performance.
This paper builds the first benchmark to study this critical problem: active code learning.
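A generic uncertainty-sampling loop, one common active-learning strategy (not necessarily the benchmark's exact protocol), can be sketched on synthetic data as follows.

```python
# Generic active-learning loop with uncertainty sampling; illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16))
y = (X[:, 0] + 0.1 * rng.normal(size=2000) > 0).astype(int)

labeled = list(range(20))                  # small initial labeled pool
unlabeled = [i for i in range(2000) if i not in labeled]
for _ in range(5):                         # five labeling rounds
    model = LogisticRegression().fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[unlabeled])[:, 1]
    uncertainty = -np.abs(probs - 0.5)     # closest to 0.5 = most uncertain
    picks = np.argsort(uncertainty)[-10:]  # query 10 most uncertain samples
    labeled += [unlabeled[i] for i in picks]
    unlabeled = [i for i in unlabeled if i not in set(labeled)]
print(f"Accuracy with {len(labeled)} labels: {model.score(X, y):.2f}")
```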
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages (arXiv, 2023-05-03)
We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, we explore how a mixture distribution and multi-epoch training of programming and natural languages affect model performance.
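The core idea of a prefix-LM (bidirectional attention over the prefix, causal attention over the continuation) can be shown with a small mask construction; this is a generic sketch, not the CodeGen2 training code.

```python
# Build a prefix-LM attention mask: positions inside the prefix attend to
# the whole prefix bidirectionally; later positions attend causally.
import numpy as np

def prefix_lm_mask(seq_len: int, prefix_len: int) -> np.ndarray:
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # causal base
    mask[:prefix_len, :prefix_len] = True  # full attention inside prefix
    return mask  # mask[i, j] == True means position i may attend to j

print(prefix_lm_mask(seq_len=6, prefix_len=3).astype(int))
```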
- Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning (arXiv, 2022-02-22)
In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models.
Deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning.
Model reprogramming enables resource-efficient cross-domain machine learning by repurposing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning.
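The mechanics, a trainable input transformation plus an output-label mapping wrapped around a frozen source model, can be sketched in PyTorch; the "frozen model" below is a stand-in linear layer, not a real pre-trained network or any specific paper's code.

```python
# Model reprogramming sketch: a frozen source model is reused on a target
# task by learning only an input perturbation and an output-label mapping.
import torch
import torch.nn as nn

frozen_model = nn.Linear(64, 10)       # stand-in pre-trained source model
for p in frozen_model.parameters():
    p.requires_grad = False            # source model stays untouched

delta = nn.Parameter(torch.zeros(64))  # trainable input perturbation
label_map = nn.Linear(10, 3)           # maps 10 source classes to 3 target classes

x = torch.randn(32, 64)                # batch of target-domain inputs
target_logits = label_map(frozen_model(x + delta))
loss = nn.functional.cross_entropy(target_logits, torch.randint(0, 3, (32,)))
loss.backward()                        # gradients reach only delta and label_map
```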
- AstBERT: Enabling Language Model for Code Understanding with Abstract Syntax Tree (arXiv, 2022-01-20)
We propose AstBERT, a pre-trained language model that aims to better understand programming languages (PL) using the abstract syntax tree (AST).
Specifically, we collect a large amount of source code (both Java and Python) from GitHub, from which information in the source code can be interpreted and integrated.
Experimental results show that our AstBERT model achieves state-of-the-art performance on the downstream tasks evaluated.
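How structural information is extracted from source code is easy to see with Python's built-in ast module; this is a generic illustration of AST extraction, not AstBERT's pipeline.

```python
# Extract abstract-syntax-tree structure from a code snippet; a generic
# illustration of the kind of information AST-based models consume.
import ast

source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)
for node in ast.walk(tree):
    print(type(node).__name__)  # Module, FunctionDef, arguments, Return, ...
```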
- KILT: a Benchmark for Knowledge Intensive Language Tasks (arXiv, 2020-09-04)
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
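The dense-index baseline rests on simple dense retrieval: embed the query and passages, then rank by inner product. Below is a toy sketch with random stand-in embeddings, not the actual KILT baseline code.

```python
# Toy dense retrieval: rank passages by inner product with the query
# embedding. Embeddings are random stand-ins; real systems use trained
# encoders, and the top passages would feed a seq2seq reader.
import numpy as np

rng = np.random.default_rng(0)
doc_index = rng.normal(size=(10000, 128))  # dense vector index of passages
query = rng.normal(size=128)               # encoded query

scores = doc_index @ query                 # inner-product similarity
top_k = np.argsort(scores)[::-1][:5]       # five best-matching passages
print("Top passages:", top_k, "scores:", scores[top_k].round(2))
```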