One-off Events? An Empirical Study of Hackathon Code Creation and Reuse
- URL: http://arxiv.org/abs/2207.01015v1
- Date: Sun, 3 Jul 2022 11:49:52 GMT
- Title: One-off Events? An Empirical Study of Hackathon Code Creation and Reuse
- Authors: Ahmed Samir Imam Mahmoud, Tapajit Dey, Alexander Nolte, Audris Mockus, James D. Herbsleb
- Abstract summary: We aim to understand the evolution of code used in and created during hackathon events.
We collected information about 22,183 hackathon projects from DevPost.
- Score: 69.98625403567553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Hackathons have become popular events for teams to collaborate on
projects and develop software prototypes. Most existing research focuses on
activities during an event with limited attention to the evolution of the
hackathon code. Aim: We aim to understand the evolution of code used in and
created during hackathon events, with a particular focus on the code blobs,
specifically, how frequently hackathon teams reuse pre-existing code, how much
new code they develop, if that code gets reused afterward, and what factors
affect reuse. Method: We collected information about 22,183 hackathon projects
from DevPost and obtained related code blobs, authors, project characteristics,
original author, code creation time, language, and size information from World
of Code. We tracked the reuse of code blobs by identifying all commits
containing blobs created during hackathons and identifying all projects that
contain those commits. To gain a deeper understanding of hackathon code
evolution, we also conducted a series of surveys, sent to hackathon
participants whose code was reused, participants whose code was not reused, and
developers who reused hackathon code. Result: 9.14% of the code blobs in hackathon
repositories and 8% of the lines of code (LOC) are created during hackathons
and around a third of the hackathon code gets reused in other projects by both
blob count and LOC. The number of associated technologies and the number of
participants in hackathons increase the reuse probability. Conclusion: The
results of our study demonstrate that hackathons are not always "one-off"
events, as common knowledge suggests, and that they can serve as a starting
point for further studies in this area.
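The reuse-tracking step described in the Method section can be sketched in a few lines. The example below is a minimal toy illustration, not the authors' actual World of Code pipeline: the data structures, blob and commit identifiers, and project names are all hypothetical. The underlying logic matches the paper's description: a hackathon blob counts as reused if some commit containing it belongs to a project other than the hackathon project.

```python
# Toy inputs (hypothetical identifiers, for illustration only):
# blob -> commits that contain it, commit -> project it belongs to.
blob_to_commits = {
    "blob_a": ["c1", "c2"],   # created in a hackathon, later copied elsewhere
    "blob_b": ["c3"],         # stays within the hackathon project
}
commit_to_project = {
    "c1": "hackathon/project-x",
    "c2": "other/project-y",  # reuse outside the hackathon repository
    "c3": "hackathon/project-x",
}
hackathon_projects = {"hackathon/project-x"}

def reused_blobs(blob_to_commits, commit_to_project, hackathon_projects):
    """Return the set of hackathon blobs that appear in non-hackathon projects."""
    reused = set()
    for blob, commits in blob_to_commits.items():
        projects = {commit_to_project[c] for c in commits}
        if projects - hackathon_projects:  # any project outside the event
            reused.add(blob)
    return reused

print(reused_blobs(blob_to_commits, commit_to_project, hackathon_projects))
# -> {'blob_a'}
```

In the actual study these mappings would come from World of Code's blob-to-commit and commit-to-project maps rather than in-memory dictionaries, but the set-difference check is the same idea.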
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- Importance Guided Data Augmentation for Neural-Based Code Understanding [29.69495788091569]
We introduce a general data augmentation framework, GenCode, to enhance the training of code understanding models.
Compared to the state-of-the-art (SOTA) code augmentation method, MixCode, GenCode produces code models with 2.92% higher accuracy and 4.90% robustness on average.
arXiv Detail & Related papers (2024-02-24T08:57:12Z)
- Gotcha! This Model Uses My Code! Evaluating Membership Leakage Risks in Code Models [12.214474083372389]
We propose Gotcha, a novel membership inference attack method specifically for code models.
We show that Gotcha can predict the data membership with a high true positive rate of 0.95 and a low false positive rate of 0.10.
This study calls for more attention to understanding the privacy of code models.
arXiv Detail & Related papers (2023-10-02T12:50:43Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- COSEA: Convolutional Code Search with Layer-wise Attention [90.35777733464354]
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic.
COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
arXiv Detail & Related papers (2020-10-19T13:53:38Z)
- Predicting Vulnerability In Large Codebases With Deep Code Representation [6.357681017646283]
Software engineers write code for various modules, and various types of errors often get introduced in the process.
The same or similar issues and bugs that were fixed in the past (although in different modules) tend to get introduced into production code again.
We developed a novel AI-based system which uses the deep representation of Abstract Syntax Tree (AST) created from the source code and also the active feedback loop.
arXiv Detail & Related papers (2020-04-24T13:18:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.