One-off Events? An Empirical Study of Hackathon Code Creation and Reuse
- URL: http://arxiv.org/abs/2207.01015v1
- Date: Sun, 3 Jul 2022 11:49:52 GMT
- Title: One-off Events? An Empirical Study of Hackathon Code Creation and Reuse
- Authors: Ahmed Samir Imam Mahmoud, Tapajit Dey, Alexander Nolte, Audris Mockus, James D. Herbsleb
- Abstract summary: We aim to understand the evolution of code used in and created during hackathon events.
We collected information about 22,183 hackathon projects from DevPost.
- Score: 69.98625403567553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Hackathons have become popular events for teams to collaborate on
projects and develop software prototypes. Most existing research focuses on
activities during an event with limited attention to the evolution of the
hackathon code. Aim: We aim to understand the evolution of code used in and
created during hackathon events, with a particular focus on the code blobs,
specifically, how frequently hackathon teams reuse pre-existing code, how much
new code they develop, if that code gets reused afterward, and what factors
affect reuse. Method: We collected information about 22,183 hackathon projects
from DevPost and obtained related code blobs, authors, project characteristics,
original author, code creation time, language, and size information from World
of Code. We tracked the reuse of code blobs by identifying all commits
containing blobs created during hackathons and identifying all projects that
contain those commits. To gain a deeper understanding of hackathon code
evolution, we also conducted a series of surveys, sent to hackathon
participants whose code was reused, participants whose code was not reused, and
developers who reused hackathon code. Result: 9.14% of the code blobs in hackathon
repositories and 8% of the lines of code (LOC) are created during hackathons
and around a third of the hackathon code gets reused in other projects by both
blob count and LOC. The number of associated technologies and the number of
participants in hackathons increase the reuse probability. Conclusion: The
results of our study demonstrate that hackathons are not always "one-off"
events, as common knowledge suggests, and that they can serve as a starting
point for further studies in this area.
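The reuse-tracking step described in the Method section can be sketched in a few lines. The example below is a minimal toy illustration, not the authors' actual World of Code pipeline: the data structures, blob and commit identifiers, and project names are all hypothetical. The underlying logic matches the paper's description: a hackathon blob counts as reused if some commit containing it belongs to a project other than the hackathon project.

```python
# Toy inputs (hypothetical identifiers, for illustration only):
# blob -> commits that contain it, commit -> project it belongs to.
blob_to_commits = {
    "blob_a": ["c1", "c2"],   # created in a hackathon, later copied elsewhere
    "blob_b": ["c3"],         # stays within the hackathon project
}
commit_to_project = {
    "c1": "hackathon/project-x",
    "c2": "other/project-y",  # reuse outside the hackathon repository
    "c3": "hackathon/project-x",
}
hackathon_projects = {"hackathon/project-x"}

def reused_blobs(blob_to_commits, commit_to_project, hackathon_projects):
    """Return the set of hackathon blobs that appear in non-hackathon projects."""
    reused = set()
    for blob, commits in blob_to_commits.items():
        projects = {commit_to_project[c] for c in commits}
        if projects - hackathon_projects:  # any project outside the event
            reused.add(blob)
    return reused

print(reused_blobs(blob_to_commits, commit_to_project, hackathon_projects))
# -> {'blob_a'}
```

In the actual study these mappings would come from World of Code's blob-to-commit and commit-to-project maps rather than in-memory dictionaries, but the set-difference check is the same idea.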
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- Importance Guided Data Augmentation for Neural-Based Code Understanding [29.69495788091569]
We introduce a general data augmentation framework, GenCode, to enhance the training of code understanding models.
Compared to the state-of-the-art (SOTA) code augmentation method, MixCode, GenCode produces code models with 2.92% higher accuracy and 4.90% robustness on average.
arXiv Detail & Related papers (2024-02-24T08:57:12Z)
- Gotcha! This Model Uses My Code! Evaluating Membership Leakage Risks in Code Models [12.214474083372389]
We propose Gotcha, a novel membership inference attack method specifically for code models.
We show that Gotcha can predict the data membership with a high true positive rate of 0.95 and a low false positive rate of 0.10.
This study calls for more attention to understanding the privacy of code models.
arXiv Detail & Related papers (2023-10-02T12:50:43Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- COSEA: Convolutional Code Search with Layer-wise Attention [90.35777733464354]
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic.
COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
arXiv Detail & Related papers (2020-10-19T13:53:38Z)
- Predicting Vulnerability In Large Codebases With Deep Code Representation [6.357681017646283]
Software engineers write code for various modules, and various types of errors often get introduced in the process.
The same or similar issues and bugs that were fixed in the past (although in different modules) tend to get introduced into production code again.
We developed a novel AI-based system which uses the deep representation of Abstract Syntax Tree (AST) created from the source code and also the active feedback loop.
arXiv Detail & Related papers (2020-04-24T13:18:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.