Related papers: Beyond Dependencies: The Role of Copy-Based Reuse in Open Source Software Development


URL: http://arxiv.org/abs/2409.04830v1
Date: Sat, 7 Sep 2024 13:50:40 GMT
Title: Beyond Dependencies: The Role of Copy-Based Reuse in Open Source Software Development
Authors: Mahmoud Jahanshahi, David Reid, Audris Mockus
Abstract summary: In Open Source Software, resources of any project are open for reuse by introducing dependencies or copying the resource itself. Our aim is to enable future research and tool development to increase efficiency and reduce the risks of copy-based reuse.
Score: 5.412781090113212
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In Open Source Software, resources of any project are open for reuse by introducing dependencies or copying the resource itself. In contrast to dependency-based reuse, the infrastructure to systematically support copy-based reuse appears to be entirely missing. Our aim is to enable future research and tool development to increase efficiency and reduce the risks of copy-based reuse. We seek a better understanding of such reuse by measuring its prevalence and identifying factors affecting the propensity to reuse. To identify reused artifacts and trace their origins, our method exploits World of Code infrastructure. We begin with a set of theory-derived factors related to the propensity to reuse, sample instances of different reuse types, and survey developers to better understand their intentions. Our results indicate that copy-based reuse is common, with many developers being aware of it when writing code. The propensity for a file to be reused varies greatly among languages and between source code and binary files, consistently decreasing over time. Files introduced by popular projects are more likely to be reused, but at least half of reused resources originate from "small" and "medium" projects. Developers had various reasons for reuse but were generally positive about using a package manager.
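The abstract describes identifying reused files and tracing their origins. A minimal sketch of that idea for whole-file copies follows, assuming that a file is identified by its git blob hash (the object id git itself computes from the file's bytes) and that the project with the earliest commit of a blob is treated as its origin. This is not the authors' World of Code pipeline: the `FileCommit` record, the helper names, and the earliest-commit rule are hypothetical simplifications chosen for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileCommit:
    """Hypothetical record: one file version committed to one project."""
    project: str
    content: bytes
    timestamp: int  # commit time, e.g. seconds since the Unix epoch


def git_blob_id(content: bytes) -> str:
    """SHA-1 object id that git assigns to a blob with these bytes."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def find_copy_reuse(commits):
    """Map each blob id to (origin project, set of other projects containing it).

    Assumption: the origin is the project with the earliest commit of the blob
    (ties broken by project name); every other project that commits the
    identical blob is counted as a whole-file copy.
    """
    first_seen = {}                 # blob id -> (timestamp, project)
    projects = defaultdict(set)     # blob id -> projects containing it
    for c in commits:
        blob = git_blob_id(c.content)
        projects[blob].add(c.project)
        key = (c.timestamp, c.project)
        if blob not in first_seen or key < first_seen[blob]:
            first_seen[blob] = key
    return {
        blob: (origin, projects[blob] - {origin})
        for blob, (_, origin) in first_seen.items()
    }


def reuse_propensity(reuse):
    """Fraction of distinct blobs copied into at least one other project."""
    if not reuse:
        return 0.0
    reused = sum(1 for _, reusers in reuse.values() if reusers)
    return reused / len(reuse)


if __name__ == "__main__":
    commits = [
        FileCommit("libA", b"def add(a, b):\n    return a + b\n", 100),
        FileCommit("appB", b"def add(a, b):\n    return a + b\n", 200),
        FileCommit("appB", b"print('hello')\n", 150),
    ]
    reuse = find_copy_reuse(commits)
    print(reuse_propensity(reuse))  # → 0.5 (one of two distinct blobs was copied)
```

Exact-hash matching only captures identical whole files; a real study would also need to handle forks, renamed paths, and near-duplicates, which this sketch ignores.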

Related papers

In Search of Metrics to Guide Developer-Based Refactoring Recommendations [13.063733696956678]
Source code refactoring is a well-established approach to improving source code quality without compromising its external behavior. We propose an empirical study into the metrics that capture developers' willingness to apply refactoring operations. We will quantify the value of product and process metrics in grasping developers' motivations to perform refactoring.
arXiv Detail & Related papers (2024-07-25T16:32:35Z)
Investigating the Transferability of Code Repair for Low-Resource Programming Languages [57.62712191540067]
Large language models (LLMs) have shown remarkable performance on code generation tasks. Recent works augment the code repair process by integrating modern techniques such as chain-of-thought reasoning or distillation. We investigate the benefits of distilling code repair for both high- and low-resource languages.
arXiv Detail & Related papers (2024-06-21T05:05:39Z)
A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation [51.31429493814664]
We present a benchmark named multi-source Wizard of Wikipedia for evaluating multi-source dialogue knowledge selection and response generation. We propose a new challenge, dialogue knowledge plug-and-play, which aims to test an already trained dialogue model on using new support knowledge from previously unseen sources.
arXiv Detail & Related papers (2024-03-06T06:54:02Z)
ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Refactoring for Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
arXiv Detail & Related papers (2024-01-29T18:45:30Z)
Dataset: Copy-based Reuse in Open Source Software [5.917654223291073]
In Open Source Software, the source code and any other resources available in a project can be viewed or reused by anyone subject to often permissive licensing restrictions. This dataset seeks to encourage the studies of OSS-wide copy-based reuse by providing copying activity data that captures whole-file reuse in nearly all OSS.
arXiv Detail & Related papers (2023-12-14T22:08:09Z)
How is Software Reuse Discussed in Stack Overflow? [12.586676749644342]
We present an empirical study of 1,409 posts to better understand the challenges developers face when reusing code. Our findings show that "visual studio" is the most frequently occurring bigram in question posts, and that developers frequently use design patterns for the purpose of reuse.
arXiv Detail & Related papers (2023-11-01T03:13:36Z)
Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck [6.230859543111394]
Assembly clone search has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search.
arXiv Detail & Related papers (2023-07-20T06:55:37Z)
Predicting the Impact of Batch Refactoring Code Smells on Application Resource Consumption [3.5557219875516646]
This paper determines the relationship between batch refactoring of code smells and resource consumption. Next, it aims to design algorithms to predict the impact of refactoring code smells on resource consumption.
arXiv Detail & Related papers (2023-06-27T19:28:05Z)
CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks. We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows retrieval models (RMs) to expand knowledge in queries using LLM-generated knowledge collections. InteR achieves superior overall zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)
Do code refactorings influence the merge effort? [80.1936417993664]
Multiple contributors frequently change the source code in parallel to implement new features, fix bugs, refactor existing code, and make other changes. These simultaneous changes need to be merged into the same version of the source code. Studies show that 10 to 20 percent of all merge attempts result in conflicts, which require manual developer intervention to complete the process.
arXiv Detail & Related papers (2023-05-10T13:24:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.