Related papers: Knowledge Islands: Visualizing Developers Knowledge Concentration

URL: http://arxiv.org/abs/2408.08733v1
Date: Fri, 16 Aug 2024 13:32:49 GMT
Title: Knowledge Islands: Visualizing Developers Knowledge Concentration
Authors: Otávio Cury, Guilherme Avelino
Abstract summary: Knowledge Islands is a tool that visualizes the concentration of knowledge in a software repository using a state-of-the-art knowledge model. It enables practitioners to analyze GitHub projects, determine where knowledge is concentrated, and implement measures to maintain project health.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current software development is often a cooperative activity, where different situations can arise that put the existence of a project at risk. One common and extensively studied issue in the software engineering literature is the concentration of a significant portion of knowledge about the source code in a few developers on a team. In this scenario, the departure of one of these key developers could make it impossible to continue the project. This work presents Knowledge Islands, a tool that visualizes the concentration of knowledge in a software repository using a state-of-the-art knowledge model. Key features of Knowledge Islands include user authentication, cloning, and asynchronous analysis of user repositories, identification of the expertise of the team's developers, calculation of the Truck Factor for all folders and source code files, and identification of the main developers and repository files. This open-source tool enables practitioners to analyze GitHub projects, determine where knowledge is concentrated within the development team, and implement measures to maintain project health. The source code of Knowledge Islands is available in a public repository, and a video presentation of the tool is also available.
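Illustrative sketch: the abstract mentions calculating the Truck Factor but does not describe the procedure. The Python snippet below shows one common greedy heuristic, assuming a precomputed mapping from file paths to the developers considered knowledgeable about each file. The function name truck_factor, the file_authors input, and the 50% threshold are assumptions made for illustration only; this is not the tool's implementation and it does not use the knowledge model referred to in the paper.

```python
from collections import Counter


def truck_factor(file_authors: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Greedy Truck Factor estimate (hypothetical helper, not the tool's code).

    Repeatedly removes the developer who covers the most files, stopping once
    more than `threshold` of the files have no knowledgeable developer left.
    Returns the number of developers removed before that point.
    """
    # Copy the sets so the caller's data is not mutated.
    remaining = {path: set(devs) for path, devs in file_authors.items()}
    total = len(remaining)
    removed = 0
    while total:
        orphaned = sum(1 for devs in remaining.values() if not devs)
        if orphaned / total > threshold:
            break
        # Count how many files each remaining developer covers.
        coverage = Counter(dev for devs in remaining.values() for dev in devs)
        if not coverage:
            break
        # Ties are broken by name so the result is deterministic.
        top_dev, _ = max(coverage.items(), key=lambda kv: (kv[1], kv[0]))
        for devs in remaining.values():
            devs.discard(top_dev)
        removed += 1
    return removed


# Assumed toy data: alice is the only expert on three of the four files.
example = {
    "core/a.py": {"alice"},
    "core/b.py": {"alice"},
    "core/c.py": {"alice"},
    "docs/d.py": {"bob"},
}
print(truck_factor(example))
```

A per-folder value could be obtained by passing only the entries whose path starts with that folder's prefix. Again, this is a sketch under the stated assumptions.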

Related papers

Studying the Role of Reusing Crowdsourcing Knowledge in Software Development [1.4044759410670398]
Crowdsourcing platforms, such as Stack Overflow, have changed and impacted the software development practice. In these platforms, developers share and reuse their software development and programming experience. However, the empirical studies of software quality are lacking, and simple questions, such as what developers use the crowdsourcing knowledge for, are unanswered.
arXiv Detail & Related papers (2025-12-08T18:54:47Z)
Open Source Software Lifecycle Classification: Developing Wrangling Techniques for Complex Sociotechnical Systems [0.0]
This paper reviews previous attempts to classify open source software and other organizational ecosystems. It examines the divergent and sometimes conflicting purposes that may exist for classifying open source projects and how these competing interests impede our progress in developing a comprehensive understanding of how open source software projects and companies operate.
arXiv Detail & Related papers (2025-04-23T12:37:53Z)
Code Digital Twin: Empowering LLMs with Tacit Knowledge for Complex Software Maintenance [9.603528792596348]
We introduce the concept and framework of Code Digital Twin, a conceptual representation of tacit knowledge. A code digital twin is constructed using a methodology that combines knowledge extraction from both structured and unstructured sources.
arXiv Detail & Related papers (2025-03-11T01:46:58Z)
How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE). We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories. To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation [51.31429493814664]
We present a benchmark named multi-source Wizard of Wikipedia for evaluating multi-source dialogue knowledge selection and response generation. We propose a new challenge, dialogue knowledge plug-and-play, which aims to test an already trained dialogue model on using new support knowledge from previously unseen sources.
arXiv Detail & Related papers (2024-03-06T06:54:02Z)
Private Knowledge Sharing in Distributed Learning: A Survey [50.51431815732716]
The rise of Artificial Intelligence has revolutionized numerous industries and transformed the way society operates. It is crucial to utilize information in learning processes that are either distributed or owned by different entities. Modern data-driven services have been developed to integrate distributed knowledge entities into their outcomes.
arXiv Detail & Related papers (2024-02-08T07:18:23Z)
Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit [63.82016263181941]
Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora. Currently, there is already a thriving research community focusing on code intelligence.
arXiv Detail & Related papers (2023-12-30T17:48:37Z)
Code Ownership in Open-Source AI Software Security [18.779538756226298]
We use code ownership metrics to investigate the correlation with latent vulnerabilities across five prominent open-source AI software projects. The findings suggest a positive relationship between high-level ownership (characterised by a limited number of minor contributors) and a decrease in vulnerabilities. With these novel code ownership metrics, we have implemented a Python-based command-line application to aid project curators and quality assurance professionals in evaluating and benchmarking their on-site projects.
arXiv Detail & Related papers (2023-12-18T00:37:29Z)
The Software Heritage Open Science Ecosystem [0.0]
Software Heritage is the largest public archive of software source code and associated development history. It has archived more than 16 billion unique source code files coming from more than 250 million collaborative development projects. It supports empirical research on software by materializing in a single Merkle directed acyclic graph the development history of public code. It ensures availability and guarantees integrity of the source code of software artifacts used in any field that relies on software to conduct experiments.
arXiv Detail & Related papers (2023-10-16T11:32:03Z)
Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators [78.63553017938911]
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks. However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge. We introduce CONNER, designed to evaluate generated knowledge from six important perspectives.
arXiv Detail & Related papers (2023-10-11T08:22:37Z)
Collaborative, Code-Proximal Dynamic Software Visualization within Code Editors [55.57032418885258]
This paper introduces the design and proof-of-concept implementation for a software visualization approach that can be embedded into code editors. Our contribution differs from related work in that we use dynamic analysis of a software system's runtime behavior. Our visualization approach enhances common remote pair programming tools and is collaboratively usable by employing shared code cities.
arXiv Detail & Related papers (2023-08-30T06:35:40Z)
LAGOON: An Analysis Tool for Open Source Communities [7.3861897382622015]
LAGOON is an open source platform for understanding the ecosystems of Open Source Software (OSS) communities. LAGOON ingests artifacts from several common sources, including source code repositories, issue trackers, mailing lists and scraping content from websites. A user interface is provided for visualization and exploration of an OSS project's complete sociotechnical graph.
arXiv Detail & Related papers (2022-01-26T18:52:11Z)
DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population [95.0099875111663]
DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements.
arXiv Detail & Related papers (2022-01-10T13:29:05Z)
LabelGit: A Dataset for Software Repositories Classification using Attributed Dependency Graphs [11.523471275501857]
We create a new dataset of GitHub projects called LabelGit. Our dataset uses direct information from the source code, like the dependency graph and source code neural representations from the identifiers. We hope to aid the development of solutions that do not rely on proxies but use the entire source code to perform classification.
arXiv Detail & Related papers (2021-03-16T07:28:58Z)
Representation of Developer Expertise in Open Source Software [12.583969739954526]
We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers. We then employ Doc2Vec embeddings for vector representations of APIs, developers, and projects. We evaluate if these embeddings reflect the postulated topology of the Skill Space.
arXiv Detail & Related papers (2020-05-20T16:36:07Z)
