LAGOON: An Analysis Tool for Open Source Communities
- URL: http://arxiv.org/abs/2201.11657v1
- Date: Wed, 26 Jan 2022 18:52:11 GMT
- Authors: Sourya Dey, Walt Woods
- Abstract summary: LAGOON is an open source platform for understanding the ecosystems of Open Source Software (OSS) communities.
LAGOON ingests artifacts from several common sources, including source code repositories, issue trackers, mailing lists, and content scraped from websites.
A user interface is provided for visualization and exploration of an OSS project's complete sociotechnical graph.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents LAGOON, an open source platform for understanding the
complex ecosystems of Open Source Software (OSS) communities. The platform
currently utilizes spatiotemporal graphs to store and investigate the artifacts
produced by these communities and to help analysts identify bad actors who might
compromise an OSS project's security. LAGOON ingests artifacts from
several common sources, including source code repositories, issue trackers,
mailing lists, and content scraped from project websites. Ingestion utilizes a
modular architecture, which supports incremental updates from data sources and
provides a generic identity fusion process that can recognize the same
community members across disparate accounts. A user interface is provided for
visualization and exploration of an OSS project's complete sociotechnical
graph. Scripts are provided for applying machine learning to identify patterns
within the data. While the current focus is on identifying bad actors in
the Python community, the platform's reusability makes it easily extensible
with new data and analyses, paving the way for LAGOON to become a comprehensive
means of assessing various OSS-based projects and their communities.
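The abstract mentions a generic identity fusion process that recognizes the same community member across disparate accounts, but does not say how accounts are matched. The following is a minimal sketch, not LAGOON's implementation: it assumes two accounts belong to the same person when they share a normalized email address or a normalized multi-word display name, and it groups them transitively with a union-find structure. The `Account` record, `identity_keys`, and `fuse_identities` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Account:
    """One account seen in one data source (hypothetical record)."""
    source: str                 # e.g. "github", "mailing_list"
    handle: str
    email: Optional[str] = None
    name: Optional[str] = None


class UnionFind:
    """Disjoint-set forest over integer ids, with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}

    def find(self, x: int) -> int:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Point x at its grandparent to keep trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def identity_keys(acc: Account) -> List[str]:
    """Assumed matching keys: normalized email and multi-word display name."""
    keys = []
    if acc.email:
        keys.append("email:" + acc.email.strip().lower())
    if acc.name and len(acc.name.split()) > 1:
        # Collapse whitespace and case so "Jane  Doe" matches "jane doe".
        keys.append("name:" + " ".join(acc.name.lower().split()))
    return keys


def fuse_identities(accounts: List[Account]) -> List[List[Account]]:
    """Group accounts that share any matching key (transitively)."""
    uf = UnionFind()
    first_seen: Dict[str, int] = {}
    for i, acc in enumerate(accounts):
        uf.find(i)  # register every account, even one with no keys
        for key in identity_keys(acc):
            if key in first_seen:
                uf.union(first_seen[key], i)
            else:
                first_seen[key] = i
    groups: Dict[int, List[Account]] = {}
    for i, acc in enumerate(accounts):
        groups.setdefault(uf.find(i), []).append(acc)
    return list(groups.values())


if __name__ == "__main__":
    accounts = [
        Account("github", "jdoe", email="J.Doe@example.org"),
        Account("mailing_list", "j.doe", email="j.doe@example.org ", name="Jane Doe"),
        Account("issue_tracker", "janed", name="Jane  Doe"),
        Account("github", "rsmith", email="r.smith@example.org"),
    ]
    for group in fuse_identities(accounts):
        print([f"{a.source}/{a.handle}" for a in group])
    # ['github/jdoe', 'mailing_list/j.doe', 'issue_tracker/janed']
    # ['github/rsmith']
```

Here the issue-tracker account is linked to the GitHub account only through the mailing-list account, which shows why a transitive grouping is used. Matching on display names is a deliberately naive assumption that can merge distinct people who share a name; a real fusion step would likely weigh several signals before merging.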
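The abstract says LAGOON stores community artifacts in spatiotemporal graphs and supports incremental updates from data sources. One way to picture this, assuming "spatiotemporal" means a graph whose edges carry timestamps, is sketched below: people and artifacts are nodes, each interaction is a timestamped edge, and time-windowed queries return the slice of the graph active in a given period. `TemporalGraph`, its methods, and the node-naming scheme (`person:alice`, `commit:a1`) are hypothetical; LAGOON's actual schema and storage backend are not described in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Edge:
    """One timestamped interaction between two nodes (hypothetical schema)."""
    src: str        # e.g. "person:alice"
    dst: str        # e.g. "commit:a1" or "issue:42"
    kind: str       # e.g. "authored", "commented_on"
    time: datetime  # when the interaction happened


class TemporalGraph:
    """A toy graph whose edges carry timestamps, queried by time window."""

    def __init__(self) -> None:
        self.edges: List[Edge] = []
        # Outgoing edges indexed by source node for fast neighbor lookups.
        self.out: Dict[str, List[Edge]] = defaultdict(list)

    def add(self, src: str, dst: str, kind: str, time: datetime) -> None:
        edge = Edge(src, dst, kind, time)
        self.edges.append(edge)
        self.out[src].append(edge)

    def neighbors_between(self, node: str, start: datetime, end: datetime) -> Set[str]:
        """Nodes reached from `node` via edges with start <= time < end."""
        return {e.dst for e in self.out.get(node, []) if start <= e.time < end}

    def snapshot(self, start: datetime, end: datetime) -> List[Edge]:
        """All edges active in the half-open window [start, end)."""
        return [e for e in self.edges if start <= e.time < end]

    def watermark(self) -> Optional[datetime]:
        """Latest timestamp stored so far, or None if the graph is empty."""
        return max((e.time for e in self.edges), default=None)


if __name__ == "__main__":
    g = TemporalGraph()
    g.add("person:alice", "commit:a1", "authored", datetime(2021, 3, 1))
    g.add("person:alice", "issue:42", "commented_on", datetime(2021, 6, 15))
    g.add("person:bob", "commit:b7", "authored", datetime(2021, 6, 20))
    june = (datetime(2021, 6, 1), datetime(2021, 7, 1))
    print(g.neighbors_between("person:alice", *june))  # {'issue:42'}
    print(len(g.snapshot(*june)))                       # 2
    print(g.watermark())                                # 2021-06-20 00:00:00
```

The `watermark` method hints at how incremental ingestion could work under this assumption: a source adapter would fetch only artifacts newer than the latest stored timestamp rather than re-ingesting the whole history.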