LAGOON: An Analysis Tool for Open Source Communities
- URL: http://arxiv.org/abs/2201.11657v1
- Date: Wed, 26 Jan 2022 18:52:11 GMT
- Title: LAGOON: An Analysis Tool for Open Source Communities
- Authors: Sourya Dey, Walt Woods
- Abstract summary: LAGOON is an open source platform for understanding the ecosystems of Open Source Software (OSS) communities.
LAGOON ingests artifacts from several common sources, including source code repositories, issue trackers, mailing lists, and content scraped from websites.
A user interface is provided for visualization and exploration of an OSS project's complete sociotechnical graph.
- Score: 7.3861897382622015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents LAGOON -- an open source platform for understanding the
complex ecosystems of Open Source Software (OSS) communities. The platform
currently utilizes spatiotemporal graphs to store and investigate the artifacts
produced by these communities, and help analysts identify bad actors who might
compromise an OSS project's security. LAGOON ingests artifacts from
several common sources, including source code repositories, issue trackers,
mailing lists, and content scraped from project websites. Ingestion utilizes a
modular architecture, which supports incremental updates from data sources and
provides a generic identity fusion process that can recognize the same
community members across disparate accounts. A user interface is provided for
visualization and exploration of an OSS project's complete sociotechnical
graph. Scripts are provided for applying machine learning to identify patterns
within the data. While the current focus is on the identification of bad actors in
the Python community, the platform's reusability makes it easily extensible
with new data and analyses, paving the way for LAGOON to become a comprehensive
means of assessing various OSS-based projects and their communities.
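The abstract does not spell out how identity fusion works, so the following is only a minimal sketch of one common approach, not LAGOON's implementation: accounts from different sources are merged with a union-find structure whenever they share a normalized name or email. The function name `fuse_identities`, the `(account_id, name, email)` tuple layout, and the matching rule are all assumptions made for illustration.

```python
from collections import defaultdict


def fuse_identities(accounts):
    """Group account records that share a normalized name or email.

    `accounts` is a list of (account_id, name, email) tuples. The field
    layout is hypothetical and is not taken from LAGOON's schema.
    """
    parent = {account_id: account_id for account_id, _, _ in accounts}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Accounts that share any normalized key are merged into one identity.
    first_seen = {}
    for account_id, name, email in accounts:
        keys = (("name", name.strip().lower()), ("email", email.strip().lower()))
        for key in keys:
            if not key[1]:
                continue  # skip empty fields so blanks never link accounts
            if key in first_seen:
                union(first_seen[key], account_id)
            else:
                first_seen[key] = account_id

    groups = defaultdict(set)
    for account_id in parent:
        groups[find(account_id)].add(account_id)
    return sorted(sorted(group) for group in groups.values())


# Made-up example records from three hypothetical sources.
accounts = [
    ("git:1", "Ada Lovelace", "ada@example.org"),
    ("list:7", "A. Lovelace", "ADA@example.org"),
    ("issues:3", "ada lovelace", ""),
    ("git:2", "Grace Hopper", "grace@example.org"),
]
print(fuse_identities(accounts))
# → [['git:1', 'issues:3', 'list:7'], ['git:2']]
```

Exact matching on normalized strings is deliberately conservative; a real system would likely need fuzzier matching and safeguards against merging distinct people who share a common name.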
Related papers
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents [55.37173845836839]
OS-Atlas is a foundational GUI action model that excels at GUI grounding and OOD agentic tasks.
We are releasing the largest open-source cross-platform GUI grounding corpus to date, which contains over 13 million GUI elements.
arXiv Detail & Related papers (2024-10-30T17:10:19Z)
- Knowledge Islands: Visualizing Developers Knowledge Concentration [0.0]
Knowledge Islands is a tool that visualizes the concentration of knowledge in a software repository using a state-of-the-art knowledge model.
It enables practitioners to analyze GitHub projects, determine where knowledge is concentrated, and implement measures to maintain project health.
arXiv Detail & Related papers (2024-08-16T13:32:49Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
- Private Knowledge Sharing in Distributed Learning: A Survey [50.51431815732716]
The rise of Artificial Intelligence has revolutionized numerous industries and transformed the way society operates.
It is crucial to utilize information in learning processes that are either distributed or owned by different entities.
Modern data-driven services have been developed to integrate distributed knowledge entities into their outcomes.
arXiv Detail & Related papers (2024-02-08T07:18:23Z)
- PyRCA: A Library for Metric-based Root Cause Analysis [66.72542200701807]
PyRCA is an open-source machine learning library of Root Cause Analysis (RCA) for Artificial Intelligence for IT Operations (AIOps).
It provides a holistic framework to uncover the complicated metric causal dependencies and automatically locate root causes of incidents.
arXiv Detail & Related papers (2023-06-20T09:55:10Z)
- Enclosed Loops: How open source communities become datasets [2.4269101271105176]
Centralization in code hosting and package management in the 2010s created fundamental shifts in the social arrangements of open source ecosystems.
In this paper we examine Dependabot, Crater and Copilot as three nascent tools whose existence is predicated on centralized software at scale.
arXiv Detail & Related papers (2023-06-09T00:02:25Z)
- Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z)
- Code Recommendation for Open Source Software Developers [32.181023933552694]
CODER is a novel graph-based code recommendation framework for open source software developers.
Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
arXiv Detail & Related papers (2022-10-15T16:40:36Z)
- Which contributions count? Analysis of attribution in open source [0.0]
We characterize contributor acknowledgment models in open source by analyzing thousands of projects.
We find that community-generated systems of contribution acknowledgment make work like idea generation or bug finding more visible.
arXiv Detail & Related papers (2021-03-19T20:14:40Z)
- Representation of Developer Expertise in Open Source Software [12.583969739954526]
We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers.
We then employ Doc2Vec embeddings for vector representations of APIs, developers, and projects.
We evaluate if these embeddings reflect the postulated topology of the Skill Space.
arXiv Detail & Related papers (2020-05-20T16:36:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.