Related papers: LAGOON: An Analysis Tool for Open Source Communities

LAGOON: An Analysis Tool for Open Source Communities

URL: http://arxiv.org/abs/2201.11657v1
Date: Wed, 26 Jan 2022 18:52:11 GMT
Title: LAGOON: An Analysis Tool for Open Source Communities
Authors: Sourya Dey, Walt Woods
Abstract summary: LAGOON is an open source platform for understanding the ecosystems of Open Source Software (OSS) communities. LAGOON ingests artifacts from several common sources, including source code repositories, issue trackers, mailing lists and scraping content from websites. A user interface is provided for visualization and exploration of an OSS project's complete sociotechnical graph.
Score: 7.3861897382622015
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents LAGOON -- an open source platform for understanding the complex ecosystems of Open Source Software (OSS) communities. The platform currently utilizes spatiotemporal graphs to store and investigate the artifacts produced by these communities, and help analysts identify bad actors who might compromise an OSS project's security. LAGOON provides ingest of artifacts from several common sources, including source code repositories, issue trackers, mailing lists and scraping content from project websites. Ingestion utilizes a modular architecture, which supports incremental updates from data sources and provides a generic identity fusion process that can recognize the same community members across disparate accounts. A user interface is provided for visualization and exploration of an OSS project's complete sociotechnical graph. Scripts are provided for applying machine learning to identify patterns within the data. While current focus is on the identification of bad actors in the Python community, the platform's reusability makes it easily extensible with new data and analyses, paving the way for LAGOON to become a comprehensive means of assessing various OSS-based projects and their communities.

Related papers

Why Authors and Maintainers Link (or Don't Link) Their PyPI Libraries to Code Repositories and Donation Platforms [83.16077040470975]
Metadata of libraries on the Python Package Index (PyPI) plays a critical role in supporting the transparency, trust, and sustainability of open-source libraries.<n>This paper presents a large-scale empirical study combining two targeted surveys sent to 50,000 PyPI authors and maintainers.<n>We analyze more than 1,400 responses using large language model (LLM)-based topic modeling to uncover key motivations and barriers related to linking repositories and donation platforms.
arXiv Detail & Related papers (2026-01-21T16:13:57Z)
A Serverless Architecture for Real-Time Stock Analysis using Large Language Models: An Iterative Development and Debugging Case Study [0.0]
This paper documents the design, implementation, and iterative debug of a novel, serverless system for real-time stock analysis.<n>We detail the architectural evolution of the system, from initial concepts to a robust, event-driven pipeline.<n>The final architecture operates at a near-zero cost, demonstrating a viable model for individuals to build sophisticated AI-powered financial tools.
arXiv Detail & Related papers (2025-07-13T11:29:51Z)
SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization.<n>We construct SweLoc, a large-scale dataset curated from public GitHub repositories.<n>We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z)
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents [55.37173845836839]
OS-Atlas is a foundational GUI action model that excels at GUI grounding and OOD agentic tasks. We are releasing the largest open-source cross-platform GUI grounding corpus to date, which contains over 13 million GUI elements.
arXiv Detail & Related papers (2024-10-30T17:10:19Z)
Knowledge Islands: Visualizing Developers Knowledge Concentration [0.0]
Knowledge Islands is a tool that visualizes the concentration of knowledge in a software repository using a state-of-the-art knowledge model. It enables practitioners to analyze GitHub projects, determine where knowledge is concentrated, and implement measures to maintain project health.
arXiv Detail & Related papers (2024-08-16T13:32:49Z)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE) We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories. To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
Private Knowledge Sharing in Distributed Learning: A Survey [50.51431815732716]
The rise of Artificial Intelligence has revolutionized numerous industries and transformed the way society operates. It is crucial to utilize information in learning processes that are either distributed or owned by different entities. Modern data-driven services have been developed to integrate distributed knowledge entities into their outcomes.
arXiv Detail & Related papers (2024-02-08T07:18:23Z)
PyRCA: A Library for Metric-based Root Cause Analysis [66.72542200701807]
PyRCA is an open-source machine learning library of Root Cause Analysis (RCA) for Artificial Intelligence for IT Operations (AIOps) It provides a holistic framework to uncover the complicated metric causal dependencies and automatically locate root causes of incidents.
arXiv Detail & Related papers (2023-06-20T09:55:10Z)
Enclosed Loops: How open source communities become datasets [2.4269101271105176]
Centralization in code hosting and package management in the 2010s created fundamental shifts in the social arrangements of open source ecosystems. In this paper we examine Dependabot, Crater and Copilot as three nascent tools whose existence is predicated on centralized software at scale.
arXiv Detail & Related papers (2023-06-09T00:02:25Z)
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts. We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub. We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection. We provide an analysis of both classic and new applications in the field. The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z)
Code Recommendation for Open Source Software Developers [32.181023933552694]
CODER is a novel graph-based code recommendation framework for open source software developers. Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
arXiv Detail & Related papers (2022-10-15T16:40:36Z)
Which contributions count? Analysis of attribution in open source [0.0]
We characterize contributor acknowledgment models in open source by analyzing thousands of projects. We find that community-generated systems of contribution acknowledgment make work like idea generation or bug finding more visible.
arXiv Detail & Related papers (2021-03-19T20:14:40Z)
Representation of Developer Expertise in Open Source Software [12.583969739954526]
We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers. We then employ Doc2Vec embeddings for vector representations of APIs, developers, and projects. We evaluate if these embeddings reflect the postulated topology of the Skill Space.
arXiv Detail & Related papers (2020-05-20T16:36:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.