Related papers: Semantically-enhanced Topic Recommendation System for Software Projects

Semantically-enhanced Topic Recommendation System for Software Projects

URL: http://arxiv.org/abs/2206.00085v1
Date: Tue, 31 May 2022 19:54:42 GMT
Title: Semantically-enhanced Topic Recommendation System for Software Projects
Authors: Maliheh Izadi, Mahtab Nejati, Abbas Heydarnoori
Abstract summary: Tagging software repositories with relevant topics can be exploited for facilitating various downstream tasks. There have been efforts on recommending topics for software projects, however, the semantic relationships among these topics have not been exploited so far. We propose two recommender models for tagging software projects that incorporate the semantic relationship among topics.
Score: 2.0625936401496237
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Software-related platforms have enabled their users to collaboratively label software entities with topics. Tagging software repositories with relevant topics can be exploited for facilitating various downstream tasks. For instance, a correct and complete set of topics assigned to a repository can increase its visibility. Consequently, this improves the outcome of tasks such as browsing, searching, navigation, and organization of repositories. Unfortunately, assigned topics are usually highly noisy, and some repositories do not have well-assigned topics. Thus, there have been efforts on recommending topics for software projects, however, the semantic relationships among these topics have not been exploited so far. We propose two recommender models for tagging software projects that incorporate the semantic relationship among topics. Our approach has two main phases; (1) we first take a collaborative approach to curate a dataset of quality topics specifically for the domain of software engineering and development. We also enrich this data with the semantic relationships among these topics and encapsulate them in a knowledge graph we call SED-KGraph. Then, (2) we build two recommender systems; The first one operates only based on the list of original topics assigned to a repository and the relationships specified in our knowledge graph. The second predictive model, however, assumes there are no topics available for a repository, hence it proceeds to predict the relevant topics based on both textual information of a software project and SED-KGraph. We built SED-KGraph in a crowd-sourced project with 170 contributors from both academia and industry. The experiment results indicate that our solutions outperform baselines that neglect the semantic relationships among topics by at least 25% and 23% in terms of ASR and MAP metrics.

Related papers

Towards Leveraging Large Language Model Summaries for Topic Modeling in Source Code [0.0]
Large language models (LLMs) have demonstrated remarkable program comprehension capabilities. transformer-based topic modeling techniques offer effective ways to extract semantic information from text. This paper proposes and explores a novel approach that combines these strengths to automatically identify meaningful topics in a corpus of Python programs.
arXiv Detail & Related papers (2025-04-24T10:30:40Z)
Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks. These resources lack a classification tailored to Software Engineering (SE) needs. We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z)
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions. RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research [0.3268055538225029]
We present an approach to assess the usability of such datasets for research on research software. One dataset does not provide links to mentioned software at all, the other does so in a way that can impede quantitative research endeavors. The greatest challenge and underlying issue in working with software mention datasets is the still suboptimal practice of software citation.
arXiv Detail & Related papers (2024-02-22T14:51:17Z)
MaintainoMATE: A GitHub App for Intelligent Automation of Maintenance Activities [3.2228025627337864]
Software development projects rely on issue tracking systems at the core of tracking maintenance tasks such as bug reports, and enhancement requests. The handling of issue-reports is critical and requires thorough scanning of the text entered in an issue-report making it a labor-intensive task. We present a unified framework called MaintainoMATE, which is capable of automatically categorizing the issue-reports in their respective category and further assigning the issue-reports to a developer with relevant expertise.
arXiv Detail & Related papers (2023-08-31T05:15:42Z)
KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification [75.95647590619929]
Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis. We propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics. A novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation.
arXiv Detail & Related papers (2023-08-15T02:38:08Z)
UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph(KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question. We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
Supporting the Task-driven Skill Identification in Open Source Project Issue Tracking Systems [0.0]
We investigate the automatic labeling of open issues strategy to help the contributors to pick a task to contribute. By identifying the skills, we claim the contributor candidates should pick a task more suitable. We applied quantitative studies to analyze the relevance of the labels in an experiment and compare the strategies' relative importance.
arXiv Detail & Related papers (2022-11-02T14:17:22Z)
Code Recommendation for Open Source Software Developers [32.181023933552694]
CODER is a novel graph-based code recommendation framework for open source software developers. Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
arXiv Detail & Related papers (2022-10-15T16:40:36Z)
Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation [78.28390172958643]
We identify two key aspects that can help to alleviate multiple domain-shifts in the multi-target domain adaptation (MTDA) We propose Curriculum Graph Co-Teaching (CGCT) that uses a dual classifier head, with one of them being a graph convolutional network (GCN) which aggregates features from similar samples across the domains. When the domain labels are available, we propose Domain-aware Curriculum Learning (DCL), a sequential adaptation strategy that first adapts on the easier target domains, followed by the harder ones.
arXiv Detail & Related papers (2021-04-01T23:41:41Z)
LabelGit: A Dataset for Software Repositories Classification using Attributed Dependency Graphs [11.523471275501857]
We create a new dataset of GitHub projects called LabelGit. Our dataset uses direct information from the source code, like the dependency graph and source code neural representations from the identifiers. We hope to aid the development of solutions that do not rely on proxies but use the entire source code to perform classification.
arXiv Detail & Related papers (2021-03-16T07:28:58Z)
Bidirectional Graph Reasoning Network for Panoptic Segmentation [126.06251745669107]
We introduce a Bidirectional Graph Reasoning Network (BGRNet) to mine the intra-modular and intermodular relations within and between foreground things and background stuff classes. BGRNet first constructs image-specific graphs in both instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level.
arXiv Detail & Related papers (2020-04-14T02:32:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.