Semantically-enhanced Topic Recommendation System for Software Projects
- URL: http://arxiv.org/abs/2206.00085v1
- Date: Tue, 31 May 2022 19:54:42 GMT
- Title: Semantically-enhanced Topic Recommendation System for Software Projects
- Authors: Maliheh Izadi, Mahtab Nejati, Abbas Heydarnoori
- Abstract summary: Tagging software repositories with relevant topics can be exploited for facilitating various downstream tasks.
There have been efforts to recommend topics for software projects; however, the semantic relationships among these topics have not been exploited so far.
We propose two recommender models for tagging software projects that incorporate the semantic relationship among topics.
- Score: 2.0625936401496237
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Software-related platforms have enabled their users to collaboratively label
software entities with topics. Tagging software repositories with relevant
topics can be exploited for facilitating various downstream tasks. For
instance, a correct and complete set of topics assigned to a repository can
increase its visibility. Consequently, this improves the outcome of tasks such
as browsing, searching, navigation, and organization of repositories.
Unfortunately, assigned topics are usually highly noisy, and some repositories
do not have well-assigned topics. Thus, there have been efforts to recommend
topics for software projects; however, the semantic relationships among these
topics have not been exploited so far. We propose two recommender models for
tagging software projects that incorporate the semantic relationship among
topics. Our approach has two main phases: (1) we first take a collaborative
approach to curate a dataset of quality topics specifically for the domain of
software engineering and development. We also enrich this data with the
semantic relationships among these topics and encapsulate them in a knowledge
graph we call SED-KGraph. Then, (2) we build two recommender systems. The first
one operates only based on the list of original topics assigned to a repository
and the relationships specified in our knowledge graph. The second predictive
model, however, assumes that no topics are available for a repository and
therefore predicts the relevant topics based on both the textual information of
a software project and SED-KGraph. We built SED-KGraph in a crowd-sourced
project with 170 contributors from both academia and industry. The experiment
results indicate that our solutions outperform baselines that neglect the
semantic relationships among topics by at least 25% and 23% in terms of the ASR
and MAP metrics, respectively.
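The first recommender described in the abstract works from a repository's already-assigned topics plus the relationships in the knowledge graph. The snippet below is only a minimal sketch of that general idea, not the authors' model: the toy graph, the relation names (`is-a`, `works-with`), the function `recommend_topics`, and the vote-counting score are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy graph (NOT SED-KGraph): topic -> set of (relation, related_topic).
TOY_GRAPH = {
    "flask": {("is-a", "web-framework"), ("works-with", "python")},
    "django": {("is-a", "web-framework"), ("works-with", "python")},
    "pytorch": {("is-a", "deep-learning-library"), ("works-with", "python")},
}


def recommend_topics(assigned, graph, top_k=3):
    """Rank unassigned topics by how many assigned topics link to them."""
    assigned = set(assigned)
    votes = Counter()
    for topic in assigned:
        # Topics missing from the graph simply contribute no neighbors.
        for _relation, neighbor in graph.get(topic, ()):
            if neighbor not in assigned:
                votes[neighbor] += 1
    # Sort by descending vote count, then alphabetically for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, _count in ranked[:top_k]]


print(recommend_topics(["flask", "django"], TOY_GRAPH))
```

A real system would presumably weight relation types differently and combine the graph signal with textual features, as the second model in the abstract does; this sketch ignores both.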
Related papers
- Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks.
These resources lack a classification tailored to Software Engineering (SE) needs.
We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z)
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
- Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research [0.3268055538225029]
We present an approach to assess the usability of such datasets for research on research software.
One dataset does not provide links to mentioned software at all; the other does so in a way that can impede quantitative research endeavors.
The greatest challenge and underlying issue in working with software mention datasets is the still suboptimal practice of software citation.
arXiv Detail & Related papers (2024-02-22T14:51:17Z)
- MaintainoMATE: A GitHub App for Intelligent Automation of Maintenance Activities [3.2228025627337864]
Software development projects rely on issue tracking systems to track maintenance tasks such as bug reports and enhancement requests.
Handling issue reports is critical and requires thorough scanning of the text entered in each report, making it a labor-intensive task.
We present a unified framework called MaintainoMATE, which is capable of automatically categorizing the issue-reports in their respective category and further assigning the issue-reports to a developer with relevant expertise.
arXiv Detail & Related papers (2023-08-31T05:15:42Z)
- KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification [75.95647590619929]
Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis.
We propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics.
A novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation.
arXiv Detail & Related papers (2023-08-15T02:38:08Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for the multi-hop KGQA task that unifies retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Supporting the Task-driven Skill Identification in Open Source Project Issue Tracking Systems [0.0]
We investigate a strategy for automatically labeling open issues to help contributors pick a task to contribute to.
We claim that, once the required skills are identified, contributor candidates can pick a more suitable task.
We applied quantitative studies to analyze the relevance of the labels in an experiment and compare the strategies' relative importance.
arXiv Detail & Related papers (2022-11-02T14:17:22Z)
- Code Recommendation for Open Source Software Developers [32.181023933552694]
CODER is a novel graph-based code recommendation framework for open source software developers.
Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
arXiv Detail & Related papers (2022-10-15T16:40:36Z)
- Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation [78.28390172958643]
We identify two key aspects that can help to alleviate multiple domain-shifts in multi-target domain adaptation (MTDA).
We propose Curriculum Graph Co-Teaching (CGCT) that uses a dual classifier head, with one of them being a graph convolutional network (GCN) which aggregates features from similar samples across the domains.
When the domain labels are available, we propose Domain-aware Curriculum Learning (DCL), a sequential adaptation strategy that first adapts on the easier target domains, followed by the harder ones.
arXiv Detail & Related papers (2021-04-01T23:41:41Z)
- LabelGit: A Dataset for Software Repositories Classification using Attributed Dependency Graphs [11.523471275501857]
We create a new dataset of GitHub projects called LabelGit.
Our dataset uses direct information from the source code, like the dependency graph and source code neural representations from the identifiers.
We hope to aid the development of solutions that do not rely on proxies but use the entire source code to perform classification.
arXiv Detail & Related papers (2021-03-16T07:28:58Z)
- Bidirectional Graph Reasoning Network for Panoptic Segmentation [126.06251745669107]
We introduce a Bidirectional Graph Reasoning Network (BGRNet) to mine the intra-modular and intermodular relations within and between foreground things and background stuff classes.
BGRNet first constructs image-specific graphs in both instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level.
arXiv Detail & Related papers (2020-04-14T02:32:10Z)