Supporting the Task-driven Skill Identification in Open Source Project
Issue Tracking Systems
- URL: http://arxiv.org/abs/2211.08143v1
- Date: Wed, 2 Nov 2022 14:17:22 GMT
- Title: Supporting the Task-driven Skill Identification in Open Source Project
Issue Tracking Systems
- Authors: Fabio Santos
- Abstract summary: We investigate automatic labeling of open issues as a strategy to help contributors pick a task to contribute to.
By identifying the required skills, we claim that candidate contributors can pick more suitable tasks.
We applied quantitative studies to analyze the relevance of the labels in an experiment and compare the strategies' relative importance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Selecting an appropriate task is challenging for contributors to Open Source
Software (OSS), mainly for those who are contributing for the first time.
Therefore, researchers and OSS projects have proposed various strategies to aid
newcomers, including labeling tasks. We investigate automatic labeling of open
issues as a strategy to help contributors pick a task to contribute to. We
label the issues with API-domains (categories of APIs parsed from the source
code used to solve the issues). We plan to add social network analysis metrics
from the issues' conversations as new predictors. By identifying the required
skills, we claim that candidate contributors can pick more suitable tasks. We
analyzed interview transcripts and the survey's open-ended questions to
understand the strategies used to onboard contributors and to pick an issue.
We applied quantitative studies to analyze the relevance of the labels in an
experiment and to compare the strategies' relative importance. We also mined
issue data from OSS repositories to predict the API-domain labels with
comparable precision, recall, and F-measure with the state-of-art. We plan to
use a skill ontology to assist the matching process between contributors and
tasks. By analyzing the confidence level of the matching instances in
ontologies describing contributors' skills and tasks, we might recommend issues
for contribution. So far, the results showed that organizing the issues (which
includes assigning labels) is seen as an essential strategy for diverse roles in
OSS communities. The API-domain labels are relevant for experienced
practitioners. The predictions have an average precision of 75.5%. Labeling the
issues indicates the skills involved in an issue. The labels represent possible
skills in the source code related to an issue. By investigating this research
topic, we expect to assist the new contributors in finding a task.
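The abstract describes predicting API-domain labels for issues as a multi-label problem: an issue's text is mapped to zero or more API categories. As an illustration only (the paper's actual pipeline, features, and labels are not reproduced here; all issue texts and domain names below are made up), a minimal token-voting sketch in plain Python:

```python
from collections import Counter, defaultdict

# Toy training set: issue titles paired with API-domain label sets.
# Texts and labels are illustrative, not taken from the paper's data.
TRAIN = [
    ("fix http request timeout in client", {"Network"}),
    ("parse json response from rest api", {"Network", "IO"}),
    ("read config file from disk", {"IO"}),
    ("render button widget in settings dialog", {"UI"}),
    ("write log file when download fails", {"IO", "Network"}),
]

def train(examples):
    """Count token/label co-occurrences; a crude stand-in for a real classifier."""
    counts = defaultdict(Counter)
    for text, labels in examples:
        for token in text.split():
            for label in labels:
                counts[token][label] += 1
    return counts

def predict(counts, text, threshold=1):
    """Assign every label that gets at least `threshold` token votes."""
    votes = Counter()
    for token in text.split():
        votes.update(counts.get(token, Counter()))
    return {label for label, v in votes.items() if v >= threshold}

model = train(TRAIN)
print(sorted(predict(model, "http client fails to parse json")))  # ['IO', 'Network']
```

A production version would replace the token votes with proper text features and a trained multi-label classifier, and would evaluate with the precision/recall/F-measure metrics the abstract reports.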
Related papers
- A Unified Causal View of Instruction Tuning [76.1000380429553]
We develop a meta Structural Causal Model (meta-SCM) to integrate different NLP tasks under a single causal structure of the data.
The key idea is to learn task-required causal factors and use only those to make predictions for a given task.
arXiv Detail & Related papers (2024-02-09T07:12:56Z) - Can GitHub Issues Help in App Review Classifications? [0.7366405857677226]
We propose a novel approach that assists in augmenting labeled datasets by utilizing information extracted from GitHub issues.
Our results demonstrate that using labeled issues for data augmentation can improve the F1-score by 6.3 in bug reports and 7.2 in feature requests.
arXiv Detail & Related papers (2023-08-27T22:01:24Z) - Tag that issue: Applying API-domain labels in issue tracking systems [20.701637107734996]
Labeling issues with the skills required to complete them can help contributors to choose tasks in Open Source Software projects.
We investigate the feasibility and relevance of automatically labeling issues with what we call "API-domains," which are high-level categories of APIs.
Our results show that (i) newcomers consider API-domain labels useful in choosing tasks, (ii) labels can be predicted with a precision of 84% and a recall of 78.6% on average, (iii) the predictions reached up to 71.3% precision and 52.5% recall when training on one project and testing on another, and (iv) project
arXiv Detail & Related papers (2023-04-06T05:49:46Z) - Relational Multi-Task Learning: Modeling Relations between Data and
Tasks [84.41620970886483]
A key assumption in multi-task learning is that, at inference time, the model only has access to a given data point, not to that data point's labels from other tasks.
Here we introduce a novel relational multi-task learning setting where we leverage data point labels from auxiliary tasks to make more accurate predictions.
We develop MetaLink, where our key innovation is to build a knowledge graph that connects data points and tasks.
arXiv Detail & Related papers (2023-03-14T07:15:41Z) - UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question
Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z) - Identify ambiguous tasks combining crowdsourced labels by weighting
Areas Under the Margin [13.437403258942716]
Ambiguous tasks might fool expert workers, which is often harmful for the learning step.
We adapt the Area Under the Margin (AUM) to identify mislabeled data in crowdsourced learning scenarios.
We show that the WAUM (weighted AUM) can help discard ambiguous tasks from the training set, leading to better generalization performance.
arXiv Detail & Related papers (2022-09-30T11:16:20Z) - Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z) - Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z) - Can I Solve It? Identifying APIs Required to Complete OSS Task [16.13269535068818]
We investigate the feasibility and relevance of labeling issues with the domain of the APIs required to complete the tasks.
We leverage the issues' description and the project history to build prediction models, which resulted in precision up to 82% and recall up to 97.8%.
Our results can inspire the creation of tools to automatically label issues, helping developers to find tasks that better match their skills.
arXiv Detail & Related papers (2021-03-23T16:16:09Z) - Attention-based model for predicting question relatedness on Stack
Overflow [0.0]
We propose an Attention-based Sentence pair Interaction Model (ASIM) to predict the relatedness between questions on Stack Overflow automatically.
ASIM has made significant improvement over the baseline approaches in Precision, Recall, and Micro-F1 evaluation metrics.
Our model also performs well in the duplicate question detection task of Ask Ubuntu.
arXiv Detail & Related papers (2021-03-19T12:18:03Z) - Mining Implicit Relevance Feedback from User Behavior for Web Question
Answering [92.45607094299181]
We make the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective to substantially reduce the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.