Supporting the Task-driven Skill Identification in Open Source Project
Issue Tracking Systems
- URL: http://arxiv.org/abs/2211.08143v1
- Date: Wed, 2 Nov 2022 14:17:22 GMT
- Title: Supporting the Task-driven Skill Identification in Open Source Project
Issue Tracking Systems
- Authors: Fabio Santos
- Abstract summary: We investigate automatic labeling of open issues as a strategy to help contributors pick a task to contribute to.
By identifying the required skills, we claim that candidate contributors can pick more suitable tasks.
We applied quantitative studies to analyze the relevance of the labels in an experiment and compare the strategies' relative importance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Selecting an appropriate task is challenging for contributors to Open Source
Software (OSS), mainly for those who are contributing for the first time.
Therefore, researchers and OSS projects have proposed various strategies to aid
newcomers, including labeling tasks. We investigate automatic labeling of open
issues as a strategy to help contributors pick a task to contribute to. We
label the issues with API-domains (categories of APIs parsed from the source
code used to solve the issues). We plan to add social network analysis metrics
from the issues' conversations as new predictors. By identifying the required
skills, we claim that candidate contributors can pick more suitable tasks. We
analyzed interview transcripts and the survey's open-ended questions to
understand the strategies used to onboard contributors and to pick an issue.
We applied quantitative studies to analyze the relevance of the labels in an
experiment and to compare the strategies' relative importance. We also mined
issue data from OSS repositories to predict the API-domain labels with
comparable precision, recall, and F-measure with the state-of-art. We plan to
use a skill ontology to assist the matching process between contributors and
tasks. By analyzing the confidence level of the matching instances in
ontologies describing contributors' skills and tasks, we might recommend issues
for contribution. So far, the results showed that organizing the issues (which
includes assigning labels) is seen as an essential strategy for diverse roles in
OSS communities. The API-domain labels are relevant for experienced
practitioners. The predictions have an average precision of 75.5%. Labeling the
issues indicates the skills involved in an issue. The labels represent possible
skills in the source code related to an issue. By investigating this research
topic, we expect to assist the new contributors in finding a task.
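The abstract describes predicting API-domain labels for issues as a multi-label problem: an issue's text is mapped to zero or more API categories. As an illustration only (the paper's actual pipeline, features, and labels are not reproduced here; all issue texts and domain names below are made up), a minimal token-voting sketch in plain Python:

```python
from collections import Counter, defaultdict

# Toy training set: issue titles paired with API-domain label sets.
# Texts and labels are illustrative, not taken from the paper's data.
TRAIN = [
    ("fix http request timeout in client", {"Network"}),
    ("parse json response from rest api", {"Network", "IO"}),
    ("read config file from disk", {"IO"}),
    ("render button widget in settings dialog", {"UI"}),
    ("write log file when download fails", {"IO", "Network"}),
]

def train(examples):
    """Count token/label co-occurrences; a crude stand-in for a real classifier."""
    counts = defaultdict(Counter)
    for text, labels in examples:
        for token in text.split():
            for label in labels:
                counts[token][label] += 1
    return counts

def predict(counts, text, threshold=1):
    """Assign every label that gets at least `threshold` token votes."""
    votes = Counter()
    for token in text.split():
        votes.update(counts.get(token, Counter()))
    return {label for label, v in votes.items() if v >= threshold}

model = train(TRAIN)
print(sorted(predict(model, "http client fails to parse json")))  # ['IO', 'Network']
```

A production version would replace the token votes with proper text features and a trained multi-label classifier, and would evaluate with the precision/recall/F-measure metrics the abstract reports.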
Related papers
- A Unified Causal View of Instruction Tuning [76.1000380429553]
We develop a meta Structural Causal Model (meta-SCM) to integrate different NLP tasks under a single causal structure of the data.
The key idea is to learn task-required causal factors and use only those to make predictions for a given task.
arXiv Detail & Related papers (2024-02-09T07:12:56Z) - Can GitHub Issues Help in App Review Classifications? [0.7366405857677226]
We propose a novel approach that assists in augmenting labeled datasets by utilizing information extracted from GitHub issues.
Our results demonstrate that using labeled issues for data augmentation can improve the F1-score by 6.3 in bug reports and 7.2 in feature requests.
arXiv Detail & Related papers (2023-08-27T22:01:24Z) - Tag that issue: Applying API-domain labels in issue tracking systems [20.701637107734996]
Labeling issues with the skills required to complete them can help contributors to choose tasks in Open Source Software projects.
We investigate the feasibility and relevance of automatically labeling issues with what we call "API-domains," which are high-level categories of APIs.
Our results show that (i) newcomers consider API-domain labels useful in choosing tasks, (ii) labels can be predicted with a precision of 84% and a recall of 78.6% on average, (iii) the predictions reached up to 71.3% precision and 52.5% recall when training on one project and testing on another, and (iv) project
arXiv Detail & Related papers (2023-04-06T05:49:46Z) - Relational Multi-Task Learning: Modeling Relations between Data and
Tasks [84.41620970886483]
A key assumption in multi-task learning is that, at inference time, the model only has access to a given data point, not to that data point's labels from other tasks.
Here we introduce a novel relational multi-task learning setting where we leverage data point labels from auxiliary tasks to make more accurate predictions.
We develop MetaLink, where our key innovation is to build a knowledge graph that connects data points and tasks.
arXiv Detail & Related papers (2023-03-14T07:15:41Z) - UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question
Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z) - Identify ambiguous tasks combining crowdsourced labels by weighting
Areas Under the Margin [13.437403258942716]
Ambiguous tasks might fool expert workers, which is often harmful for the learning step.
We adapt the Area Under the Margin (AUM) to identify mislabeled data in crowdsourced learning scenarios.
We show that the WAUM (weighted AUM) can help discard ambiguous tasks from the training set, leading to better generalization performance.
arXiv Detail & Related papers (2022-09-30T11:16:20Z) - Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z) - Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z) - Can I Solve It? Identifying APIs Required to Complete OSS Task [16.13269535068818]
We investigate the feasibility and relevance of labeling issues with the domain of the APIs required to complete the tasks.
We leverage the issues' description and the project history to build prediction models, which resulted in precision up to 82% and recall up to 97.8%.
Our results can inspire the creation of tools to automatically label issues, helping developers to find tasks that better match their skills.
arXiv Detail & Related papers (2021-03-23T16:16:09Z) - Attention-based model for predicting question relatedness on Stack
Overflow [0.0]
We propose an Attention-based Sentence pair Interaction Model (ASIM) to predict the relatedness between questions on Stack Overflow automatically.
ASIM has made significant improvement over the baseline approaches in Precision, Recall, and Micro-F1 evaluation metrics.
Our model also performs well in the duplicate question detection task of Ask Ubuntu.
arXiv Detail & Related papers (2021-03-19T12:18:03Z) - Mining Implicit Relevance Feedback from User Behavior for Web Question
Answering [92.45607094299181]
We make the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective to substantially reduce the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.