Dev2vec: Representing Domain Expertise of Developers in an Embedding Space
- URL: http://arxiv.org/abs/2207.05132v1
- Date: Mon, 11 Jul 2022 18:56:49 GMT
- Title: Dev2vec: Representing Domain Expertise of Developers in an Embedding Space
- Authors: Arghavan Moradi Dakhel, Michel C. Desmarais, Foutse Khomh
- Abstract summary: We employ doc2vec to represent the domain expertise of developers as embedding vectors.
These vectors are derived from different sources that contain evidence of developers' expertise.
Our results indicate that encoding the expertise of developers in an embedding vector outperforms state-of-the-art methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate assessment of the domain expertise of developers is important for
assigning the proper candidate to contribute to a project or to fill a job
role. Since potential candidates can come from a large pool, automating the
assessment of this domain expertise is a desirable goal. While previous methods
have had some success within a single software project, assessing a
developer's domain expertise from contributions across multiple projects is
more challenging. In this paper, we employ doc2vec to represent the domain
expertise of developers as embedding vectors. These vectors are derived from
different sources that contain evidence of developers' expertise, such as the
descriptions of repositories they contributed to, their issue resolving
history, and the API calls in their commits. We call this approach dev2vec and
demonstrate its effectiveness in representing the technical specialization of
developers. Our results indicate that encoding the expertise of developers in an
embedding vector outperforms state-of-the-art methods and improves the F1-score
by up to 21%. Moreover, our findings suggest that "issue resolving history" is
the most informative source of information for representing the domain
expertise of developers in embedding spaces.
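The abstract does not describe how the evidence is turned into doc2vec input. As a minimal sketch under our own assumptions, the code below groups each developer's repository descriptions, issue resolving history and commit API calls into one tagged token list (the developer ID serving as the doc2vec tag), and ranks developers by cosine similarity once vectors exist. The `Evidence` record, the source labels, the tokenizer and `rank_developers` are hypothetical illustrations rather than the authors' implementation, and the doc2vec training step itself is deliberately left out.

```python
"""Sketch: per-developer tagged documents for doc2vec, plus cosine ranking.

All names here (Evidence, the source labels, tokenize, build_tagged_documents,
rank_developers) are hypothetical illustrations, not the dev2vec authors' code.
"""
from __future__ import annotations

import math
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumed source labels, one per evidence type named in the abstract.
SOURCES = ("repo_description", "issue_history", "api_calls")

# Keep '.' and '_' so qualified API names such as "torch.nn.Linear" stay one token.
TOKEN_RE = re.compile(r"[a-z0-9_.]+")


@dataclass(frozen=True)
class Evidence:
    """One piece of text attributed to a developer (hypothetical record)."""
    developer: str
    source: str  # one of SOURCES
    text: str


def tokenize(text: str) -> list[str]:
    """Lower-case, split on other characters, and strip leading/trailing dots."""
    tokens = (tok.strip(".") for tok in TOKEN_RE.findall(text.lower()))
    return [tok for tok in tokens if tok]


def build_tagged_documents(
    evidence: list[Evidence], sources: set[str]
) -> dict[str, list[str]]:
    """Concatenate each developer's evidence from the selected sources into one
    token list keyed by developer ID; the ID plays the role of the doc2vec tag."""
    docs: dict[str, list[str]] = defaultdict(list)
    for item in evidence:
        if item.source in sources:
            docs[item.developer].extend(tokenize(item.text))
    return dict(docs)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_developers(
    query: list[float], vectors: dict[str, list[float]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Return the top_k developers whose embedding is closest to `query`
    (for example, a vector inferred for a job description)."""
    scored = [(dev, cosine(query, vec)) for dev, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    evidence = [
        Evidence("alice", "repo_description", "A React UI component library"),
        Evidence("alice", "api_calls", "useState useEffect"),
        Evidence("bob", "issue_history", "Fix TensorFlow model loading on GPU"),
    ]
    print(build_tagged_documents(evidence, {"repo_description", "api_calls"}))
    # Toy vectors standing in for trained doc2vec embeddings (made up).
    toy = {"alice": [0.9, 0.1], "bob": [0.1, 0.9]}
    print(rank_developers([1.0, 0.0], toy, top_k=1))
```

With gensim, for example, each `(developer, tokens)` pair would typically be wrapped as `TaggedDocument(words=tokens, tags=[developer])` before training a `Doc2Vec` model, and the toy vectors would be replaced by the trained developer vectors. Passing one source at a time to `build_tagged_documents` is one way to compare sources, as the paper does when it finds issue resolving history the most informative.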
Related papers
- Knowledge Islands: Visualizing Developers Knowledge Concentration
Knowledge Islands is a tool that visualizes the concentration of knowledge in a software repository using a state-of-the-art knowledge model.
It enables practitioners to analyze GitHub projects, determine where knowledge is concentrated, and implement measures to maintain project health.
(2024-08-16T13:32:49Z)
- R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models
Large language models have achieved remarkable success on general NLP tasks, but they may fall short for domain-specific problems.
Existing evaluation tools only provide a few baselines and evaluate them on various domains without mining the depth of domain knowledge.
In this paper, we address the challenges of evaluating RALLMs by introducing R-Eval, a Python toolkit designed to streamline the evaluation of different RAG pipelines.
(2024-06-17T15:59:49Z)
- Redefining Developer Assistance: Through Large Language Models in Software Ecosystem
We introduce DevAssistLlama, a model developed through instruction tuning, to assist developers in processing software-related natural language queries.
DevAssistLlama is particularly adept at handling intricate technical documentation, enhancing developer capability in software specific tasks.
(2023-12-09T18:02:37Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations
We propose DOKE, a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
(2023-11-16T07:09:38Z)
- Who is the Real Hero? Measuring Developer Contribution via Multi-dimensional Data Integration
We propose CValue, a multidimensional information fusion-based approach to measure developer contributions.
CValue extracts both syntax and semantic information from the source code changes in four dimensions.
It fuses the information to produce the contribution score for each of the commits in the projects.
(2023-08-17T13:57:44Z)
- Code Recommendation for Open Source Software Developers
CODER is a novel graph-based code recommendation framework for open source software developers.
Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
(2022-10-15T16:40:36Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
(2022-03-02T19:57:32Z)
- Empowered and Embedded: Ethics and Agile Processes
We argue that ethical considerations need to be embedded into the (agile) software development process.
We emphasize the possibility of implementing ethical deliberations in existing, well-established agile software development processes.
(2021-07-15T11:14:03Z)
- Representation of Developer Expertise in Open Source Software
We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers.
We then employ Doc2Vec embeddings for vector representations of APIs, developers, and projects.
We evaluate if these embeddings reflect the postulated topology of the Skill Space.
(2020-05-20T16:36:07Z)
- Domain Adaptive Ensemble Learning
We propose a unified framework termed domain adaptive ensemble learning (DAEL) to address both multi-source unsupervised domain adaptation (UDA) and domain generalization (DG).
Experiments on three multi-source UDA and two DG datasets show that DAEL improves the state of the art on both problems, often by significant margins.
(2020-03-16T16:54:15Z)
- Domain Adaption for Knowledge Tracing
We propose a novel framework, adaptable knowledge tracing (AKT), to address the domain adaptation for knowledge tracing (DAKT) problem.
First, to obtain a well-performing knowledge tracing model, we incorporate educational characteristics (e.g., slip, guess, question texts) into deep knowledge tracing (DKT).
Second, we propose and adopt three domain adaptation processes, the first of which pre-trains an auto-encoder to select useful source instances for target model training.
(2020-01-14T15:04:48Z)