Towards the First Code Contribution: Processes and Information Needs
- URL: http://arxiv.org/abs/2404.18677v1
- Date: Mon, 29 Apr 2024 13:19:24 GMT
- Title: Towards the First Code Contribution: Processes and Information Needs
- Authors: Christoph Treude, Marco A. Gerosa, Igor Steinmacher
- Abstract summary: We argue that much of the information needed by newcomers already exists, albeit scattered among many different sources.
Our findings form an essential step towards automated tool support that provides relevant information to project newcomers.
- Score: 18.542728636769255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Newcomers to a software project must overcome many barriers before they can successfully place their first code contribution, and they often struggle to find information that is relevant to them. In this work, we argue that much of the information needed by newcomers already exists, albeit scattered among many different sources, and that many barriers can be addressed by automatically identifying, extracting, generating, summarizing, and presenting documentation that is specifically aimed and customized for newcomers. To gain a detailed understanding of the processes followed by newcomers and their information needs before making their first code contribution, we conducted an empirical study. Based on a survey with about 100 practitioners, grounded theory analysis, and validation interviews, we contribute a 16-step model for the processes followed by newcomers to a software project and we identify relevant information, along with individual and project characteristics that influence the relevancy of information types and sources. Our findings form an essential step towards automated tool support that provides relevant information to project newcomers in each step of their contribution processes.
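As a rough illustration of the kind of automated tool support the abstract envisions (and not the authors' tool), the sketch below scans a few documentation files commonly found in a repository checkout and groups paragraphs under a handful of hypothetical contribution steps. The file names, step labels, and keyword mapping are assumptions made purely for illustration, not findings from the paper's 16-step model.

```python
"""Illustrative sketch: gather newcomer-relevant documentation that is
scattered across a project checkout and group it by contribution step.
All file names, steps, and keywords below are assumptions for illustration."""
from pathlib import Path

# Hypothetical mapping of contribution steps to keywords suggesting relevance.
STEP_KEYWORDS = {
    "set up the development environment": ["install", "setup", "build", "dependencies"],
    "find a task to work on": ["good first issue", "help wanted", "roadmap"],
    "understand contribution conventions": ["pull request", "code review", "style", "commit"],
}

# Sources where relevant information is commonly scattered (assumption).
CANDIDATE_FILES = ["README.md", "CONTRIBUTING.md", "docs/DEVELOPMENT.md", "CODE_OF_CONDUCT.md"]


def collect_snippets(repo_root: str) -> dict[str, list[str]]:
    """Return, for each contribution step, paragraphs that mention its keywords."""
    snippets: dict[str, list[str]] = {step: [] for step in STEP_KEYWORDS}
    for name in CANDIDATE_FILES:
        path = Path(repo_root) / name
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for paragraph in text.split("\n\n"):
            lowered = paragraph.lower()
            for step, keywords in STEP_KEYWORDS.items():
                if any(keyword in lowered for keyword in keywords):
                    snippets[step].append(f"[{name}] {paragraph.strip()[:200]}")
    return snippets


if __name__ == "__main__":
    for step, found in collect_snippets(".").items():
        print(f"\n== {step} ==")
        for snippet in found or ["(no documentation found for this step)"]:
            print("-", snippet)
```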
Related papers
- Assisted Data Annotation for Business Process Information Extraction from Textual Documents [15.770020803430246]
Machine-learning based generation of process models from natural language text process descriptions provides a solution for the time-intensive and expensive process discovery phase.
We present two assistance features to support dataset creation, a recommendation system for identifying process information in the text and visualization of the current state of already identified process information as a graphical business process model.
A controlled user study with 31 participants shows that assisting dataset creators with recommendations lowers all aspects of workload by up to 51.0% and significantly improves annotation quality by up to 38.9%.
arXiv Detail & Related papers (2024-10-02T09:14:39Z)
- Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
- Automating the Information Extraction from Semi-Structured Interview Transcripts [0.0]
This paper explores the development and application of an automated system designed to extract information from semi-structured interview transcripts.
We present a user-friendly software prototype that enables researchers to efficiently process and visualize the thematic structure of interview data.
arXiv Detail & Related papers (2024-03-07T13:53:03Z)
- User Modeling and User Profiling: A Comprehensive Survey [0.0]
This paper presents a survey of the current state, evolution, and future directions of user modeling and profiling research.
We provide a historical overview, tracing the development from early stereotype models to the latest deep learning techniques.
We also address the critical need for privacy-preserving techniques and the push towards explainability and fairness in user modeling approaches.
arXiv Detail & Related papers (2024-02-15T02:06:06Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys [63.51903260461746]
We propose a novel task for knowledge-driven follow-up question generation in conversational surveys.
We constructed a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge.
We then propose a two-staged knowledge-driven model for the task, which generates informative and coherent follow-up questions.
arXiv Detail & Related papers (2022-05-23T00:57:33Z)
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and the scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- A Bayesian Framework for Information-Theoretic Probing [51.98576673620385]
Prior work argued that probing should be seen as approximating a mutual information, which led to the rather unintuitive conclusion that representations encode exactly the same information about a target task as the original sentences.
This paper proposes a new framework to measure what we term Bayesian mutual information.
arXiv Detail & Related papers (2021-09-08T18:08:36Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Contribution of Conceptual Modeling to Enhancing Historians' Intuition - Application to Prosopography [0.0]
We propose a process that automatically supports historians' intuition in prosopography.
The contribution is threefold: a conceptual data model, a process model, and a set of rules combining the reliability of sources and the credibility of information.
arXiv Detail & Related papers (2020-11-26T13:21:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.