Qualitative Data Analysis in Software Engineering: Techniques and Teaching Insights
- URL: http://arxiv.org/abs/2406.08228v1
- Date: Wed, 12 Jun 2024 13:56:55 GMT
- Title: Qualitative Data Analysis in Software Engineering: Techniques and Teaching Insights
- Authors: Christoph Treude
- Abstract summary: Software repositories are rich sources of qualitative artifacts, including source code comments, commit messages, issue descriptions, and documentation.
This chapter shifts the focus towards interpreting these artifacts using various qualitative data analysis techniques.
Various coding methods are discussed along with the strategic design of a coding guide to ensure consistency and accuracy in data interpretation.
- Score: 10.222207222039048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software repositories are rich sources of qualitative artifacts, including source code comments, commit messages, issue descriptions, and documentation. These artifacts offer many interesting insights when analyzed through quantitative methods, as outlined in the chapter on mining software repositories. This chapter shifts the focus towards interpreting these artifacts using various qualitative data analysis techniques. We introduce qualitative coding as an iterative process, which is crucial not only for educational purposes but also to enhance the credibility and depth of research findings. Various coding methods are discussed along with the strategic design of a coding guide to ensure consistency and accuracy in data interpretation. The chapter also discusses quality assurance in qualitative data analysis, emphasizing principles such as credibility, transferability, dependability, and confirmability. These principles are vital to ensure that the findings are robust and can be generalized to different contexts. By sharing best practices and lessons learned, we aim to equip all readers with the tools necessary to conduct rigorous qualitative research in the field of software engineering.
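To make the notion of coding consistency concrete, here is a minimal sketch of checking inter-rater agreement with Cohen's kappa, assuming two researchers have independently coded the same artifacts; the labels and data are hypothetical.

```python
from collections import Counter

def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders' label lists."""
    assert len(codes_a) == len(codes_b)
    n = len(codes_a)
    # Raw agreement: fraction of artifacts that received the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: probability both coders pick the same code at random.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned to ten commit messages by two researchers.
coder_1 = ["bugfix", "feature", "docs", "bugfix", "refactor",
           "feature", "docs", "bugfix", "feature", "refactor"]
coder_2 = ["bugfix", "feature", "docs", "feature", "refactor",
           "feature", "docs", "bugfix", "bugfix", "refactor"]
print(f"kappa = {cohen_kappa(coder_1, coder_2):.2f}")  # kappa = 0.73
```

By rough convention, values above 0.6 indicate substantial agreement; lower values usually suggest another iteration on the coding guide.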
Related papers
- Multi-Facet Counterfactual Learning for Content Quality Evaluation [48.73583736357489]
We propose a framework for efficiently constructing evaluators that perceive multiple facets of content quality.
We leverage a joint training strategy based on contrastive learning and supervised learning to enable the evaluator to distinguish between different quality facets (a minimal sketch of such a joint objective follows this entry).
arXiv Detail & Related papers (2024-10-10T08:04:10Z)
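As a rough illustration of what such a joint training strategy can look like, here is a minimal PyTorch sketch combining a contrastive (triplet) term with a supervised classification term; the function name, weighting, and choice of triplet loss are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def joint_quality_loss(anchor, positive, negative, logits, labels, alpha=0.5):
    # Contrastive term: pull the anchor representation towards a
    # same-quality example and away from a different-quality one.
    contrastive = F.triplet_margin_loss(anchor, positive, negative)
    # Supervised term: classify the quality facet directly.
    supervised = F.cross_entropy(logits, labels)
    # Weighted combination of the two training signals.
    return alpha * contrastive + (1 - alpha) * supervised

# Hypothetical batch: 4 texts, 32-dim representations, 3 quality facets.
anchor, positive, negative = (torch.randn(4, 32) for _ in range(3))
logits, labels = torch.randn(4, 3), torch.tensor([0, 2, 1, 0])
print(float(joint_quality_loss(anchor, positive, negative, logits, labels)))
```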
- Natural Language Processing for Requirements Traceability [47.93107382627423]
Traceability plays a crucial role in requirements and software engineering, particularly for safety-critical systems.
Natural language processing (NLP) and related techniques have made considerable progress in the past decade.
arXiv Detail & Related papers (2024-05-17T15:17:00Z)
- Automating the Information Extraction from Semi-Structured Interview Transcripts [0.0]
This paper explores the development and application of an automated system designed to extract information from semi-structured interview transcripts.
We present a user-friendly software prototype that enables researchers to efficiently process and visualize the thematic structure of interview data.
arXiv Detail & Related papers (2024-03-07T13:53:03Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora (a distribution-level sketch of the idea follows this entry).
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
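One common distribution-level way to operationalize precision and recall for generated text, sketched below with NumPy, is to ask whether generated embeddings fall inside the support of real ones and vice versa; this follows the k-nearest-neighbour formulation popularized for generative models and is not necessarily the paper's exact estimator. All data here are invented.

```python
import numpy as np

def knn_radius(points, k=3):
    """Distance from each point to its k-th nearest neighbour."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the point itself

def coverage(queries, references, radii):
    """Fraction of queries inside at least one reference's k-NN ball."""
    d = np.linalg.norm(queries[:, None] - references[None, :], axis=-1)
    return float((d <= radii[None, :]).any(axis=1).mean())

# Hypothetical sentence embeddings of human-written vs. generated text.
rng = np.random.default_rng(0)
real, generated = rng.normal(size=(200, 16)), rng.normal(size=(200, 16))
precision = coverage(generated, real, knn_radius(real))    # quality
recall = coverage(real, generated, knn_radius(generated))  # diversity
print(f"precision={precision:.2f} recall={recall:.2f}")
```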
- Investigating the Impact of Vocabulary Difficulty and Code Naturalness on Program Comprehension [3.35803394416914]
This study aims to assess readability and understandability from the perspective of language acquisition.
We will conduct a statistical analysis to understand the correlations among these factors and analyze whether code naturalness and vocabulary difficulty can be used to improve the performance of readability and understandability prediction methods (a sketch of the classical naturalness measure follows this entry).
arXiv Detail & Related papers (2023-08-25T15:15:00Z)
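The naturalness of code is classically measured as the average surprisal of its tokens under an n-gram language model trained on a code corpus (Hindle et al.); the following is a minimal bigram sketch with add-one smoothing on made-up token data, not the study's actual setup.

```python
import math
from collections import Counter

def naturalness(tokens, bigrams, unigrams, vocab_size):
    """Average per-token surprisal (bits) under an add-one-smoothed
    bigram model; lower values mean more 'natural' code."""
    bits = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        bits -= math.log2(p)
    return bits / (len(tokens) - 1)

# Hypothetical training corpus of tokenized code.
corpus = "for i in range ( n ) : total += i".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
snippet = "for j in range ( m ) :".split()
print(naturalness(snippet, bigrams, unigrams, len(unigrams)))
```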
- Analyzing Dataset Annotation Quality Management in the Wild [63.07224587146207]
Even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts.
While practices and guidelines regarding dataset creation projects exist, large-scale analysis has yet to be performed on how quality management is conducted.
arXiv Detail & Related papers (2023-07-16T21:22:40Z)
- KGEA: A Knowledge Graph Enhanced Article Quality Identification Dataset [4.811084336809668]
We propose a knowledge graph enhanced article quality identification dataset (KGEA) based on Baidu Encyclopedia.
We quantify the articles along 7 dimensions and use entity co-occurrence between the articles and Baidu Encyclopedia to construct a knowledge graph for every article (a toy co-occurrence sketch follows this entry).
We also compare several text classification baselines and find that external knowledge enables more competitive classification when combined with graph neural networks.
arXiv Detail & Related papers (2022-06-15T14:15:41Z)
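As a toy illustration of entity co-occurrence graph construction, the sketch below uses networkx to connect entities that appear in the same article, weighting edges by co-occurrence counts; the entity lists are invented, and the real dataset links article entities to Baidu Encyclopedia pages.

```python
import itertools
import networkx as nx

def cooccurrence_graph(article_entities):
    """Connect entities that appear in the same article; edge weights
    count how many articles each pair of entities shares."""
    graph = nx.Graph()
    for entities in article_entities:
        for a, b in itertools.combinations(sorted(set(entities)), 2):
            prior = graph.get_edge_data(a, b, {"weight": 0})["weight"]
            graph.add_edge(a, b, weight=prior + 1)
    return graph

# Invented entity lists extracted from three encyclopedia articles.
articles = [
    ["Beijing", "Great Wall", "Ming dynasty"],
    ["Beijing", "Forbidden City", "Ming dynasty"],
    ["Great Wall", "Ming dynasty"],
]
g = cooccurrence_graph(articles)
print(g["Beijing"]["Ming dynasty"]["weight"])  # 2
```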
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
- A Survey on Machine Learning Techniques for Source Code Analysis [14.129976741300029]
We aim to summarize the current knowledge in the area of applied machine learning for source code analysis.
To do so, we carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021.
arXiv Detail & Related papers (2021-10-18T20:13:38Z)
- CORAL: COde RepresentAtion Learning with Weakly-Supervised Transformers for Analyzing Data Analysis [33.190021245507445]
Large scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process.
We propose a novel weakly supervised transformer-based architecture for computing joint representations of code from both abstract syntax trees and surrounding natural language comments.
We show that our model, leveraging only easily-available weak supervision, achieves a 38% increase in accuracy over expert-supplied heuristics and outperforms a suite of baselines (a sketch of the AST input side follows this entry).
arXiv Detail & Related papers (2020-08-28T19:57:49Z)
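To give a feel for the structural half of such a model's input, here is a minimal sketch that linearizes a Python AST into a node-type sequence; pairing it with comment text (extractable, e.g., via the tokenize module) and learning joint representations is where the paper's weakly supervised transformer would come in.

```python
import ast

def ast_node_sequence(source):
    """Linearize a Python AST into a breadth-first node-type sequence."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]

code = (
    "import pandas as pd\n"
    "df = pd.read_csv('data.csv').dropna()\n"
)
print(ast_node_sequence(code))
# e.g. ['Module', 'Import', 'Assign', 'alias', 'Name', 'Call', ...]
```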
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that, despite its simplicity, the approach outperforms state-of-the-art techniques by a significant margin (a minimal self-attention sketch follows this entry).
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
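The pairwise modeling at the heart of such an approach is self-attention; below is a minimal single-head NumPy sketch on made-up token embeddings, without the learned query/key/value projections a real transformer would add.

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention: every token
    attends to every other token via pairwise similarity scores."""
    scores = x @ x.T / np.sqrt(x.shape[-1])         # pairwise relations
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                              # contextual mixing

# Hypothetical 8-dim embeddings for 12 code tokens.
tokens = np.random.default_rng(1).normal(size=(12, 8))
print(self_attention(tokens).shape)  # (12, 8)
```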
This list is automatically generated from the titles and abstracts of the papers on this site.