Teaching Software Metrology: The Science of Measurement for Software Engineering
- URL: http://arxiv.org/abs/2406.14494v1
- Date: Thu, 20 Jun 2024 16:57:23 GMT
- Title: Teaching Software Metrology: The Science of Measurement for Software Engineering
- Authors: Paul Ralph, Miikka Kuutila, Hera Arif, Bimpe Ayoola
- Abstract summary: This chapter reviews key concepts in the science of measurement and applies them to software engineering research.
A series of exercises for applying important measurement concepts to the reader's research is included.
- Score: 10.23712090082156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the methodological rigor of computing research has improved considerably in the past two decades, quantitative software engineering research is hampered by immature measures and inattention to theory. Measurement (the principled assignment of numbers to phenomena) is intrinsically difficult because observation is predicated upon not only theoretical concepts but also the values and perspective of the research. Despite several previous attempts to raise awareness of more sophisticated approaches to measurement and the importance of quantitatively assessing reliability and validity, measurement issues continue to be widely ignored. The reasons are unknown, but differences in typical engineering and computer science graduate training programs (compared to psychology and management, for example) are involved. This chapter therefore reviews key concepts in the science of measurement and applies them to software engineering research. A series of exercises for applying important measurement concepts to the reader's research is included, and a sample dataset for the reader to try some of the statistical procedures mentioned is provided.
Related papers
- Evaluating Generative AI Systems is a Social Science Measurement Challenge [78.35388859345056]
We present a framework for measuring concepts related to the capabilities, impacts, opportunities, and risks of GenAI systems.
The framework distinguishes between four levels: the background concept, the systematized concept, the measurement instrument(s), and the instance-level measurements themselves.
arXiv Detail & Related papers (2024-11-17T02:35:30Z)
- Between Randomness and Arbitrariness: Some Lessons for Reliable Machine Learning at Scale [2.50194939587674]
This dissertation concerns quantifying and mitigating sources of arbitrariness in ML and randomness in uncertainty estimation and optimization algorithms, in order to achieve scalability without sacrificing reliability.
The dissertation serves as an empirical proof by example that research on reliable measurement for machine learning is intimately bound up with research in law and policy.
arXiv Detail & Related papers (2024-06-13T19:29:37Z)
- Understanding and measuring software engineer behavior: What can we learn from the behavioral sciences? [3.2789487559198967]
We advocate for holistic methods that integrate quantitative measures, such as psychometric instruments, and qualitative data from diverse sources.
This paper addresses different ways to evaluate the progress of this challenge by leveraging methodological skills derived from behavioral sciences.
arXiv Detail & Related papers (2024-06-05T14:59:40Z)
- Apples, Oranges, and Software Engineering: Study Selection Challenges for Secondary Research on Latent Variables [8.612556181934291]
The inability to measure abstract concepts directly poses a challenge for secondary studies in software engineering.
Standardized measurement instruments are rarely available, and even if they are, many researchers do not use them or do not even provide a definition for the studied concept.
SE researchers conducting secondary studies therefore have to decide a) which primary studies intended to measure the same construct, and b) how to compare and aggregate vastly different measurements for the same construct.
arXiv Detail & Related papers (2024-02-13T17:32:17Z)
- Investigating Reproducibility in Deep Learning-Based Software Fault Prediction [16.25827159504845]
With the rapid adoption of increasingly complex machine learning models, it becomes more and more difficult for scholars to reproduce the results that are reported in the literature.
This is in particular the case when the applied deep learning models and the evaluation methodology are not properly documented and when code and data are not shared.
We have conducted a systematic review of the current literature and examined the level of reproducibility of 56 research articles that were published between 2019 and 2022 in top-tier software engineering conferences.
arXiv Detail & Related papers (2024-02-08T13:00:18Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- An Extensible Benchmark Suite for Learning to Simulate Physical Systems [60.249111272844374]
We introduce a set of benchmark problems to take a step towards unified benchmarks and evaluation protocols.
We propose four representative physical systems, as well as a collection of both widely used classical time-based and representative data-driven methods.
arXiv Detail & Related papers (2021-08-09T17:39:09Z)
- A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges [76.20963684020145]
Uncertainty quantification (UQ) plays a pivotal role in the reduction of uncertainties during both optimization and decision-making processes.
Bayesian approximation and ensemble learning techniques are the two most widely used UQ methods in the literature.
This study reviews recent advances in UQ methods used in deep learning and investigates the application of these methods in reinforcement learning.
arXiv Detail & Related papers (2020-11-12T06:41:05Z)
- Targeted Learning: Robust Statistics for Reproducible Research [1.1455937444848387]
Targeted Learning is a subfield of statistics that unifies advances in causal inference, machine learning and statistical theory to help answer scientifically impactful questions with statistical confidence.
The roadmap of Targeted Learning emphasizes tailoring statistical procedures so as to minimize their assumptions, carefully grounding them only in the scientific knowledge available.
arXiv Detail & Related papers (2020-06-12T17:17:01Z)
- Marginal likelihood computation for model selection and hypothesis testing: an extensive review [66.37504201165159]
This article provides a comprehensive study of the state of the art on the topic.
We highlight limitations, benefits, connections and differences among the different techniques.
Problems and possible solutions with the use of improper priors are also described.
arXiv Detail & Related papers (2020-05-17T18:31:58Z)
- A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have sprung up.
arXiv Detail & Related papers (2020-02-05T21:35:29Z)