Teaching Software Metrology: The Science of Measurement for Software Engineering
- URL: http://arxiv.org/abs/2406.14494v1
- Date: Thu, 20 Jun 2024 16:57:23 GMT
- Title: Teaching Software Metrology: The Science of Measurement for Software Engineering
- Authors: Paul Ralph, Miikka Kuutila, Hera Arif, Bimpe Ayoola
- Abstract summary: This chapter reviews key concepts in the science of measurement and applies them to software engineering research.
A series of exercises for applying important measurement concepts to the reader's research is included.
- Score: 10.23712090082156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the methodological rigor of computing research has improved considerably in the past two decades, quantitative software engineering research is hampered by immature measures and inattention to theory. Measurement, the principled assignment of numbers to phenomena, is intrinsically difficult because observation is predicated not only upon theoretical concepts but also upon the values and perspective of the researcher. Despite several previous attempts to raise awareness of more sophisticated approaches to measurement and of the importance of quantitatively assessing reliability and validity, measurement issues continue to be widely ignored. The reasons are unknown, but differences in typical engineering and computer science graduate training programs (compared to psychology and management, for example) may be involved. This chapter therefore reviews key concepts in the science of measurement and applies them to software engineering research. A series of exercises for applying important measurement concepts to the reader's research is included, along with a sample dataset for trying some of the statistical procedures mentioned.
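The abstract stresses quantitatively assessing reliability. As a minimal sketch (the score matrix below is hypothetical, not the chapter's sample dataset), Cronbach's alpha, a common internal-consistency reliability statistic in measurement texts, takes only a few lines of NumPy:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: 4 respondents x 3 items
scores = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
])
print(round(cronbach_alpha(scores), 3))  # → 0.916
```

Values near 1 indicate that the items covary strongly and plausibly measure one construct; values below roughly 0.7 are conventionally treated as weak reliability.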
Related papers
- DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery [61.02102713094486]
Good interpretation is important in scientific reasoning, as it allows for better decision-making.
This paper introduces an automatic way of obtaining such interpretable-by-design models, by learning programs that interleave neural networks.
We propose DiSciPLE, an evolutionary algorithm that leverages the common sense and prior knowledge of large language models (LLMs) to create Python programs explaining visual data.
arXiv Detail & Related papers (2025-02-14T10:26:14Z)
- A Call for Critically Rethinking and Reforming Data Analysis in Empirical Software Engineering [5.687882380471718]
Concerns about the correct application of empirical methodologies have existed since the 2006 Dagstuhl seminar on Empirical Software Engineering.
We conducted a literature survey of 27,000 empirical studies, using LLMs to classify statistical methodologies as adequate or inadequate.
We selected 30 primary studies and held a workshop with 33 ESE experts to assess their ability to identify and resolve statistical issues.
arXiv Detail & Related papers (2025-01-22T09:05:01Z)
- Perspective of Software Engineering Researchers on Machine Learning Practices Regarding Research, Review, and Education [12.716955305620191]
This study aims to contribute to knowledge about the synergy between Machine Learning (ML) and Software Engineering (SE).
We analyzed SE researchers familiar with ML or who authored SE articles using ML, along with the articles themselves.
We found diverse practices focusing on data collection, model training, and evaluation.
arXiv Detail & Related papers (2024-11-28T18:21:24Z)
- Evaluating Generative AI Systems is a Social Science Measurement Challenge [78.35388859345056]
We present a framework for measuring concepts related to the capabilities, impacts, opportunities, and risks of GenAI systems.
The framework distinguishes between four levels: the background concept, the systematized concept, the measurement instrument(s), and the instance-level measurements themselves.
arXiv Detail & Related papers (2024-11-17T02:35:30Z)
- Between Randomness and Arbitrariness: Some Lessons for Reliable Machine Learning at Scale [2.50194939587674]
This dissertation quantifies and mitigates sources of arbitrariness in ML, and of randomness in uncertainty estimation and optimization algorithms, in order to achieve scalability without sacrificing reliability.
The dissertation serves as an empirical proof by example that research on reliable measurement for machine learning is intimately bound up with research in law and policy.
arXiv Detail & Related papers (2024-06-13T19:29:37Z)
- Apples, Oranges, and Software Engineering: Study Selection Challenges for Secondary Research on Latent Variables [8.612556181934291]
The inability to measure abstract concepts directly poses a challenge for secondary studies in software engineering.
Standardized measurement instruments are rarely available, and even if they are, many researchers do not use them or do not even provide a definition for the studied concept.
SE researchers conducting secondary studies therefore have to decide a) which primary studies intended to measure the same construct, and b) how to compare and aggregate vastly different measurements for the same construct.
arXiv Detail & Related papers (2024-02-13T17:32:17Z)
- Investigating Reproducibility in Deep Learning-Based Software Fault Prediction [16.25827159504845]
With the rapid adoption of increasingly complex machine learning models, it becomes more and more difficult for scholars to reproduce the results that are reported in the literature.
This is in particular the case when the applied deep learning models and the evaluation methodology are not properly documented and when code and data are not shared.
We have conducted a systematic review of the current literature and examined the level of reproducibility of 56 research articles that were published between 2019 and 2022 in top-tier software engineering conferences.
arXiv Detail & Related papers (2024-02-08T13:00:18Z)
- An Extensible Benchmark Suite for Learning to Simulate Physical Systems [60.249111272844374]
We introduce a set of benchmark problems to take a step towards unified benchmarks and evaluation protocols.
We propose four representative physical systems, as well as a collection of both widely used classical time-based and representative data-driven methods.
arXiv Detail & Related papers (2021-08-09T17:39:09Z)
- A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges [76.20963684020145]
Uncertainty quantification (UQ) plays a pivotal role in the reduction of uncertainties during both optimization and decision-making processes.
Bayesian approximation and ensemble learning techniques are the two most widely used UQ methods in the literature.
This study reviews recent advances in UQ methods used in deep learning and investigates the application of these methods in reinforcement learning.
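As a hedged sketch of the ensemble-learning side of UQ (the model and data here are invented for illustration, not taken from the review): an ensemble's mean prediction serves as the point estimate, while the spread across members approximates predictive uncertainty.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Toy "ensemble": perturbed linear models standing in for independently
# trained networks; each member predicts w * x with a slightly different w.
weights = 2.0 + rng.normal(0.0, 0.1, size=10)
ensemble = [(lambda x, w=w: w * x) for w in weights]

def predict_with_uncertainty(x):
    """Mean prediction and member spread (a crude epistemic-uncertainty proxy)."""
    preds = np.array([member(x) for member in ensemble])
    return preds.mean(), preds.std(ddof=1)

mean, spread = predict_with_uncertainty(3.0)
```

A larger spread flags inputs where the members disagree, which is where deep-learning predictions most need scrutiny.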
arXiv Detail & Related papers (2020-11-12T06:41:05Z)
- Marginal likelihood computation for model selection and hypothesis testing: an extensive review [66.37504201165159]
This article provides a comprehensive study of the state-of-the-art of the topic.
We highlight limitations, benefits, connections and differences among the different techniques.
Problems and possible solutions with the use of improper priors are also described.
arXiv Detail & Related papers (2020-05-17T18:31:58Z)
- A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have sprung up.
arXiv Detail & Related papers (2020-02-05T21:35:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.