Who's actually being Studied? A Call for Population Analysis in Software Engineering Research
- URL: http://arxiv.org/abs/2404.15093v1
- Date: Tue, 23 Apr 2024 14:48:11 GMT
- Title: Who's actually being Studied? A Call for Population Analysis in Software Engineering Research
- Authors: Jefferson Seide Molléri,
- Abstract summary: Population analysis is crucial for ensuring that empirical software engineering (ESE) research is representative and its findings are valid.
We explore the challenges ranging from analysing populations of individual software engineers to organizations and projects.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Population analysis is crucial for ensuring that empirical software engineering (ESE) research is representative and its findings are valid. Yet, there is a persistent gap between sampling processes and the holistic examination of populations, which this position paper addresses. We explore the challenges ranging from analysing populations of individual software engineers to organizations and projects. We discuss the interplay between generalizability and transferability and advocate for appropriate population frames. We also present a compelling case for improved population analysis aiming to enhance the empirical rigor and external validity of ESE research.
Related papers
- Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions.
As a testbed, we use country-level results from two global cultural surveys.
We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
arXiv Detail & Related papers (2025-02-10T21:59:27Z) - A Call for Critically Rethinking and Reforming Data Analysis in Empirical Software Engineering [5.687882380471718]
Concerns about the correct application of empirical methodologies have existed since the 2006 Dagstuhl seminar on Empirical Software Engineering.
We conducted a literature survey of 27,000 empirical studies, using LLMs to classify statistical methodologies as adequate or inadequate.
We selected 30 primary studies and held a workshop with 33 ESE experts to assess their ability to identify and resolve statistical issues.
arXiv Detail & Related papers (2025-01-22T09:05:01Z) - Applications and Implications of Large Language Models in Qualitative Analysis: A New Frontier for Empirical Software Engineering [0.46426852157920906]
The study emphasizes the need for structured strategies and guidelines to optimize LLM use in qualitative research within software engineering.
While LLMs show promise in supporting qualitative analysis, human expertise remains crucial for interpreting data, and ongoing exploration of best practices will be vital for their successful integration into empirical software engineering research.
arXiv Detail & Related papers (2024-12-09T15:17:36Z) - Adaptive Recruitment Resource Allocation to Improve Cohort Representativeness in Participatory Biomedical Datasets [23.462552062769426]
We introduce a computational approach to adaptively allocate recruitment resources among sites to improve representativeness.
In simulated recruitment of 10,000-participant cohorts from medical centers, we show that our approach yields a more representative cohort than existing baselines.
arXiv Detail & Related papers (2024-08-02T16:32:30Z) - Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs)
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z) - SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge [63.311045291016555]
Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts.
This paper summarizes the challenging task, data, and research progress.
arXiv Detail & Related papers (2024-05-17T02:36:14Z) - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.
ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z) - The ethical ambiguity of AI data enrichment: Measuring gaps in research
ethics norms and practices [2.28438857884398]
This study explores how, and to what extent, comparable research ethics requirements and norms have developed for AI research and data enrichment.
Leading AI venues have begun to establish protocols for human data collection, but these are are inconsistently followed by authors.
arXiv Detail & Related papers (2023-06-01T16:12:55Z) - How WEIRD is Usable Privacy and Security Research? (Extended Version) [7.669758543344074]
We conducted a literature review to understand the extent to which participant samples in UPS papers were from WEIRD countries.
Geographic and linguistic barriers in the study methods and recruitment methods may cause researchers to conduct user studies locally.
arXiv Detail & Related papers (2023-05-08T19:21:18Z) - Towards Automated Process Planning and Mining [77.34726150561087]
We present a research project in which researchers from the AI and BPM field work jointly together.
We discuss the overall research problem, the relevant fields of research and our overall research framework to automatically derive process models.
arXiv Detail & Related papers (2022-08-18T16:41:22Z) - Artificial Intelligence for IT Operations (AIOPS) Workshop White Paper [50.25428141435537]
Artificial Intelligence for IT Operations (AIOps) is an emerging interdisciplinary field arising in the intersection between machine learning, big data, streaming analytics, and the management of IT operations.
Main aim of the AIOPS workshop is to bring together researchers from both academia and industry to present their experiences, results, and work in progress in this field.
arXiv Detail & Related papers (2021-01-15T10:43:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.