Q-Sat AI: Machine Learning-Based Decision Support for Data Saturation in Qualitative Studies
- URL: http://arxiv.org/abs/2511.01935v1
- Date: Sun, 02 Nov 2025 17:18:51 GMT
- Title: Q-Sat AI: Machine Learning-Based Decision Support for Data Saturation in Qualitative Studies
- Authors: Hasan Tutar, Caner Erden, Ümit Şentürk
- Abstract summary: The determination of sample size in qualitative research has traditionally relied on the subjective and often ambiguous principle of data saturation. This study introduces a new, systematic model based on machine learning (ML) to make this process more objective.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The determination of sample size in qualitative research has traditionally relied on the subjective and often ambiguous principle of data saturation, which can lead to inconsistencies and threaten methodological rigor. This study introduces a new, systematic model based on machine learning (ML) to make this process more objective. Utilizing a dataset derived from five fundamental qualitative research approaches - namely, Case Study, Grounded Theory, Phenomenology, Narrative Research, and Ethnographic Research - we developed an ensemble learning model. Ten critical parameters, including research scope, information power, and researcher competence, were evaluated using an ordinal scale and used as input features. After thorough preprocessing and outlier removal, multiple ML algorithms were trained and compared. The K-Nearest Neighbors (KNN), Gradient Boosting (GB), Random Forest (RF), XGBoost, and Decision Tree (DT) algorithms showed the highest explanatory power (Test R2 ~ 0.85), effectively modeling the complex, non-linear relationships involved in qualitative sampling decisions. Feature importance analysis confirmed the vital roles of research design type and information power, providing quantitative validation of key theoretical assumptions in qualitative methodology. The study concludes by proposing a conceptual framework for a web-based computational application designed to serve as a decision support system for qualitative researchers, journal reviewers, and thesis advisors. This model represents a significant step toward standardizing sample size justification, enhancing transparency, and strengthening the epistemological foundation of qualitative inquiry through evidence-based, systematic decision-making.
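The model-comparison workflow the abstract describes can be sketched in a few lines. This is an illustrative reconstruction only: the paper's dataset and configuration are not public, so the ten ordinal predictors and the sample-size target below are simulated stand-ins, and XGBoost is omitted to keep the dependencies to scikit-learn alone.

```python
# Illustrative sketch: simulated ordinal features stand in for the paper's
# dataset (research scope, information power, etc.), which is not reproduced here.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(300, 10)).astype(float)  # 10 features on a 1-5 ordinal scale
# Toy target: "required sample size" driven non-linearly by two key features,
# loosely mimicking the reported dominance of design type and information power
y = 5 + 3 * X[:, 0] + 2 * X[:, 1] ** 1.5 + rng.normal(0, 1, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "KNN": KNeighborsRegressor(),
    "GB": GradientBoostingRegressor(random_state=0),
    "RF": RandomForestRegressor(random_state=0),
    "DT": DecisionTreeRegressor(random_state=0),
}
# Fit each model and compare test-set explanatory power (R^2)
scores = {name: r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
# Feature importance analysis from the fitted random forest
importances = models["RF"].feature_importances_
```

On real ordinal survey-style data the same pattern applies: fit the candidate regressors, rank them by held-out R², and inspect the ensemble's feature importances to see which methodological parameters drive the predicted sample size.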
Related papers
- RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension [65.81339691942757]
RPC-Bench is a large-scale question-answering benchmark built from review-rebuttal exchanges of high-quality computer science papers. We design a fine-grained taxonomy aligned with the scientific research flow to assess models' ability to understand and answer why, what, and how questions in scholarly contexts.
arXiv Detail & Related papers (2026-01-14T11:37:00Z)
- Analytical Survey of Learning with Low-Resource Data: From Analysis to Investigation [192.53529928861818]
Learning with high-resource data has demonstrated substantial success in artificial intelligence (AI). However, the costs associated with data annotation and model training remain significant. This survey employs active sampling theory to analyze the generalization error and label complexity associated with learning from low-resource data.
arXiv Detail & Related papers (2025-10-10T03:15:42Z)
- Consistency of Feature Attribution in Deep Learning Architectures for Multi-Omics [0.36646002427839136]
We investigate the use of Shapley Additive Explanations (SHAP) on a multi-view deep learning model applied to multi-omics data. Rankings of features via SHAP are compared across various architectures to evaluate consistency of the method. We present an alternative, simple method to assess the robustness of identification of important biomolecules.
arXiv Detail & Related papers (2025-07-30T17:53:42Z)
- A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data [8.695136686770772]
We argue that confidence in the credibility and robustness of results depends on adopting a 'human-in-the-loop' methodology. We propose a novel methodological framework for Computational Grounded Theory (CGT) that supports the analysis of large qualitative datasets.
arXiv Detail & Related papers (2025-06-06T13:43:12Z)
- Interpretable Credit Default Prediction with Ensemble Learning and SHAP [3.948008559977866]
This study focuses on the problem of credit default prediction, builds a modeling framework based on machine learning, and conducts comparative experiments on a variety of mainstream classification algorithms. The results show that the ensemble learning method has clear advantages in predictive performance, especially in handling complex non-linear relationships between features and data-imbalance problems. The external credit score variable plays a dominant role in model decision making, which helps to improve the model's interpretability and practical application value.
arXiv Detail & Related papers (2025-05-27T07:23:22Z)
- PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models [59.17570021208177]
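The pattern this summary describes can be mimicked in a short sketch. Everything here is assumed for illustration: the applicants are synthetic, a single credit-score feature is made dominant by construction, and scikit-learn's permutation importance stands in for SHAP (which the cited study actually uses) so no extra dependency is required.

```python
# Illustrative sketch: synthetic credit data; permutation importance is a
# stand-in for SHAP so the example needs only scikit-learn.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500
credit_score = rng.normal(650, 50, n)   # assumed dominant predictor
income = rng.normal(50_000, 15_000, n)  # pure noise in this toy setup
X = np.column_stack([credit_score, income])
# Default probability falls as the credit score rises
p_default = 1.0 / (1.0 + np.exp((credit_score - 600) / 20))
y = (rng.random(n) < p_default).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
clf = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
# Attribution: how much test accuracy drops when each feature is shuffled
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=1)
```

By construction, the credit-score column (index 0) should dominate `result.importances_mean`, echoing the summary's finding that the external credit score drives model decisions.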
PyTDC is a machine-learning platform providing streamlined training, evaluation, and inference software for multimodal biological AI models. This paper discusses the components of PyTDC's architecture and, to our knowledge, the first-of-its-kind case study on the introduced single-cell drug-target nomination ML task.
arXiv Detail & Related papers (2025-05-08T18:15:38Z)
- Aggregating empirical evidence from data strategy studies: a case on model quantization [5.467675229660525]
This study assesses the effects of model quantization on correctness and resource efficiency in deep learning (DL) systems. We applied the Structured Synthesis Method (SSM) to aggregate the findings.
arXiv Detail & Related papers (2025-05-01T19:18:35Z)
- MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning? [51.85759493254735]
MindGYM is a structured and scalable framework for question synthesis. It infuses high-level reasoning objectives to shape the model's synthesis behavior. It composes more complex multi-hop questions based on QA seeds for deeper reasoning.
arXiv Detail & Related papers (2025-03-12T16:03:03Z)
- GLUECons: A Generic Benchmark for Learning Under Constraints [102.78051169725455]
In this work, we create a benchmark that is a collection of nine tasks in the domains of natural language processing and computer vision.
We model external knowledge as constraints, specify the sources of the constraints for each task, and implement various models that use these constraints.
arXiv Detail & Related papers (2023-02-16T16:45:36Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Making Machine Learning Datasets and Models FAIR for HPC: A Methodology and Case Study [0.0]
The FAIR Guiding Principles aim to improve the findability, accessibility, interoperability, and reusability of digital content by making it both human- and machine-actionable.
These principles have not yet been broadly adopted in the domain of machine learning-based program analyses and optimizations for High-Performance Computing.
We design a methodology to make HPC datasets and machine learning models FAIR after investigating existing FAIRness assessment and improvement techniques.
arXiv Detail & Related papers (2022-11-03T18:45:46Z)
- Latent Properties of Lifelong Learning Systems [59.50307752165016]
We introduce an algorithm-agnostic explainable surrogate-modeling approach to estimate latent properties of lifelong learning algorithms.
We validate the approach for estimating these properties via experiments on synthetic data.
arXiv Detail & Related papers (2022-07-28T20:58:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.