The Role of Review Process Failures in Affective State Estimation: An Empirical Investigation of DEAP Dataset
- URL: http://arxiv.org/abs/2508.02417v1
- Date: Mon, 04 Aug 2025 13:40:25 GMT
- Title: The Role of Review Process Failures in Affective State Estimation: An Empirical Investigation of DEAP Dataset
- Authors: Nazmun N Khan, Taylor Sweet, Chase A Harvey, Calder Knapp, Dean J. Krusienski, David E Thompson
- Abstract summary: We reviewed 101 studies, focusing on the widely used DEAP dataset for emotion recognition. We found that nearly 87% of the reviewed papers contained one or more of these errors. These findings reveal fundamental gaps in standardized evaluation practices and highlight critical deficiencies in the peer review process for machine learning applications in neuroscience.
- Score: 0.45080838507508303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The reliability of affective state estimation using EEG data is in question, given the variability in reported performance and the lack of standardized evaluation protocols. To investigate this, we reviewed 101 studies, focusing on the widely used DEAP dataset for emotion recognition. Our analysis revealed widespread methodological issues that include data leakage from improper segmentation, biased feature selection, flawed hyperparameter optimization, neglect of class imbalance, and insufficient methodological reporting. Notably, we found that nearly 87% of the reviewed papers contained one or more of these errors. Moreover, through experimental analysis, we observed that such methodological flaws can inflate the classification accuracy by up to 46%. These findings reveal fundamental gaps in standardized evaluation practices and highlight critical deficiencies in the peer review process for machine learning applications in neuroscience, emphasizing the urgent need for stricter methodological standards and evaluation protocols.
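To make the segmentation-leakage failure mode concrete, here is a minimal sketch, not taken from the paper: the dataset dimensions, the synthetic features, and the SVM classifier are illustrative assumptions for a DEAP-like setup in which each trial is cut into windows and every window becomes a training example. It contrasts a leaky evaluation, where windows from the same trial can land in both the training and test folds, with a trial-grouped evaluation where they cannot. Even with labels assigned at random, the window-shuffled protocol reports optimistic accuracy because the classifier memorizes trial-specific structure, which is the kind of inflation the abstract describes.

```python
# Minimal sketch (not the paper's code) of data leakage from improper segmentation.
# Assumed setup: DEAP-like trials are cut into windows; all sizes, the synthetic
# features, and the SVM classifier are illustrative.
import numpy as np
from sklearn.model_selection import GroupKFold, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_TRIALS, WINDOWS_PER_TRIAL, N_FEATURES = 40, 15, 32

# Windows from the same trial share trial-specific structure; the labels are
# random, so an honest evaluation should stay near chance level (0.5).
trial_effects = rng.normal(size=(N_TRIALS, N_FEATURES))
trial_labels = rng.integers(0, 2, size=N_TRIALS)
X = np.repeat(trial_effects, WINDOWS_PER_TRIAL, axis=0) \
    + rng.normal(scale=0.5, size=(N_TRIALS * WINDOWS_PER_TRIAL, N_FEATURES))
y = np.repeat(trial_labels, WINDOWS_PER_TRIAL)
groups = np.repeat(np.arange(N_TRIALS), WINDOWS_PER_TRIAL)

clf = make_pipeline(StandardScaler(), SVC())

# Leaky protocol: windows are shuffled, so pieces of one trial appear in both
# the training and test folds, and the model can effectively "recognize" the trial.
leaky = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Safer protocol: every window of a trial stays in the same fold.
grouped = cross_val_score(clf, X, y, groups=groups, cv=GroupKFold(n_splits=5))

print(f"window-shuffled CV accuracy: {leaky.mean():.2f}")   # optimistically inflated
print(f"trial-grouped CV accuracy:   {grouped.mean():.2f}")  # near chance, as it should be
```

The same grouping logic applies at the subject level for subject-independent emotion recognition, and in a sound protocol it also wraps feature selection and hyperparameter search inside the cross-validation so that neither step ever sees test data.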
Related papers
- Data Leakage and Deceptive Performance: A Critical Examination of Credit Card Fraud Detection Methodologies [2.6703221234079946]
This study critically examines the methodological rigor in credit card fraud detection research. We demonstrate that even simple models can achieve deceptively impressive results when basic methodological principles are violated.
arXiv Detail & Related papers (2025-06-03T09:56:43Z)
- Causal Machine Learning Methods for Estimating Personalised Treatment Effects -- Insights on validity from two large trials [0.0]
Causal machine learning (ML) methods hold great promise for advancing precision medicine. In this study, we assessed the internal and external validity of 17 mainstream causal ML methods.
arXiv Detail & Related papers (2025-01-07T09:44:05Z)
- Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models [0.5223954072121659]
Contaminant observations and outliers often cause problems when estimating the parameters of cognitive models. In this study, we test and improve the robustness of parameter estimation using amortized Bayesian inference. The proposed method is straightforward and practical to implement and has a broad applicability in fields where outlier detection or removal is challenging.
arXiv Detail & Related papers (2024-12-29T21:22:24Z)
- Pitfalls of topology-aware image segmentation [81.19923502845441]
We identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts, and inappropriate use of evaluation metrics. We propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.
arXiv Detail & Related papers (2024-12-19T08:11:42Z)
- Unified Uncertainty Estimation for Cognitive Diagnosis Models [70.46998436898205]
We propose a unified uncertainty estimation approach for a wide range of cognitive diagnosis models.
We decompose the uncertainty of diagnostic parameters into a data aspect and a model aspect.
Our method is effective and can provide useful insights into the uncertainty of cognitive diagnosis.
arXiv Detail & Related papers (2024-03-09T13:48:20Z)
- On Pixel-level Performance Assessment in Anomaly Detection [87.7131059062292]
Anomaly detection methods have demonstrated remarkable success across various applications.
However, assessing their performance, particularly at the pixel-level, presents a complex challenge.
In this paper, we dissect the intricacies of this challenge, underscored by visual evidence and statistical analysis.
arXiv Detail & Related papers (2023-10-25T08:02:27Z)
- Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification [0.491574468325115]
We present the first large-scale empirical study enabling the benchmarking of confidence scoring functions. The finding that a simple softmax response baseline is the overall best-performing method underlines the drastic shortcomings of current evaluation.
arXiv Detail & Related papers (2022-11-28T12:25:27Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, enabling human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)