BMDetect: A Multimodal Deep Learning Framework for Comprehensive Biomedical Misconduct Detection
- URL: http://arxiv.org/abs/2505.05763v2
- Date: Tue, 15 Jul 2025 05:12:11 GMT
- Title: BMDetect: A Multimodal Deep Learning Framework for Comprehensive Biomedical Misconduct Detection
- Authors: Yize Zhou, Jie Zhang, Meijie Wang, Lun Yu
- Abstract summary: BMDetect integrates journal metadata, semantic embeddings, and textual attributes for holistic manuscript evaluation. It achieves 74.33% AUC, outperforming single-modality baselines by 8.6%, and demonstrates transferability across biomedical subfields.
- Score: 3.306308939118107
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Academic misconduct detection in biomedical research remains challenging due to algorithmic narrowness in existing methods and fragmented analytical pipelines. We present BMDetect, a multimodal deep learning framework that integrates journal metadata (SJR, institutional data), semantic embeddings (PubMedBERT), and GPT-4o-mined textual attributes (methodological statistics, data anomalies) for holistic manuscript evaluation. Key innovations include: (1) multimodal fusion of domain-specific features to reduce detection bias; (2) quantitative evaluation of feature importance, identifying journal authority metrics (e.g., SJR-index) and textual anomalies (e.g., statistical outliers) as dominant predictors; and (3) the BioMCD dataset, a large-scale benchmark with 13,160 retracted articles and 53,411 controls. BMDetect achieves 74.33% AUC, outperforming single-modality baselines by 8.6%, and demonstrates transferability across biomedical subfields. This work advances scalable, interpretable tools for safeguarding research integrity.
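The abstract does not say how the three feature groups are combined, so the following is only a minimal sketch that assumes simple concatenation (early fusion) of journal metadata, a text embedding, and extracted textual attributes. It also shows how an AUC figure such as the reported 74.33% can be computed from labels and scores, and derives the class balance of BioMCD from the counts given above. The function names `fuse_features` and `roc_auc`, and all toy values, are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_features(
    metadata: Sequence[float],
    embedding: Sequence[float],
    text_attrs: Sequence[float],
) -> List[float]:
    """Concatenate the three modalities into one flat vector.

    Assumption: plain concatenation; the paper's actual fusion may differ.
    """
    return [*metadata, *embedding, *text_attrs]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy manuscript: 2 metadata values, a 3-dim embedding, 1 textual attribute.
fused = fuse_features([0.8, 1.0], [0.1, -0.3, 0.7], [2.0])

# Toy evaluation: label 1 = retracted, label 0 = control.
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]
auc = roc_auc(labels, scores)

# Share of retracted articles in BioMCD, from the counts in the abstract.
retracted_share = 13160 / (13160 + 53411)
```

With roughly one retracted article for every four controls, the dataset is imbalanced, which is one reason a ranking metric such as AUC is more informative here than raw accuracy.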
Related papers
- Platform for Representation and Integration of multimodal Molecular Embeddings [43.54912893426355]
Existing machine learning methods for molecular embeddings are restricted to specific tasks or data modalities. Existing embeddings capture largely non-overlapping molecular signals, highlighting the value of embedding integration. We propose Platform for Representation and Integration of multimodal Molecular Embeddings (PRISME) to integrate heterogeneous embeddings into a unified multimodal representation.
arXiv Detail & Related papers (2025-07-10T01:18:50Z)
- Advances in Automated Fetal Brain MRI Segmentation and Biometry: Insights from the FeTA 2024 Challenge [27.07002392996198]
The FeTA Challenge 2024 advanced automated fetal brain MRI analysis. It introduced biometry prediction as a new task alongside tissue segmentation. For the first time, our diverse multi-centric test set included data from a new low-field (0.55T) MRI dataset.
arXiv Detail & Related papers (2025-05-05T16:54:04Z)
- Uncertainty-aware abstention in medical diagnosis based on medical texts [87.88110503208016]
This study addresses the critical issue of reliability for AI-assisted medical diagnosis. We focus on the selective prediction approach, which allows the diagnosis system to abstain from providing a decision if it is not confident in the diagnosis. We introduce HUQ-2, a new state-of-the-art method for enhancing reliability in selective prediction tasks.
arXiv Detail & Related papers (2025-02-25T10:15:21Z)
- Pub-Guard-LLM: Detecting Fraudulent Biomedical Articles with Reliable Explanations [11.082285990214595]
Pub-Guard-LLM is a large language model-based system tailored to fraud detection of biomedical scientific articles. Pub-Guard-LLM consistently surpasses the performance of various baselines. By enhancing both detection performance and explainability in scientific fraud detection, Pub-Guard-LLM contributes to safeguarding research integrity with a novel, effective, open-source tool.
arXiv Detail & Related papers (2025-02-21T12:54:56Z)
- CSTRL: Context-Driven Sequential Transfer Learning for Abstractive Radiology Report Summarization [0.37109226820205005]
A radiology report comprises several sections, including the Findings and Impression of the diagnosis. We introduce a sequential transfer learning approach that ensures key content extraction and coherent summarization. Using MIMIC-CXR and Open-I datasets, our model, CSTRL, achieved state-of-the-art performance.
arXiv Detail & Related papers (2025-02-21T08:32:11Z)
- Survey on AI-Generated Media Detection: From Non-MLLM to MLLM [51.91311158085973]
Methods for detecting AI-generated media have evolved rapidly. General-purpose detectors based on MLLMs integrate authenticity verification, explainability, and localization capabilities. Ethical and security considerations have emerged as critical global concerns.
arXiv Detail & Related papers (2025-02-07T12:18:20Z)
- Causal Representation Learning from Multimodal Biomedical Observations [57.00712157758845]
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets. The key theoretical contribution is the structural sparsity of causal connections between modalities. Results on a real-world human phenotype dataset are consistent with established biomedical research.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
However, acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
- BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z)
- In Silico Prediction of Blood-Brain Barrier Permeability of Chemical Compounds through Molecular Feature Modeling [0.0]
Development of new drugs to treat central nervous system disorders presents unique challenges due to poor penetration efficacy across the blood-brain barrier.
In this research, we aim to mitigate this problem through an ML model that analyzes chemical features.
arXiv Detail & Related papers (2022-08-18T19:59:44Z)