Related papers: Large language models surpass domain-specific architectures for antepartum electronic fetal monitoring analysis

URL: http://arxiv.org/abs/2509.18112v2
Date: Thu, 06 Nov 2025 11:12:05 GMT
Title: Large language models surpass domain-specific architectures for antepartum electronic fetal monitoring analysis
Authors: Sheng Wong, Ravi Shankar, Beth Albert, Gabriel Davis Jones
Abstract summary: Foundation models (FMs) and large language models (LLMs) have demonstrated promising generalization across diverse domains for time-series analysis. We present the first comprehensive benchmark of state-of-the-art architectures for automated antepartum CTG classification.
Score: 3.365708695027943
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Foundation models (FMs) and large language models (LLMs) have demonstrated promising generalization across diverse domains for time-series analysis, yet their potential for electronic fetal monitoring (EFM) and cardiotocography (CTG) analysis remains underexplored. Most existing CTG studies have relied on domain-specific models and lack systematic comparisons with modern foundation or language models, limiting our understanding of whether these models can outperform specialized systems in fetal health assessment. In this study, we present the first comprehensive benchmark of state-of-the-art architectures for automated antepartum CTG classification. Over 2,500 20-minute recordings were used to evaluate over 15 models spanning domain-specific, time-series, foundation, and language-model categories under a unified framework. Fine-tuned LLMs consistently outperformed both foundation and domain-specific models across data-availability scenarios, except when uterine-activity signals were absent, where domain-specific models showed greater robustness. These performance gains, however, required substantially higher computational resources. Our results highlight that while fine-tuned LLMs achieved state-of-the-art performance for CTG classification, practical deployment must balance performance with computational efficiency.

Related papers

Generalist vs Specialist Time Series Foundation Models: Investigating Potential Emergent Behaviors in Assessing Human Health Using PPG Signals [22.364607686570384]
Foundation models are large-scale machine learning models that are pre-trained on massive amounts of data. Recent works, such as MOMENT, train a generalist time series foundation model with data from multiple domains. This paper aims to conduct a comprehensive benchmarking study to compare the performance of generalist and specialist models.
arXiv Detail & Related papers (2025-10-16T03:13:04Z)
Benchmarking ECG Foundational Models: A Reality Check Across Clinical Tasks [1.6873748786804317]
Foundation models promise broader adaptability, but their generalization across diverse ECG tasks is not well understood. We benchmarked eight ECG foundation models on 26 clinically relevant tasks using 12 public datasets. While foundation models show promise for adult ECG analysis, substantial gaps remain in cardiac structure, outcome prediction, and patient characterization.
arXiv Detail & Related papers (2025-09-29T17:29:48Z)
A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications [77.3888788549565]
We present EchoCare, a novel ultrasound foundation model for generalist clinical use. We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
arXiv Detail & Related papers (2025-09-15T10:05:31Z)
Benchmarking Foundation Models for Mitotic Figure Classification [0.37334049820361814]
Self-supervised learning techniques have enabled the use of vast amounts of unlabeled data to train large-scale neural networks. In this work, we investigate the use of foundation models for mitotic figure classification. We compare all models against end-to-end-trained baselines, both CNNs and Vision Transformers.
arXiv Detail & Related papers (2025-08-06T13:30:40Z)
Clinical NLP with Attention-Based Deep Learning for Multi-Disease Prediction [44.0876796031468]
This paper addresses the challenges posed by the unstructured nature and high-dimensional semantic complexity of electronic health record texts. A deep learning method based on attention mechanisms is proposed to achieve unified modeling for information extraction and multi-label disease prediction.
arXiv Detail & Related papers (2025-07-02T07:45:22Z)
A Vector-Quantized Foundation Model for Patient Behavior Monitoring [41.48188433408574]
This paper introduces a novel foundation model based on a modified vector quantized variational autoencoder, specifically designed to process real-world data from smartphones and wearable devices. We leveraged the discrete latent representation of this model to effectively perform two downstream tasks, suicide risk assessment and emotional state prediction, on different held-out clinical cohorts without the need of fine-tuning.
arXiv Detail & Related papers (2025-03-19T14:01:16Z)
SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Current state-of-the-art methods focus on training innovative architectural designs on confined datasets. We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z)
How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation [6.547981908229007]
We show how architectural and framework biases combine to influence model performance. Experiments show imputation performance variations of up to 20% based on preprocessing and implementation choices. We identify critical gaps between current deep imputation methods and medical requirements.
arXiv Detail & Related papers (2024-07-11T12:33:28Z)
Time Series Modeling for Heart Rate Prediction: From ARIMA to Transformers [4.744436991413165]
This study investigates advanced deep learning models, including LSTM, for predicting heart rate time series from the MIT-BIH Database. Results demonstrate that deep learning models, particularly PatchTST, significantly outperform traditional models across multiple metrics.
arXiv Detail & Related papers (2024-06-18T01:55:37Z)
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
Towards quantitative precision for ECG analysis: Leveraging state space models, self-supervision and patient metadata [2.0777058026628583]
We investigate three elements aimed at improving the quantitative accuracy of automatic ECG analysis systems. First, we exploit structured state space models (SSMs) to capture long-term dependencies in time series data. Secondly, we demonstrate that self-supervised learning using contrastive predictive coding can further improve the performance of SSMs. Finally, we incorporate basic demographic metadata alongside the ECG signal as input.
arXiv Detail & Related papers (2023-08-29T13:25:26Z)
Robustness and Generalization Performance of Deep Learning Models on Cyber-Physical Systems: A Comparative Study [71.84852429039881]
The investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise. We test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples.
arXiv Detail & Related papers (2023-06-13T12:43:59Z)
Advancing the State-of-the-Art for ECG Analysis through Structured State Space Models [3.822543555265593]
This work explores the prospects of applying the recently introduced structured state space models (SSMs) as a particularly promising approach to ECG analysis. We demonstrate that this approach leads to significant improvements over the current state-of-the-art for ECG classification. The model's ability to capture long-term dependencies makes it possible to shed light on long-standing questions in the literature such as the optimal sampling rate or window size to train classification models.
arXiv Detail & Related papers (2022-11-14T18:01:13Z)
Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for unstructured-view topic-related ultrasound report generation. The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristics across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z)
Classification of fetal compromise during labour: signal processing and feature engineering of the cardiotocograph [0.0]
This study develops novel CTG features based on clinical expertise and system control theory. Features are evaluated in a machine learning model to assess their efficacy in identifying fetal compromise. ARMA features ranked amongst the top features for detecting fetal compromise.
arXiv Detail & Related papers (2021-10-31T15:02:14Z)
A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage. This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
A comprehensive comparative evaluation and analysis of Distributional Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT. The results show that the alleged superiority of predict based models is more apparent than real, and surely not ubiquitous. We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
arXiv Detail & Related papers (2021-05-20T15:18:06Z)
Polynomial Networks in Deep Classifiers [55.90321402256631]
We cast the study of deep neural networks under a unifying framework. Our framework provides insights on the inductive biases of each model. The efficacy of the proposed models is evaluated on standard image and audio classification benchmarks.
arXiv Detail & Related papers (2021-04-16T06:41:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.