Exploring Scaling Laws for EHR Foundation Models
- URL: http://arxiv.org/abs/2505.22964v1
- Date: Thu, 29 May 2025 01:05:11 GMT
- Title: Exploring Scaling Laws for EHR Foundation Models
- Authors: Sheng Zhang, Qin Liu, Naoto Usuyama, Cliff Wong, Tristan Naumann, Hoifung Poon
- Abstract summary: We present the first empirical investigation of scaling laws for EHR foundation models. We identify consistent scaling patterns, including parabolic IsoFLOPs curves and power-law relationships between compute, model parameters, data size, and clinical utility.
- Score: 17.84205864956449
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of scaling laws has profoundly shaped the development of large language models (LLMs), enabling predictable performance gains through systematic increases in model size, dataset volume, and compute. Yet, these principles remain largely unexplored in the context of electronic health records (EHRs) -- a rich, sequential, and globally abundant data source that differs structurally from natural language. In this work, we present the first empirical investigation of scaling laws for EHR foundation models. By training transformer architectures on patient timeline data from the MIMIC-IV database across varying model sizes and compute budgets, we identify consistent scaling patterns, including parabolic IsoFLOPs curves and power-law relationships between compute, model parameters, data size, and clinical utility. These findings demonstrate that EHR models exhibit scaling behavior analogous to LLMs, offering predictive insights into resource-efficient training strategies. Our results lay the groundwork for developing powerful EHR foundation models capable of transforming clinical prediction tasks and advancing personalized healthcare.
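To make the abstract's analysis concrete, the sketch below (illustrative Python, not code or data from the paper) fits a parabolic IsoFLOPs curve in log model size for each compute budget, then fits a power law N_opt ∝ C^α through the resulting minima. The sweep values and the C ≈ 6·N·D heuristic are assumptions made for the example.

```python
import numpy as np

# Hypothetical IsoFLOPs sweeps: for each compute budget C (FLOPs), loss is
# measured at several model sizes N; data size D is implied by C ~ 6*N*D.
# These numbers are made up for illustration, not taken from the paper.
isoflops_sweeps = {
    1e17: [(1e6, 3.10), (3e6, 2.95), (1e7, 2.90), (3e7, 2.98)],
    1e18: [(3e6, 2.80), (1e7, 2.62), (3e7, 2.58), (1e8, 2.66)],
    1e19: [(1e7, 2.50), (3e7, 2.31), (1e8, 2.27), (3e8, 2.35)],
}

optima = []
for compute, points in sorted(isoflops_sweeps.items()):
    log_n = np.log10([n for n, _ in points])
    loss = np.array([l for _, l in points])
    # Parabolic IsoFLOPs curve: loss is roughly quadratic in log10(N)
    # at fixed compute; the vertex is the compute-optimal model size.
    a, b, _ = np.polyfit(log_n, loss, deg=2)
    optima.append((np.log10(compute), -b / (2 * a)))

# A power law N_opt proportional to C^alpha is linear in log-log space.
log_c, log_n_opt = map(np.array, zip(*optima))
alpha, intercept = np.polyfit(log_c, log_n_opt, deg=1)
print(f"N_opt ~ 10^{intercept:.2f} * C^{alpha:.2f}")
```

The same two-stage fit extends directly to optimal data size and, as the abstract notes, to downstream clinical-utility metrics.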
Related papers
- Pre-trained Large Language Models Learn Hidden Markov Models In-context [10.06882436449576]
Hidden Markov Models (HMMs) are tools for modeling sequential data with latent structure, yet fitting them to real-world data remains computationally challenging. We show that pre-trained large language models (LLMs) can effectively learn HMM-generated data via in-context learning.
arXiv Detail & Related papers (2025-06-08T21:49:38Z)
- Scaling Laws for Emulation of Stellar Spectra [0.0]
We provide training guidelines for scaling Transformer-based spectral emulators to achieve optimal performance. Our results suggest that optimal computational resource allocation requires balanced scaling. This study establishes a foundation for developing spectral foundational models with enhanced domain transfer capabilities.
arXiv Detail & Related papers (2025-03-24T12:20:24Z)
- Large Language Models are Powerful Electronic Health Record Encoders [4.520903886487343]
General-purpose Large Language Models (LLMs) are used to encode EHR data into representations for downstream clinical prediction tasks. We show that LLM-based embeddings can often match or even surpass the performance of a specialized EHR foundation model. One of the tested LLM-based models achieves superior performance for disease onset, hospitalization, and mortality prediction.
arXiv Detail & Related papers (2025-02-24T18:30:36Z)
- SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Current state-of-the-art methods focus on training innovative architectural designs on confined datasets. We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z)
- Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
The Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality. We propose Approximate Entropy (ApEn) to assess data quality, a more nuanced approach than traditional data-quantity metrics. A standalone ApEn sketch follows this entry.
arXiv Detail & Related papers (2024-11-30T10:56:30Z)
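As referenced in the entry above, here is a minimal, self-contained sketch of Approximate Entropy in the classic Pincus formulation. This is a generic implementation assumed for illustration, not the paper's code; the defaults m=2 and r=0.2·std are conventional choices.

```python
import numpy as np

def approximate_entropy(x, m=2, r=0.2):
    """ApEn(m, r) of a 1-D sequence; lower values indicate more regularity."""
    x = np.asarray(x, dtype=float)
    r = r * np.std(x)  # tolerance scaled to the sequence's spread

    def phi(m):
        n = len(x) - m + 1
        # All length-m windows of the sequence, shape (n, m).
        windows = np.array([x[i:i + m] for i in range(n)])
        # Chebyshev distance between every pair of windows.
        dist = np.max(np.abs(windows[:, None, :] - windows[None, :, :]), axis=2)
        # Fraction of windows within tolerance of each window (self-match included,
        # so the log below is always defined).
        counts = np.mean(dist <= r, axis=1)
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

# A regular (periodic) sequence should score lower than white noise.
rng = np.random.default_rng(0)
print(approximate_entropy(np.sin(np.arange(300) * 0.5)))
print(approximate_entropy(rng.normal(size=300)))
```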
- Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models [34.79589443380606]
The scaling of large language models (LLMs) is a critical research area for the efficiency and effectiveness of model training and deployment.
Our work investigates the transferability and discrepancies of scaling laws between dense and MoE models.
arXiv Detail & Related papers (2024-10-08T03:21:56Z)
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA [0.0]
This study introduces a systematic framework to compare the efficacy of Large Language Models (LLMs) for fine-tuning across various cheminformatics tasks.
We assessed three well-known models (RoBERTa, BART, and LLaMA) on their ability to predict molecular properties.
We found that LLaMA-based models generally offered the lowest validation loss, suggesting their superior adaptability across tasks and scales.
arXiv Detail & Related papers (2024-05-02T02:20:12Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Knowledge Graph Embedding with Electronic Health Records Data via Latent Graphical Block Model [13.398292423857756]
We propose to infer the conditional dependency structure among EHR features via a latent graphical block model (LGBM).
We establish the statistical rates of the proposed estimators and show the perfect recovery of the block structure.
arXiv Detail & Related papers (2023-05-31T16:18:46Z)
- A Solvable Model of Neural Scaling Laws [72.8349503901712]
Large language models with a huge number of parameters, when trained on a near internet-scale number of tokens, have been empirically shown to obey neural scaling laws.
We propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology.
A key finding is the manner in which the power laws in the statistics of natural datasets are extended by nonlinear random feature maps; a toy sketch of this setup follows the entry.
arXiv Detail & Related papers (2022-10-30T15:13:18Z)
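As referenced in the entry above, here is a hedged toy sketch of that style of solvable model: Gaussian inputs with a power-law covariance spectrum (a stand-in for natural-data statistics), a linear teacher, and a nonlinear random-feature student. All sizes and constants are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 256, 2048, 2048

# Generative data model: Gaussian inputs whose covariance eigenvalues
# follow a power law, mimicking the statistics of natural datasets.
eigvals = np.arange(1, d + 1, dtype=float) ** -1.5

def sample(n):
    return rng.normal(size=(n, d)) * np.sqrt(eigvals)

teacher = rng.normal(size=d)
x_train, x_test = sample(n_train), sample(n_test)
y_train, y_test = x_train @ teacher, x_test @ teacher

for width in [32, 64, 128, 256, 512]:
    # Nonlinear random feature map: fixed random projection followed by ReLU.
    w = rng.normal(size=(d, width)) / np.sqrt(d)
    f_train = np.maximum(x_train @ w, 0.0)
    f_test = np.maximum(x_test @ w, 0.0)
    # Ridge-regress the readout weights on the random features.
    coef = np.linalg.solve(
        f_train.T @ f_train + 1e-3 * np.eye(width), f_train.T @ y_train
    )
    mse = np.mean((f_test @ coef - y_test) ** 2)
    # Test error should fall roughly as a power law in the number of features.
    print(f"width={width:4d}  test_mse={mse:.4f}")
```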
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the Chicago Classification (CC) diagnosis of a high-resolution manometry (HRM) study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.