Related papers: Pre-trained Large Language Models Learn Hidden Markov Models In-context

Pre-trained Large Language Models Learn Hidden Markov Models In-context

URL: http://arxiv.org/abs/2506.07298v2
Date: Wed, 11 Jun 2025 05:17:22 GMT
Title: Pre-trained Large Language Models Learn Hidden Markov Models In-context
Authors: Yijia Dai, Zhaolin Gao, Yahya Sattar, Sarah Dean, Jennifer J. Sun,
Abstract summary: Hidden Models (HMMs) are tools for modeling sequential data with latentian structure, yet fitting them to real-world data remains computationally challenging.<n>We show that pre-trained language (LLMs) can effectively learn data generated via in-context learning.
Score: 10.06882436449576
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hidden Markov Models (HMMs) are foundational tools for modeling sequential data with latent Markovian structure, yet fitting them to real-world data remains computationally challenging. In this work, we show that pre-trained large language models (LLMs) can effectively model data generated by HMMs via in-context learning (ICL)$\unicode{x2013}$their ability to infer patterns from examples within a prompt. On a diverse set of synthetic HMMs, LLMs achieve predictive accuracy approaching the theoretical optimum. We uncover novel scaling trends influenced by HMM properties, and offer theoretical conjectures for these empirical observations. We also provide practical guidelines for scientists on using ICL as a diagnostic tool for complex data. On real-world animal decision-making tasks, ICL achieves competitive performance with models designed by human experts. To our knowledge, this is the first demonstration that ICL can learn and predict HMM-generated sequences$\unicode{x2013}$an advance that deepens our understanding of in-context learning in LLMs and establishes its potential as a powerful tool for uncovering hidden structure in complex scientific data.

Related papers

SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL)<n>We propose textbfSPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
Exploring Scaling Laws for EHR Foundation Models [17.84205864956449]
We present the first empirical investigation of scaling laws for EHR foundation models.<n>We identify consistent scaling patterns, including parabolic IsoFLOPs curves and power-law relationships between compute, model parameters, data size, and clinical utility.
arXiv Detail & Related papers (2025-05-29T01:05:11Z)
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications.<n>One core challenge of evaluation in the large language model (LLM) era is the generalization issue.<n>We propose Model Utilization Index (MUI), a mechanism interpretability enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z)
Large Language Models are Powerful Electronic Health Record Encoders [4.520903886487343]
General-purpose Large Language Models (LLMs) are used to encode EHR data into representations for downstream clinical prediction tasks.<n>We show that LLM-based embeddings can often match or even surpass the performance of a specialized EHR foundation model.<n>One of the tested LLM-based models achieves superior performance for disease onset, hospitalization, and mortality prediction.
arXiv Detail & Related papers (2025-02-24T18:30:36Z)
The Performance of the LSTM-based Code Generated by Large Language Models (LLMs) in Forecasting Time Series Data [0.3749861135832072]
This paper investigates and compares the performance of the mainstream LLMs, such as ChatGPT, PaLM, LLama, and Falcon, in generating deep learning models for analyzing time series data.<n>The results can be beneficial for data analysts and practitioners who would like to leverage generative AIs to produce good prediction models with acceptable goodness.
arXiv Detail & Related papers (2024-11-27T20:18:36Z)
Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning [0.0]
We introduce a Multi-Modal Fusion (MMF) framework that harnesses the analytical prowess of Graph Neural Networks (GNNs) and the linguistic generative and predictive abilities of Large Language Models (LLMs) Our framework combines the effectiveness of GNNs in modeling graph-structured data with the zero-shot and few-shot learning capabilities of LLMs, enabling improved predictions while reducing the risk of overfitting.
arXiv Detail & Related papers (2024-08-27T11:10:39Z)
Less can be more for predicting properties with large language models [5.561723952524538]
We report on limitations in large language models' ability to learn from coordinate information in coordinate-category data.<n>We find that LLMs consistently fail to capture coordinate information while excelling at category patterns.<n>Our findings suggest immediate practical implications for materials property prediction tasks dominated by structural effects.
arXiv Detail & Related papers (2024-06-25T05:45:07Z)
Towards Modeling Learner Performance with Large Language Models [7.002923425715133]
This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the domain of knowledge tracing. We compare two approaches to using LLMs for this task, zero-shot prompting and model fine-tuning, with existing, non-LLM approaches to knowledge tracing. While LLM-based approaches do not achieve state-of-the-art performance, fine-tuned LLMs surpass the performance of naive baseline models and perform on par with standard Bayesian Knowledge Tracing approaches.
arXiv Detail & Related papers (2024-02-29T14:06:34Z)
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code. At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes. We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports. We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM. We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP) What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.