Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data
- URL: http://arxiv.org/abs/2401.06866v2
- Date: Sat, 27 Apr 2024 06:20:26 GMT
- Title: Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data
- Authors: Yubin Kim, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, Hae Won Park,
- Abstract summary: Large language models (LLMs) are capable of many natural language tasks, yet they are far from perfect.
This paper investigates the capacity of LLMs to make inferences about health based on contextual information.
We present a comprehensive evaluation of 12 state-of-the-art LLMs with prompting and fine-tuning techniques on four public health datasets.
- Score: 43.48422400822597
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are capable of many natural language tasks, yet they are far from perfect. In health applications, grounding and interpreting domain-specific and non-linguistic data is crucial. This paper investigates the capacity of LLMs to make inferences about health based on contextual information (e.g. user demographics, health knowledge) and physiological data (e.g. resting heart rate, sleep minutes). We present a comprehensive evaluation of 12 state-of-the-art LLMs with prompting and fine-tuning techniques on four public health datasets (PMData, LifeSnaps, GLOBEM and AW_FB). Our experiments cover 10 consumer health prediction tasks in mental health, activity, metabolic, and sleep assessment. Our fine-tuned model, HealthAlpaca, exhibits comparable performance to much larger models (GPT-3.5, GPT-4 and Gemini-Pro), achieving the best performance in 8 out of 10 tasks. Ablation studies highlight the effectiveness of context enhancement strategies. Notably, we observe that our context enhancement can yield up to 23.8% improvement in performance. While constructing contextually rich prompts (combining user context, health knowledge and temporal information) exhibits synergistic improvement, the inclusion of health knowledge context in prompts significantly enhances overall performance.
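The abstract's context-enhancement strategy (combining user context, health knowledge, and temporal sensor readings into one prompt) can be sketched as follows. This is a minimal illustration, not the paper's actual template: the field names, wording, and sample values are all assumptions.

```python
# Sketch of assembling a context-enriched health prompt from wearable data.
# Field names and template wording are illustrative assumptions.

def build_health_prompt(user_context, health_knowledge, readings, question):
    """Combine user context, domain knowledge, and temporal sensor
    readings into a single prompt string for an LLM."""
    temporal = "\n".join(
        f"  {day}: resting HR {r['resting_hr']} bpm, sleep {r['sleep_min']} min"
        for day, r in readings.items()
    )
    return (
        f"User context: {user_context}\n"
        f"Health knowledge: {health_knowledge}\n"
        f"Recent physiological data:\n{temporal}\n"
        f"Question: {question}"
    )

prompt = build_health_prompt(
    user_context="34-year-old male, office worker",
    health_knowledge="Typical adult resting heart rate is 60-100 bpm.",
    readings={
        "Mon": {"resting_hr": 62, "sleep_min": 410},
        "Tue": {"resting_hr": 71, "sleep_min": 325},
    },
    question="Rate this user's stress level from 1 (low) to 5 (high).",
)
```

Under this reading, the ablations vary which of the three context blocks are included; the reported synergy comes from supplying all of them together.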
Related papers
- Efficient and Personalized Mobile Health Event Prediction via Small Language Models [14.032049217103024]
Small Language Models (SLMs) are potential candidates to solve privacy and computational issues.
This paper examines the capability of SLMs to accurately analyze health data, such as steps, calories, sleep minutes, and other vital statistics.
Our results indicate that SLMs could potentially be deployed on wearable or mobile devices for real-time health monitoring.
arXiv Detail & Related papers (2024-09-17T01:57:57Z)
- STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering [58.79671189792399]
STLLaVA-Med is designed to train a policy model capable of auto-generating medical visual instruction data.
We validate the efficacy and data efficiency of STLLaVA-Med across three major medical Visual Question Answering (VQA) benchmarks.
arXiv Detail & Related papers (2024-06-28T15:01:23Z)
- Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis [2.303486126296845]
Large Language Models (LLMs) have shown promise in delivering interactive health advice.
Traditional methods like Retrieval-Augmented Generation (RAG) and fine-tuning often fail to fully utilize the complex, multi-dimensional, and temporally relevant data.
This paper introduces a graph-augmented LLM framework designed to significantly enhance the personalization and clarity of health insights.
arXiv Detail & Related papers (2024-06-24T01:22:54Z)
- Comparing the Efficacy of GPT-4 and Chat-GPT in Mental Health Care: A Blind Assessment of Large Language Models for Psychological Support [0.0]
Two large language models, GPT-4 and Chat-GPT, were tested in responding to a set of 18 psychological prompts.
GPT-4 achieved an average rating of 8.29 out of 10, while Chat-GPT received an average rating of 6.52.
arXiv Detail & Related papers (2024-05-15T12:44:54Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data [42.965788205842465]
We present a comprehensive evaluation of multiple large language models (LLMs) on various mental health prediction tasks.
We conduct experiments covering zero-shot prompting, few-shot prompting, and instruction fine-tuning.
Our best fine-tuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 by 10.9% on balanced accuracy, and the best of GPT-4 (250 and 150 times larger, respectively) by 4.8%.
arXiv Detail & Related papers (2023-07-26T06:00:50Z)
- Large Language Models are Few-Shot Health Learners [8.549461354297677]
Large language models (LLMs) can capture rich representations of concepts that are useful for real-world tasks.
Health applications require that models be grounded in numerical data.
We demonstrate that with only few-shot tuning, a large language model is capable of grounding various physiological and behavioral time-series data.
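The few-shot grounding described above can be pictured as serializing time-series readings into labeled in-context examples. The sketch below is purely illustrative: the serialization format, label scheme, and all values are invented, not taken from the paper.

```python
# Sketch of few-shot prompting over behavioral time-series.
# Example data and the "recovery" labels are invented for illustration.

few_shot_examples = [
    ("steps: 2100, 1800, 2500; sleep_min: 320, 300, 340", "poor recovery"),
    ("steps: 9500, 11000, 10200; sleep_min: 450, 470, 440", "good recovery"),
]

def format_few_shot_prompt(examples, query_series):
    """Serialize labeled time-series examples plus an unlabeled query
    into a single few-shot prompt for an LLM to complete."""
    blocks = [
        f"Data: {series}\nAssessment: {label}" for series, label in examples
    ]
    blocks.append(f"Data: {query_series}\nAssessment:")
    return "\n\n".join(blocks)

prompt = format_few_shot_prompt(
    few_shot_examples,
    "steps: 4000, 3800, 4200; sleep_min: 360, 350, 355",
)
```

The model is then expected to complete the final "Assessment:" line, grounding its answer in the numerical patterns of the in-context examples.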
arXiv Detail & Related papers (2023-05-24T19:25:16Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- MET: Multimodal Perception of Engagement for Telehealth [52.54282887530756]
We present MET, a learning-based algorithm for perceiving a human's level of engagement from videos.
We release a new dataset, MEDICA, for mental health patient engagement detection.
arXiv Detail & Related papers (2020-11-17T15:18:38Z)
- Assessing the Severity of Health States based on Social Media Posts [62.52087340582502]
We propose a multiview learning framework that models both the textual content as well as contextual-information to assess the severity of the user's health state.
The diverse NLU views demonstrate effectiveness both on the two tasks and on individual diseases when assessing a user's health.
arXiv Detail & Related papers (2020-09-21T03:45:14Z)