Review GIDE -- Restaurant Review Gastrointestinal Illness Detection and Extraction with Large Language Models
- URL: http://arxiv.org/abs/2503.09743v1
- Date: Wed, 12 Mar 2025 18:42:43 GMT
- Title: Review GIDE -- Restaurant Review Gastrointestinal Illness Detection and Extraction with Large Language Models
- Authors: Timothy Laurence, Joshua Harris, Leo Loman, Amy Douglas, Yung-Wai Chan, Luke Hounsome, Lesley Larkin, Michael Borowitz
- Abstract summary: Foodborne gastrointestinal (GI) illness is a common cause of ill health in the UK. This study introduces a novel annotation schema, developed with experts in GI illness, applied to the Yelp Open Dataset of reviews. We evaluate the performance of open-weight LLMs across three tasks: GI illness detection, symptom extraction, and food extraction.
- Score: 0.47321763526812183
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foodborne gastrointestinal (GI) illness is a common cause of ill health in the UK. However, many cases do not interact with the healthcare system, posing significant challenges for traditional surveillance methods. The growth of publicly available online restaurant reviews and advancements in large language models (LLMs) present potential opportunities to extend disease surveillance by identifying public reports of GI illness. In this study, we introduce a novel annotation schema, developed with experts in GI illness, applied to the Yelp Open Dataset of reviews. Our annotations extend beyond binary disease detection, to include detailed extraction of information on symptoms and foods. We evaluate the performance of open-weight LLMs across these three tasks: GI illness detection, symptom extraction, and food extraction. We compare this performance to RoBERTa-based classification models fine-tuned specifically for these tasks. Our results show that using prompt-based approaches, LLMs achieve micro-F1 scores of over 90% for all three of our tasks. Using prompting alone, we achieve micro-F1 scores that exceed those of smaller fine-tuned models. We further demonstrate the robustness of LLMs in GI illness detection across three bias-focused experiments. Our results suggest that publicly available review text and LLMs offer substantial potential for public health surveillance of GI illness by enabling highly effective extraction of key information. While LLMs appear to exhibit minimal bias in processing, the inherent limitations of restaurant review data highlight the need for cautious interpretation of results.
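The paper itself ships no code, but a minimal sketch of what a prompt-based pipeline for the three tasks could look like is below; the model ID, prompt wording, and JSON schema are illustrative assumptions, not the authors' setup.

```python
# A minimal sketch (not the authors' code) of prompt-based GI illness
# detection plus symptom/food extraction from a restaurant review.
import json
from transformers import pipeline

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder open-weight LLM

PROMPT = """You are a public health annotator. Read the restaurant review and
reply with JSON containing three fields:
  "gi_illness": true or false (does the reviewer report gastrointestinal illness?)
  "symptoms": list of symptom phrases taken from the review
  "foods": list of foods the reviewer links to the illness

Review: {review}
JSON:"""

generator = pipeline("text-generation", model=MODEL_ID)

def annotate(review: str) -> dict:
    completion = generator(
        PROMPT.format(review=review),
        max_new_tokens=200,
        do_sample=False,         # greedy decoding for stable labels
        return_full_text=False,  # return only the model's completion
    )[0]["generated_text"]
    return json.loads(completion)  # assumes the model emits valid JSON

print(annotate("Great service, but I was up all night vomiting after the oysters."))
```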
Related papers
- VaxGuard: A Multi-Generator, Multi-Type, and Multi-Role Dataset for Detecting LLM-Generated Vaccine Misinformation [8.08298631918046]
Existing benchmarks often overlook vaccine-related misinformation and the diverse roles of misinformation spreaders. This paper introduces VaxGuard, a novel dataset designed to address these challenges.
arXiv Detail & Related papers (2025-03-12T06:43:25Z)
- Contextual Evaluation of Large Language Models for Classifying Tropical and Infectious Diseases [0.9798965031257411]
We build on an open-source tropical and infectious diseases (TRINDs) dataset, expanding it with demographic and semantic clinical and consumer augmentations to yield 11,000+ prompts.
We evaluate LLM performance on these prompts, comparing generalist and medical LLMs, as well as LLM outputs to human experts.
We develop a prototype of TRINDs-LM, a research tool that provides a playground for exploring how context impacts LLM outputs for health.
arXiv Detail & Related papers (2024-09-13T21:28:54Z)
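A rough, invented illustration of the demographic and register augmentation the summary describes; the personas, templates, and cases below are placeholders, not the TRINDs schema:

```python
# Cross base symptom descriptions with persona and register variants to
# multiply a small seed set into many contextualized prompts.
base_cases = [
    "fever, joint pain, and a rash two weeks after a river trip",
    "profuse watery diarrhoea and leg cramps since yesterday",
]
personas = ["a 7-year-old child", "a 34-year-old pregnant woman", "a 68-year-old farmer"]
templates = {
    "clinical": "Patient is {persona} presenting with {case}. What is the likely diagnosis?",
    "consumer": "I'm {persona} and I have {case}. What could this be?",
}

prompts = []
for case in base_cases:
    for persona in personas:
        for template in templates.values():
            prompts.append(template.format(persona=persona, case=case))

print(len(prompts), "augmented prompts")  # 2 cases x 3 personas x 2 registers = 12
```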
- Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experimental results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z)
- Evaluating Large Language Models for Public Health Classification and Extraction Tasks [0.3545046504280562]
We present evaluations of Large Language Models (LLMs) for public health tasks involving the classification and extraction of free text. We evaluate eleven open-weight LLMs across all tasks using zero-shot in-context learning. We find promising signs that LLMs may be useful tools for public health experts to extract information from a wide variety of free text sources.
arXiv Detail & Related papers (2024-05-23T16:33:18Z)
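Micro-averaged F1, the metric reported in this paper and in Review GIDE, pools true positives, false positives, and false negatives across all labels before computing F1. A small scikit-learn sketch with invented gold and predicted symptom sets:

```python
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# Invented gold and predicted symptom sets for three reviews.
gold = [{"vomiting", "nausea"}, {"diarrhoea"}, set()]
pred = [{"vomiting"}, {"diarrhoea"}, {"fever"}]

mlb = MultiLabelBinarizer()
mlb.fit(gold + pred)  # build the joint label space from both sides
micro_f1 = f1_score(mlb.transform(gold), mlb.transform(pred), average="micro")
print(f"micro-F1 = {micro_f1:.3f}")  # 2 TP, 1 FP, 1 FN -> 0.667
```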
- Large Language Model Distilling Medication Recommendation Model [58.94186280631342]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs). Our research aims to transform existing medication recommendation methodologies using LLMs. To mitigate the cost of deploying a full-scale LLM, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
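A hypothetical sketch of feature-level knowledge distillation as the summary describes it: alongside the recommendation loss, the student is trained to match the frozen teacher LLM's hidden representation of each record. Dimensions, architecture, and the loss weighting are assumptions:

```python
import torch
import torch.nn as nn

class Student(nn.Module):
    """Compact recommender distilled from a large teacher LLM."""
    def __init__(self, vocab=5000, d_student=128, d_teacher=4096, n_drugs=300):
        super().__init__()
        self.encoder = nn.Sequential(nn.EmbeddingBag(vocab, d_student), nn.ReLU())
        self.project = nn.Linear(d_student, d_teacher)  # map into teacher feature space
        self.head = nn.Linear(d_student, n_drugs)       # multi-label medication logits

    def forward(self, codes):
        h = self.encoder(codes)
        return self.head(h), self.project(h)

student = Student()
task_loss, feat_loss = nn.BCEWithLogitsLoss(), nn.MSELoss()

codes = torch.randint(0, 5000, (8, 20))      # toy batch of medical-code sequences
targets = torch.randint(0, 2, (8, 300)).float()
teacher_h = torch.randn(8, 4096)             # stand-in for frozen teacher LLM features

logits, projected = student(codes)
loss = task_loss(logits, targets) + 0.5 * feat_loss(projected, teacher_h)
loss.backward()
```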
- Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
However, LLMs are prone to hallucinating untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z)
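As a stripped-down illustration of reference-free, uncertainty-based detection (omitting the paper's keyword-focused weighting), an answer can be scored by the average negative log-probability a proxy LM assigns to its tokens:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # small proxy LM stand-in
tok = AutoTokenizer.from_pretrained(MODEL_ID)
lm = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def uncertainty(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    # log-probability of each token given its prefix
    logp = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return -token_lp.mean().item()  # higher = less expected = more suspect

print(uncertainty("The capital of Australia is Sydney."))
```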
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
However, they still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
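A sketch of the general extract-then-verify pattern the summary describes; the model ID and prompts are placeholders, not the paper's:

```python
from transformers import pipeline

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder chat model
llm = pipeline("text-generation", model=MODEL_ID)

def ask(prompt: str) -> str:
    return llm(prompt, max_new_tokens=150, do_sample=False,
               return_full_text=False)[0]["generated_text"].strip()

def extract_with_verification(note: str, field: str) -> dict:
    # Pass 1: extract the field.
    value = ask(f"From this clinical note, extract the {field}:\n{note}\nAnswer:")
    # Pass 2: ask the same model for provenance, or an explicit refusal.
    evidence = ask(
        f"Note:\n{note}\n\nDoes the note support {field} = '{value}'? "
        "Quote the supporting sentence, or reply UNSUPPORTED.\nAnswer:"
    )
    return {"value": value, "evidence": evidence,
            "kept": "UNSUPPORTED" not in evidence}

print(extract_with_verification("Pt started on metformin 500mg BID for T2DM.", "medication"))
```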
- Auditing Algorithmic Fairness in Machine Learning for Health with Severity-Based LOGAN [70.76142503046782]
We propose supplementing machine learning (ML) healthcare bias audits with SLOGAN, an automatic tool for capturing local biases in a clinical prediction task.
SLOGAN adapts an existing tool, LOGAN (LOcal Group biAs detectioN), by contextualizing group bias detection in patient illness severity and past medical history.
On average, SLOGAN identifies larger fairness disparities than LOGAN in over 75% of patient groups while maintaining clustering quality.
arXiv Detail & Related papers (2022-11-16T08:04:12Z)
- TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks [14.523433519237607]
Foodborne illness is a serious but preventable public health problem.
There is a dearth of labeled datasets for developing effective outbreak detection models.
We present TWEET-FID, the first publicly available annotated dataset for foodborne illness incident detection tasks.
arXiv Detail & Related papers (2022-05-22T03:47:18Z)
- Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest federated ML study to date, involving data from 71 healthcare institutions across 6 continents.
We generate an automatic tumor boundary detector for the rare disease of glioblastoma.
We demonstrate a 33% improvement over a publicly trained model in delineating the surgically targetable tumor, and a 23% improvement for the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z)
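The aggregation at the heart of such federated studies is typically FedAvg-style weighted averaging of locally trained weights; a minimal sketch (the study's actual protocol is far more involved):

```python
import torch

def fedavg(state_dicts, sample_counts):
    """Average site models, weighting each by its local sample count."""
    total = sum(sample_counts)
    return {
        key: sum(sd[key] * (n / total) for sd, n in zip(state_dicts, sample_counts))
        for key in state_dicts[0]
    }

# Three hypothetical sites with differently sized local datasets.
sites = [{"w": torch.randn(3, 3)} for _ in range(3)]
global_state = fedavg(sites, sample_counts=[120, 45, 300])
print(global_state["w"].shape)  # torch.Size([3, 3])
```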
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.