Investigating the Association Between Text-Based Indications of Foodborne Illness from Yelp Reviews and New York City Health Inspection Outcomes (2023)
- URL: http://arxiv.org/abs/2510.16334v1
- Date: Sat, 18 Oct 2025 03:39:08 GMT
- Title: Investigating the Association Between Text-Based Indications of Foodborne Illness from Yelp Reviews and New York City Health Inspection Outcomes (2023)
- Authors: Eden Shaveet, Crystal Su, Daniel Hsu, Luis Gravano
- Abstract summary: Foodborne illnesses are gastrointestinal conditions caused by consuming contaminated food. Social media platforms host abundant user-generated content that can provide timely public health signals. This paper analyzes signals from Yelp reviews produced by a Hierarchical Sigmoid Attention Network (HSAN). We find minimal correlation between HSAN signals and inspection scores at the tract level and no significant differences by number of C-graded restaurants.
- Score: 4.192844731345034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foodborne illnesses are gastrointestinal conditions caused by consuming contaminated food. Restaurants are critical venues to investigate outbreaks because they share sourcing, preparation, and distribution of foods. Public reporting of illness via formal channels is limited, whereas social media platforms host abundant user-generated content that can provide timely public health signals. This paper analyzes signals from Yelp reviews produced by a Hierarchical Sigmoid Attention Network (HSAN) classifier and compares them with official restaurant inspection outcomes issued by the New York City Department of Health and Mental Hygiene (NYC DOHMH) in 2023. We evaluate correlations at the Census tract level, compare distributions of HSAN scores by prevalence of C-graded restaurants, and map spatial patterns across NYC. We find minimal correlation between HSAN signals and inspection scores at the tract level and no significant differences by number of C-graded restaurants. We discuss implications and outline next steps toward address-level analyses.
Related papers
- GLEN-Bench: A Graph-Language based Benchmark for Nutritional Health [48.94971812317643]
We introduce GLEN-Bench, the first comprehensive graph-language based benchmark for nutritional health assessment. GLEN-Bench includes three linked tasks: risk detection identifies at-risk individuals from dietary and socioeconomic patterns; recommendation suggests personalized foods that meet clinical needs within resource constraints. Our analysis identifies clear dietary patterns linked to health risks, providing insights that can guide practical interventions.
arXiv Detail & Related papers (2026-01-26T03:32:46Z)
- Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion [69.84988999191343]
We introduce FastFood, a dataset with 84,446 images across 908 fast food categories, featuring ingredient and nutritional annotations. We propose a new model-agnostic Visual-Ingredient Feature Fusion (VIF$^2$) method to enhance nutrition estimation.
arXiv Detail & Related papers (2025-05-13T17:01:21Z)
- Review GIDE -- Restaurant Review Gastrointestinal Illness Detection and Extraction with Large Language Models [0.47321763526812183]
Foodborne gastrointestinal (GI) illness is a common cause of ill health in the UK. This study introduces a novel annotation schema, developed with experts in GI illness, applied to the Yelp Open dataset of reviews. We evaluate the performance of open-weight LLMs across three tasks: GI illness detection, symptom extraction, and food extraction.
arXiv Detail & Related papers (2025-03-12T18:42:43Z)
- NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning [49.06840168630573]
Diet plays a critical role in human health, yet tailoring dietary reasoning to individual health conditions remains a major challenge. Nutrition Question Answering (QA) has emerged as a popular method for addressing this problem. We introduce the Nutritional Graph Question Answering (NGQA) benchmark, the first graph question answering dataset designed for personalized nutritional health reasoning.
arXiv Detail & Related papers (2024-12-20T04:13:46Z)
- Integrating Social Determinants of Health into Knowledge Graphs: Evaluating Prediction Bias and Fairness in Healthcare [47.23120247002356]
Social determinants of health (SDoH) play a crucial role in patient health outcomes, yet their integration into biomedical knowledge graphs remains underexplored. This study addresses this gap by constructing an SDoH-enriched knowledge graph using the MIMIC-III dataset and PrimeKG.
arXiv Detail & Related papers (2024-11-29T20:35:01Z)
- Seasonality Patterns in 311-Reported Foodborne Illness Cases and Machine Learning-Identified Indications of Foodborne Illnesses from Yelp Reviews, New York City, 2022-2023 [8.972167744334206]
We extracted Yelp reviews and metadata to identify potential outbreaks of foodborne illness in connection with consuming food from restaurants.
We identified seasonal patterns in foodborne illness reports from 311 and identified seasonal patterns of foodborne illness from Yelp reviews for New York City restaurants using a Hierarchical Sigmoid Attention Network (HSAN).
arXiv Detail & Related papers (2024-05-09T23:10:31Z)
- From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios [92.58097090916166]
We present two new benchmarks, namely DailyFood-172 and DailyFood-16, designed to curate food images from everyday meals.
These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain.
arXiv Detail & Related papers (2024-03-12T08:32:23Z)
- UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection [8.934980946374367]
We propose EGAL, a deep learning framework for foodborne illness detection.
EGAL uses small expert-labeled tweets augmented by crowdsourced-labeled and massive unlabeled data.
EGAL has the potential to be deployed for real-time analysis of tweet streaming, contributing to foodborne illness outbreak surveillance efforts.
arXiv Detail & Related papers (2023-12-02T21:03:23Z)
- From Plate to Prevention: A Dietary Nutrient-aided Platform for Health Promotion in Singapore [18.412322278232757]
We develop the FoodSG platform to incubate diverse healthcare-oriented applications as a service in Singapore.
To overcome the hurdle in recognition performance brought by Singaporean multifarious food dishes, we propose to integrate supervised contrastive learning into our food recognition model FoodSG-SCL.
arXiv Detail & Related papers (2023-01-10T07:51:36Z)
- Association Between Neighborhood Factors and Adult Obesity in Shelby County, Tennessee: Geospatial Machine Learning Approach [0.966840768820136]
The objective of this study was to investigate the effects of social determinants of health (SDoH) on obesity prevalence among adults in Shelby County, Tennessee, USA.
Obesity prevalence was obtained from the publicly available CDC 500 Cities database, while SDoH indicators were extracted from the U.S. Census and USDA.
Results depicted a high percentage of neighborhoods experiencing high adult obesity prevalence within Shelby County.
arXiv Detail & Related papers (2022-08-09T15:28:43Z)
- American Twitter Users Revealed Social Determinants-related Oral Health Disparities amid the COVID-19 Pandemic [72.44305630014534]
We collected oral health-related tweets during the COVID-19 pandemic from 9,104 Twitter users across 26 states.
Women and younger adults (19-29) are more likely to talk about oral health problems.
People from counties at a higher risk of COVID-19 talk more about tooth decay/gum bleeding and chipped tooth/tooth break.
arXiv Detail & Related papers (2021-09-16T01:10:06Z)
- Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism [70.85894675131624]
We learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another.
We propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities.
We show that we can outperform several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin.
arXiv Detail & Related papers (2020-03-09T07:41:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.