TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks
- URL: http://arxiv.org/abs/2205.10726v1
- Date: Sun, 22 May 2022 03:47:18 GMT
- Title: TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks
- Authors: Ruofan Hu, Dongyu Zhang, Dandan Tao, Thomas Hartvigsen, Hao Feng, Elke Rundensteiner
- Abstract summary: Foodborne illness is a serious but preventable public health problem.
There is a dearth of labeled datasets for developing effective outbreak detection models.
We present TWEET-FID, the first publicly available annotated dataset for foodborne illness incident detection tasks.
- Score: 14.523433519237607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foodborne illness is a serious but preventable public health problem -- with
delays in detecting the associated outbreaks resulting in productivity loss,
expensive recalls, public safety hazards, and even loss of life. While social
media is a promising source for identifying unreported foodborne illnesses,
there is a dearth of labeled datasets for developing effective outbreak
detection models. To accelerate the development of machine learning-based
models for foodborne outbreak detection, we thus present TWEET-FID
(TWEET-Foodborne Illness Detection), the first publicly available annotated
dataset for multiple foodborne illness incident detection tasks. TWEET-FID,
collected from Twitter, is annotated with three facets: tweet class, entity
type, and slot type, with labels produced by experts as well as by crowdsourced
workers. We introduce several domain tasks leveraging these three facets: text
relevance classification (TRC), entity mention detection (EMD), and slot
filling (SF). We describe the end-to-end methodology for dataset design,
creation, and labeling to support model development for these tasks. A
comprehensive set of results for these tasks, obtained with state-of-the-art
single- and multi-task deep learning methods on the TWEET-FID dataset, is
provided. This dataset opens opportunities for future research in foodborne
outbreak detection.
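To make the three facets concrete, here is a minimal sketch of how one annotated tweet might be represented and how its token-level labels map onto the three tasks. It is an illustration only: the `AnnotatedTweet` layout, the BIO tagging scheme, the label names (`SYMPTOM`, `FOOD`, `LOCATION`), and the example tweet are assumptions made for this sketch, not the released TWEET-FID schema or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnnotatedTweet:
    """Hypothetical record layout; NOT the released TWEET-FID schema."""
    tokens: List[str]
    relevant: bool          # tweet class, the target for TRC
    entity_tags: List[str]  # per-token BIO entity types, the target for EMD
    slot_tags: List[str]    # per-token BIO slot types, the target for SF


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from a BIO-tagged token sequence."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # Continue the open span.
            current.append(token)
        else:
            # "O" or an inconsistent tag closes the open span.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans


# Invented example tweet; labels are illustrative assumptions.
example = AnnotatedTweet(
    tokens=["Got", "sick", "after", "eating", "sushi", "at", "Joe's", "Diner"],
    relevant=True,
    entity_tags=["O", "B-SYMPTOM", "O", "O", "B-FOOD", "O",
                 "B-LOCATION", "I-LOCATION"],
    slot_tags=["O", "B-SYMPTOM", "O", "O", "B-FOOD", "O",
               "B-LOCATION", "I-LOCATION"],
)

entity_spans = extract_spans(example.tokens, example.entity_tags)
```

In this sketch, TRC would predict `relevant` for the whole tweet, while EMD and SF would each predict one tag per token. Keeping entity and slot tags as separate sequences lets them diverge, for instance when a food is mentioned but is not the one tied to the reported illness.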
Related papers
- Self-supervised transformer-based pre-training method with General Plant Infection dataset [3.969851116372513]
This study proposes an advanced network architecture that combines Contrastive Learning and Masked Image Modeling (MIM).
The proposed network architecture demonstrates effectiveness in addressing plant pest and disease recognition tasks, achieving notable detection accuracy.
Our code and dataset will be publicly available to advance research in plant pest and disease recognition.
arXiv Detail & Related papers (2024-07-20T15:48:35Z)
- UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection [8.934980946374367]
We propose EGAL, a deep learning framework for foodborne illness detection.
EGAL uses small expert-labeled tweets augmented by crowdsourced-labeled and massive unlabeled data.
EGAL has the potential to be deployed for real-time analysis of tweet streaming, contributing to foodborne illness outbreak surveillance efforts.
arXiv Detail & Related papers (2023-12-02T21:03:23Z)
- WePaMaDM-Outlier Detection: Weighted Outlier Detection using Pattern Approaches for Mass Data Mining [0.6754597324022876]
Outlier detection can reveal vital information about system faults, fraudulent activities, and patterns in the data.
This article proposes WePaMaDM-Outlier Detection for a distinct mass data mining domain.
It also investigates the significance of data modeling in outlier detection techniques in surveillance, fault detection, and trend analysis.
arXiv Detail & Related papers (2023-06-09T07:00:00Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Weakly Supervised Anomaly Detection: A Survey [75.26180038443462]
Anomaly detection (AD) is a crucial task in machine learning with various applications.
We present the first comprehensive survey of weakly supervised anomaly detection (WSAD) methods.
For each setting, we provide formal definitions, key algorithms, and potential future directions.
arXiv Detail & Related papers (2023-02-09T10:27:21Z)
- LDD: A Dataset for Grape Diseases Object Detection and Instance Segmentation [2.966925013268916]
A new dataset has been created with the goal of advancing the state of the art in disease recognition via instance segmentation approaches.
This was achieved by gathering images of leaves and clusters of grapes affected by diseases in their natural context.
The dataset contains photos of 10 object types which include leaves and grapes with and without symptoms of the eight more common grape diseases, with a total of 17,706 labeled instances in 1,092 images.
arXiv Detail & Related papers (2022-06-21T08:50:13Z)
- SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction [127.43571146741984]
Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery.
Wet experiments remain the most reliable method, but they are time-consuming and resource-intensive.
Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue.
We present the SSM-DTA framework, which incorporates three simple yet highly effective strategies.
arXiv Detail & Related papers (2022-06-20T14:53:25Z)
- Unsupervised deep learning techniques for powdery mildew recognition based on multispectral imaging [63.62764375279861]
This paper presents a deep learning approach to automatically recognize powdery mildew on cucumber leaves.
We focus on unsupervised deep learning techniques applied to multispectral imaging data.
We propose the use of autoencoder architectures to investigate two strategies for disease detection.
arXiv Detail & Related papers (2021-12-20T13:29:13Z)
- Genome Sequence Classification for Animal Diagnostics with Graph Representations and Deep Neural Networks [4.339839287869652]
Bovine Respiratory Disease Complex (BRDC) is a complex respiratory disease in cattle with multiple etiologies, including bacterial and viral.
Current animal disease diagnostics is based on traditional tests such as bacterial culture, serology, and Polymerase Chain Reaction (PCR) tests.
We show that networks-based machine learning approaches can detect pathogen signature with up to 89.7% accuracy.
arXiv Detail & Related papers (2020-07-24T22:30:18Z)
- Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation has begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
- Progressive Object Transfer Detection [84.48927705173494]
We propose a novel Progressive Object Transfer Detection (POTD) framework.
First, POTD can leverage various object supervision of different domains effectively into a progressive detection procedure.
Second, POTD consists of two delicate transfer stages, i.e., Low-Shot Transfer Detection (LSTD) and Weakly-Supervised Transfer Detection (WSTD).
arXiv Detail & Related papers (2020-02-12T00:16:24Z)