Traditional Machine Learning Models and Bidirectional Encoder
Representations From Transformer (BERT)-Based Automatic Classification of
Tweets About Eating Disorders: Algorithm Development and Validation Study
- URL: http://arxiv.org/abs/2402.05571v1
- Date: Thu, 8 Feb 2024 11:16:13 GMT
- Title: Traditional Machine Learning Models and Bidirectional Encoder
Representations From Transformer (BERT)-Based Automatic Classification of
Tweets About Eating Disorders: Algorithm Development and Validation Study
- Authors: José Alberto Benítez-Andrades, José-Manuel Alija-Pérez, Maria-Esther Vidal, Rafael Pastor-Vargas and María Teresa García-Ordás
- Abstract summary: Our goal was to identify efficient machine learning models for categorizing tweets related to eating disorders.
Transformer-based models outperform traditional techniques in classifying eating disorder-related tweets, though they require more computational resources.
- Score: 1.178706350363215
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Background: Eating disorders are increasingly prevalent, and social networks
offer valuable information.
Objective: Our goal was to identify efficient machine learning models for
categorizing tweets related to eating disorders.
Methods: Over three months, we collected tweets about eating disorders. A
2,000-tweet subset was labeled for: (1) being written by individuals with
eating disorders, (2) promoting eating disorders, (3) informativeness, and (4)
scientific content. Both traditional machine learning and deep learning models
were employed for classification, assessing accuracy, F1 score, and
computational time.
Results: From 1,058,957 collected tweets, the transformer-based bidirectional
encoder representation (BERT) models achieved the highest F1 scores
(71.1%-86.4%) across all four categories.
Conclusions: Transformer-based models outperform traditional techniques in
classifying eating disorder-related tweets, though they require more
computational resources.
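To make the Methods comparison concrete, here is a minimal sketch, assuming a CSV of labeled tweets with hypothetical "text" and "label" columns: a traditional TF-IDF + linear SVM baseline versus fine-tuning a BERT-style encoder with Hugging Face Transformers, scored with F1 and wall-clock time. The file name, checkpoint, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: traditional baseline vs. BERT-style transformer on one binary tweet
# category, reporting F1 and training time (mirroring the paper's metrics).
import time

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("labeled_tweets.csv")  # hypothetical file with "text", "label" columns
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# --- Traditional baseline: TF-IDF features + linear SVM ---
start = time.perf_counter()
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), LinearSVC())
baseline.fit(X_train, y_train)
print(f"TF-IDF + SVM  F1={f1_score(y_test, baseline.predict(X_test)):.3f}  "
      f"time={time.perf_counter() - start:.1f}s")

# --- Transformer: fine-tune a BERT-style encoder on the same split ---
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-multilingual-cased"  # assumed checkpoint, not the paper's exact model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

train_ds = Dataset.from_dict({"text": list(X_train), "label": list(y_train)}).map(tokenize, batched=True)
test_ds = Dataset.from_dict({"text": list(X_test), "label": list(y_test)}).map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"f1": f1_score(labels, np.argmax(logits, axis=-1))}

start = time.perf_counter()
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds,
    eval_dataset=test_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
print(f"BERT          F1={trainer.evaluate()['eval_f1']:.3f}  "
      f"time={time.perf_counter() - start:.1f}s")
```

The trade-off reported in the paper shows up directly here: the transformer typically yields a higher F1 but takes far longer to train than the TF-IDF baseline.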
Related papers
- RESTOR: Knowledge Recovery through Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can memorize undesirable datapoints.
Many machine unlearning methods have been proposed that aim to 'erase' these datapoints from trained models.
We propose the RESTOR framework for machine unlearning, organized around several evaluation dimensions.
arXiv Detail & Related papers (2024-10-31T20:54:35Z)
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection [8.934980946374367]
We propose EGAL, a deep learning framework for foodborne illness detection.
EGAL uses small expert-labeled tweets augmented by crowdsourced-labeled and massive unlabeled data.
EGAL has the potential to be deployed for real-time analysis of streaming tweets, contributing to foodborne illness outbreak surveillance efforts.
arXiv Detail & Related papers (2023-12-02T21:03:23Z)
- Food Image Classification and Segmentation with Attention-based Multiple Instance Learning [51.279800092581844]
The paper presents a weakly supervised methodology for training food image classification and semantic segmentation models.
The proposed methodology is based on a multiple instance learning approach in combination with an attention-based mechanism.
We conduct experiments on two meta-classes within the FoodSeg103 data set to verify the feasibility of the proposed approach.
arXiv Detail & Related papers (2023-08-22T13:59:47Z)
- Identifying Misinformation on YouTube through Transcript Contextual Analysis with Transformer Models [1.749935196721634]
We introduce a novel methodology for video classification, focusing on the veracity of the content.
We employ advanced machine learning techniques like transfer learning to solve the classification challenge.
We apply the trained models to three datasets: (a) YouTube Vaccine-misinformation related videos, (b) YouTube Pseudoscience videos, and (c) Fake-News dataset.
arXiv Detail & Related papers (2023-07-22T19:59:16Z)
- A Novel Site-Agnostic Multimodal Deep Learning Model to Identify Pro-Eating Disorder Content on Social Media [0.0]
This study aimed to create a multimodal deep learning model that can determine if a social media post promotes eating disorders.
A labeled dataset of Tweets was collected from Twitter, recently rebranded as X, upon which twelve deep learning models were trained and evaluated.
The RoBERTa and MaxViT fusion model, deployed to classify an unlabeled dataset of posts from the social media sites Tumblr and Reddit, generated results akin to those of previous research studies.
arXiv Detail & Related papers (2023-07-06T16:04:46Z)
- Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions [65.50975507723827]
Food image segmentation is an important task that has ubiquitous applications, such as estimating the nutritional value of a plate of food.
One challenge is that food items can overlap and mix, making them difficult to distinguish.
Two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation from Image Transformers (BEiT).
The BEiT model outperforms the previous state-of-the-art model by achieving a mean intersection over union of 49.4 on FoodSeg103.
arXiv Detail & Related papers (2023-06-15T15:38:10Z)
- 2021 BEETL Competition: Advancing Transfer Learning for Subject Independence & Heterogenous EEG Data Sets [89.84774119537087]
We design two transfer learning challenges around diagnostics and Brain-Computer Interfacing (BCI).
Task 1 is centred on medical diagnostics, addressing automatic sleep stage annotation across subjects.
Task 2 is centred on Brain-Computer Interfacing (BCI), addressing motor imagery decoding across both subjects and data sets.
arXiv Detail & Related papers (2022-02-14T12:12:20Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Utilizing Deep Learning to Identify Drug Use on Twitter Data [0.0]
The classification power of multiple methods was compared, including support vector machines (SVM), XGBoost, and convolutional neural network (CNN) based classifiers.
The accuracy scores were 76.35% and 82.31%, with AUCs of 0.90 and 0.91, respectively.
The synthetically generated set provided increased scores, improving the classification capability and proving the worth of this methodology.
arXiv Detail & Related papers (2020-03-08T07:52:40Z)
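As a rough illustration of the evaluation described in the last entry above (comparing classifiers on tweets by accuracy and AUC), here is a minimal sketch pairing TF-IDF features with an SVM and XGBoost; the file name, columns, and parameters are hypothetical, and the CNN and synthetic-data components of that paper are omitted.

```python
# Sketch: compare two classifiers on labeled tweets by accuracy and ROC AUC.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

df = pd.read_csv("drug_use_tweets.csv")  # hypothetical file with "text", "label" columns
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=0, stratify=df["label"]
)

vec = TfidfVectorizer(min_df=2)
Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

for name, clf in [("SVM", LinearSVC()), ("XGBoost", XGBClassifier())]:
    clf.fit(Xtr, y_train)
    preds = clf.predict(Xte)
    # LinearSVC exposes decision_function; XGBoost exposes predict_proba.
    scores = (clf.decision_function(Xte) if hasattr(clf, "decision_function")
              else clf.predict_proba(Xte)[:, 1])
    print(f"{name}: accuracy={accuracy_score(y_test, preds):.4f}, "
          f"AUC={roc_auc_score(y_test, scores):.4f}")
```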