HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition
- URL: http://arxiv.org/abs/2102.01909v1
- Date: Wed, 3 Feb 2021 06:59:59 GMT
- Title: HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition
- Authors: Avihay Chriqui, Inbal Yahav
- Abstract summary: HeBERT is a transformer-based model for modern Hebrew text.
HebEMO is a tool that uses HeBERT to detect polarity and extract emotions from Hebrew user-generated content.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of Bidirectional Encoder Representations from Transformers (BERT)
models for natural language processing (NLP) tasks, and for sentiment
analysis in particular, has become very popular in recent years, and with good
reason. The use of social media is constantly on the rise, and its impact on
nearly every area of our lives is hard to overstate. Research shows that social
media now serves as one of the main channels where people freely express
their ideas, opinions, and emotions. During the Covid-19 pandemic, the role of
social media as a medium for voicing opinions and emotions became even
more prominent.
This paper introduces HeBERT and HebEMO. HeBERT is a transformer-based model
for modern Hebrew text. Hebrew is considered a Morphologically Rich Language
(MRL), with unique characteristics that pose a great challenge to developing
appropriate Hebrew NLP models. Analyzing multiple specifications of the BERT
architecture, we arrive at a language model that outperforms all existing
Hebrew alternatives on multiple language tasks.
HebEMO is a tool that uses HeBERT to detect polarity and extract emotions
from Hebrew user-generated content (UGC). It was trained on a unique
Covid-19-related dataset that we collected and annotated for this study. Data
collection and annotation followed an innovative iterative semi-supervised
process designed to maximize predictability. HebEMO achieved a high weighted
average F1-score of 0.96 for polarity classification. Emotion detection
reached F1-scores of 0.78-0.97, with the exception of surprise, which the
model failed to capture (F1 = 0.41). These results surpass the best reported
performance to date, even in comparison to English.
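To make the intended usage concrete, here is a minimal sketch of polarity
classification with a HeBERT-based checkpoint through the Hugging Face
transformers pipeline API. The Hub model ID is an assumption (the abstract
does not name one); substitute whatever checkpoint the authors publish.

    # Minimal usage sketch: Hebrew polarity classification with a
    # HeBERT-based checkpoint via the Hugging Face `transformers` pipeline.
    # NOTE: the model ID below is an assumption, not taken from the paper.
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis",
        model="avichr/heBERT_sentiment_analysis",  # assumed Hub ID
    )

    # Hebrew UGC example: "Excellent service, warmly recommended!"
    print(classifier("שירות מעולה, ממליץ בחום!"))
    # Expected output shape: [{'label': ..., 'score': ...}]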
Related papers
- Performance Evaluation of Emotion Classification in Japanese Using RoBERTa and DeBERTa (arXiv 2025-04-22)
Social media monitoring and customer-feedback analysis require accurate emotion detection for Japanese text. This study aims to build a high-accuracy model for predicting the presence or absence of eight Plutchik emotions in Japanese sentences.
- RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset (arXiv 2025-02-10)
We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification.
For sentiment analysis, the supervised XLM-R model achieves the highest accuracy (69.2%) and F1 score (66.1%), followed by semi-supervised XLM-R (67.2% accuracy, 64.1% F1).
In emotion analysis, supervised DistilBERT leads in accuracy (59.8%) and F1 score (31%), followed by semi-supervised mBERT (59% accuracy, 26.5% F1).
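As a rough illustration of the fine-tuning setup such an evaluation implies,
here is a sketch of training XLM-R for three-way sentiment with the
transformers Trainer. The tiny in-memory examples and the label scheme are
placeholders, not the RideKE data.

    # Sketch: fine-tuning XLM-R for 3-way sentiment classification.
    # The example texts and labels below are placeholders, not RideKE data.
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "xlm-roberta-base", num_labels=3)  # 0=neg, 1=neutral, 2=pos (assumed)

    ds = Dataset.from_dict({
        "text": ["Nimechoka na traffic leo!", "Great ride, asante sana."],
        "label": [0, 2],
    })
    ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=64,
                              padding="max_length"),
                batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="xlmr-sentiment",
                               num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=ds,
    )
    trainer.train()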
- SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction (arXiv 2024-11-25)
We introduce SHuBERT, a self-supervised transformer encoder that learns strong representations from American Sign Language (ASL) video content.
Inspired by the success of the HuBERT speech representation model, SHuBERT adapts masked prediction for multi-stream visual sign language input.
SHuBERT achieves state-of-the-art performance across multiple benchmarks.
- Arabic Tweet Act: A Weighted Ensemble Pre-Trained Transformer Model for Classifying Arabic Speech Acts on Twitter (arXiv 2024-01-30)
This paper proposes a Twitter dialectal Arabic speech act classification approach based on a transformer deep learning neural network.
We propose a BERT-based weighted ensemble learning approach to integrate the advantages of various BERT models for dialectal Arabic speech act classification.
The results show that the best model is araBERTv2-Twitter, with a macro-averaged F1 score of 0.73 and an accuracy of 0.84.
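The weighted-ensemble idea itself is simple to state in code. A generic
sketch follows, with made-up probabilities and weights; the paper's actual
models and weights are not given here.

    # Sketch: weighted ensembling of per-model class probabilities.
    # The probabilities and weights are illustrative, not from the paper.
    import numpy as np

    # Softmax outputs of two BERT variants on one example, over 3 classes.
    probs_a = np.array([0.70, 0.20, 0.10])
    probs_b = np.array([0.50, 0.40, 0.10])
    weights = [0.6, 0.4]  # e.g., set from each model's validation F1

    ensemble = weights[0] * probs_a + weights[1] * probs_b
    print(ensemble.argmax())  # final class prediction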
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You (arXiv 2024-01-29)
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
- Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All (arXiv 2022-11-28)
We present a new pre-trained language model (PLM) for modern Hebrew, termed AlephBERTGimmel, which employs a much larger vocabulary (128K items) than standard Hebrew PLMs before.
We perform a contrastive analysis of this model against all previous Hebrew PLMs (mBERT, heBERT, AlephBERT) and assess the effects of larger vocabularies on task performance.
Our experiments show that larger vocabularies lead to fewer splits, and that reducing splits is better for model performance, across different tasks.
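The vocabulary effect is easy to probe directly: tokenize the same Hebrew
sentence with several checkpoints and count the subword pieces. A sketch;
bert-base-multilingual-cased is a standard Hub ID, while the Hebrew model IDs
are assumptions.

    # Sketch: compare subword split counts across Hebrew-capable tokenizers.
    # Fewer pieces generally correlates with better task performance, per the
    # paper's finding. Hebrew model IDs below are assumptions.
    from transformers import AutoTokenizer

    checkpoints = [
        "bert-base-multilingual-cased",  # mBERT
        "avichr/heBERT",                 # assumed Hub ID for HeBERT
        "onlplab/alephbert-base",        # assumed Hub ID for AlephBERT
    ]

    sentence = "העיתונאים כתבו על ההתפתחויות האחרונות"
    for ckpt in checkpoints:
        tok = AutoTokenizer.from_pretrained(ckpt)
        pieces = tok.tokenize(sentence)
        print(f"{ckpt}: {len(pieces)} pieces")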
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers (arXiv 2022-02-22)
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
- Towards Efficient NLP: A Standard Evaluation and A Strong Baseline (arXiv 2021-10-13)
This work presents ELUE (Efficient Language Understanding Evaluation), a standard evaluation, and a public leaderboard for efficient NLP models.
Along with the benchmark, we also pre-train and release a strong baseline, ElasticBERT, whose elasticity is both static and dynamic.
- FBERT: A Neural Transformer for Identifying Offensive Content (arXiv 2021-09-10)
fBERT is a BERT model retrained on SOLID, the largest available English offensive language identification corpus, with over 1.4 million offensive instances.
We evaluate fBERT's performance on identifying offensive content on multiple English datasets and we test several thresholds for selecting instances from SOLID.
The fBERT model will be made freely available to the community.
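The threshold-selection step can be pictured as a simple filter over SOLID's
distantly supervised confidence scores. A sketch; the file path and column
names are assumptions about the data layout, not the paper's exact format.

    # Sketch: keep only instances whose aggregated offensiveness confidence
    # clears a threshold. File path and column names are assumptions.
    import pandas as pd

    THRESHOLD = 0.8  # one of several values one might test

    df = pd.read_csv("solid_task_a.tsv", sep="\t")  # assumed file
    kept = df[df["average"] >= THRESHOLD]           # assumed column name
    print(f"kept {len(kept)} of {len(df)} instances at t={THRESHOLD}")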
- Neural Models for Offensive Language Detection (arXiv 2021-05-30)
Offensive language detection is an ever-growing natural language processing (NLP) application.
This thesis aims to contribute by improving and comparing different machine learning models for detecting such harmful content.
- AlephBERT: A Hebrew Large Pre-Trained Language Model to Start-off your Hebrew NLP Application With (arXiv 2021-04-08)
Large Pre-trained Language Models (PLMs) have become ubiquitous in the development of language understanding technology.
While advances reported for English using PLMs are unprecedented, reported advances using PLMs in Hebrew are few and far between.
- Towards Emotion Recognition in Hindi-English Code-Mixed Data: A Transformer Based Approach (arXiv 2021-02-19)
We present a Hinglish dataset labelled for emotion detection.
We highlight a deep learning based approach for detecting emotions in Hindi-English code mixed tweets.
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders (arXiv 2020-10-15)
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
- LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification (arXiv 2020-08-11)
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
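A BERT-BiLSTM span classifier of the kind described can be sketched in a few
lines of PyTorch; the hidden size and base checkpoint here are assumptions,
not the authors' exact configuration.

    # Sketch: per-token propaganda classifier stacking a BiLSTM on BERT.
    # Hidden size and base checkpoint are assumptions, not the paper's setup.
    import torch.nn as nn
    from transformers import AutoModel

    class BertBiLSTMTagger(nn.Module):
        def __init__(self, num_labels: int = 2, lstm_hidden: int = 256):
            super().__init__()
            self.bert = AutoModel.from_pretrained("bert-base-cased")
            self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                                batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * lstm_hidden, num_labels)

        def forward(self, input_ids, attention_mask):
            h = self.bert(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state
            h, _ = self.lstm(h)
            return self.head(h)  # (batch, seq_len, num_labels) token logits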
This list was automatically generated from the titles and abstracts of the papers on this site.