Related papers: TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models

TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models

URL: http://arxiv.org/abs/2312.17704v1
Date: Fri, 29 Dec 2023 17:47:00 GMT
Title: TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models
Authors: Felipe Oliveira, Victoria Reis, Nelson Ebecken
Abstract summary: TuPy-E is the largest annotated Portuguese corpus for hate speech detection. We conduct a detailed analysis using advanced techniques like BERT models.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Social media has become integral to human interaction, providing a platform for communication and expression. However, the rise of hate speech on these platforms poses significant risks to individuals and communities. Detecting and addressing hate speech is particularly challenging in languages like Portuguese due to its rich vocabulary, complex grammar, and regional variations. To address this, we introduce TuPy-E, the largest annotated Portuguese corpus for hate speech detection. TuPy-E leverages an open-source approach, fostering collaboration within the research community. We conduct a detailed analysis using advanced techniques like BERT models, contributing to both academic understanding and practical applications

Related papers

HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation [67.69631485036665]
We conduct a comprehensive examination of hate speech regulations and strategies from three perspectives.<n>Our findings reveal significant inconsistencies in hate speech definitions and moderation practices across jurisdictions.<n>We suggest ideas and research direction for further exploration of a unified framework for automated hate speech moderation.
arXiv Detail & Related papers (2025-07-06T11:25:23Z)
Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection [4.207344194403586]
Social media platforms have become central to global communication, yet they also facilitate the spread of hate speech. For underrepresented dialects like Levantine Arabic, detecting hate speech presents unique cultural, ethical, and linguistic challenges. This paper explores the complex sociopolitical and linguistic landscape of Levantine Arabic and critically examines the limitations of current datasets used in hate speech detection.
arXiv Detail & Related papers (2024-12-14T23:02:46Z)
A Survey on Automatic Online Hate Speech Detection in Low-Resource Languages [0.5825410941577593]
Social media and easy accessibility of the internet has facilitated the spread of hate speech. This article provides a detailed survey of hate speech detection in low-resource languages around the world.
arXiv Detail & Related papers (2024-11-28T09:42:53Z)
Exploring Large Language Models for Hate Speech Detection in Rioplatense Spanish [0.08192907805418582]
Hate speech detection deals with many language variants, slang, slurs, expression modalities, and cultural nuances. This work presents a brief analysis of the performance of large language models in the detection of Hate Speech for Rioplatense Spanish.
arXiv Detail & Related papers (2024-10-16T02:32:12Z)
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion. We propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process. Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
KamerRaad: Enhancing Information Retrieval in Belgian National Politics through Hierarchical Summarization and Conversational Interfaces [55.00702535694059]
KamerRaad is an AI tool that leverages large language models to help citizens interactively engage with Belgian political information. The tool extracts and concisely summarizes key excerpts from parliamentary proceedings, followed by the potential for interaction based on generative AI.
arXiv Detail & Related papers (2024-04-22T15:01:39Z)
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models. Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
Code-Switching without Switching: Language Agnostic End-to-End Speech Translation [68.8204255655161]
We treat speech recognition and translation as one unified end-to-end speech translation problem. By training LAST with both input languages, we decode speech into one target language, regardless of the input language.
arXiv Detail & Related papers (2022-10-04T10:34:25Z)
BERTuit: Understanding Spanish language in Twitter through a native transformer [70.77033762320572]
We present bfBERTuit, the larger transformer proposed so far for Spanish language, pre-trained on a massive dataset of 230M Spanish tweets. Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages. We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language. We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
Cross-lingual Capsule Network for Hate Speech Detection in Social Media [6.531659195805749]
We investigate the cross-lingual hate speech detection task, tackling the problem by adapting the hate speech resources from one language to another. We propose a cross-lingual capsule network learning model coupled with extra domain-specific lexical semantics for hate speech. Our model achieves state-of-the-art performance on benchmark datasets from AMI@Evalita 2018 and AMI@Ibereval 2018.
arXiv Detail & Related papers (2021-08-06T12:53:41Z)
Cross-lingual hate speech detection based on multilingual domain-specific word embeddings [4.769747792846004]
We propose to address the problem of multilingual hate speech detection from the perspective of transfer learning. Our goal is to determine if knowledge from one particular language can be used to classify other language. We show that the use of our simple yet specific multilingual hate representations improves classification results.
arXiv Detail & Related papers (2021-04-30T02:24:50Z)
Contextual Lexicon-Based Approach for Hate Speech and Offensive Language Detection [1.1744028458220426]
This paper presents a new approach for offensive language and hate speech detection on social media. Our approach incorporates an offensive lexicon composed by implicit and explicit offensive and swearing expressions annotated with binary classes. Due to the severity of the hate speech and offensive comments in Brazil and the lack of research in Portuguese, Brazilian Portuguese is the language used to validate our method.
arXiv Detail & Related papers (2021-04-25T21:34:51Z)
DeepHate: Hate Speech Detection via Multi-Faceted Text Representations [8.192671048046687]
DeepHate is a novel deep learning model that combines multi-faceted text representations such as word embeddings, sentiments, and topical information. We conduct extensive experiments and evaluate DeepHate on three large publicly available real-world datasets.
arXiv Detail & Related papers (2021-03-14T16:11:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.