Selecting and combining complementary feature representations and
classifiers for hate speech detection
- URL: http://arxiv.org/abs/2201.06721v1
- Date: Tue, 18 Jan 2022 03:46:49 GMT
- Title: Selecting and combining complementary feature representations and
classifiers for hate speech detection
- Authors: Rafael M. O. Cruz and Woshington V. de Sousa and George D. C.
Cavalcanti
- Abstract summary: Hate speech is a major issue in social networks due to the high volume of data generated daily.
Recent works demonstrate the usefulness of machine learning (ML) in dealing with the nuances required to distinguish hateful posts from mere sarcasm or offensive language.
This work argues that a combination of multiple feature extraction techniques and different classification models is needed.
- Score: 6.745479230590518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hate speech is a major issue in social networks due to the high volume of
data generated daily. Recent works demonstrate the usefulness of machine
learning (ML) in dealing with the nuances required to distinguish
hateful posts from mere sarcasm or offensive language. Many ML solutions for
hate speech detection have been proposed by either changing how features are
extracted from the text or the classification algorithm employed. However, most
works consider only one type of feature extraction and classification
algorithm. This work argues that a combination of multiple feature extraction
techniques and different classification models is needed. We propose a
framework to analyze the relationship between multiple feature extraction and
classification techniques to understand how they complement each other. The
framework is used to select a subset of complementary techniques to compose a
robust multiple classifiers system (MCS) for hate speech detection. The
experimental study considering four hate speech classification datasets
demonstrates that the proposed framework is a promising methodology for
analyzing and designing high-performing MCS for this task. The MCS obtained
using the proposed framework significantly outperforms the combination of all
models and the homogeneous and heterogeneous selection heuristics,
demonstrating the importance of having a proper selection scheme. Source code,
figures, and dataset splits can be found in the GitHub repository:
https://github.com/Menelau/Hate-Speech-MCS.
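The idea of selecting complementary models before combining them can be illustrated with a minimal sketch. This is not the authors' framework (their code is in the repository above); it assumes a generic pairwise disagreement measure as a stand-in for complementarity and plain majority voting as the combiner. The model names, labels, and predictions are hypothetical toy values constructed for illustration only.

```python
from collections import Counter
from itertools import combinations


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def disagreement(preds_a, preds_b):
    """Fraction of samples on which two models predict different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def majority_vote(prediction_lists):
    """Combine predictions sample by sample; ties go to the label voted first."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]


def mean_pairwise_disagreement(models, names):
    """Average disagreement over all pairs of the named models."""
    pairs = list(combinations(names, 2))
    return sum(disagreement(models[a], models[b]) for a, b in pairs) / len(pairs)


def select_diverse(models, k):
    """Pick the k-model subset (k >= 2) with the highest mean pairwise disagreement.

    Assumption: diversity alone drives the choice. A real selection scheme
    would also take individual model accuracy into account.
    """
    return max(
        combinations(models, k),
        key=lambda names: mean_pairwise_disagreement(models, names),
    )


# Hypothetical validation labels (1 = hateful, 0 = not hateful).
Y_TRUE = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predictions of four (feature representation + classifier) pairs.
MODELS = {
    "tfidf+svm":      [1, 0, 1, 0, 0, 1, 1, 0],
    "tfidf+logreg":   [1, 0, 1, 0, 0, 1, 1, 0],
    "embeddings+mlp": [0, 0, 1, 1, 0, 0, 1, 1],
    "char-ngrams+nb": [1, 1, 1, 1, 0, 0, 0, 0],
}

if __name__ == "__main__":
    chosen = select_diverse(MODELS, 3)
    selected_vote = majority_vote([MODELS[name] for name in chosen])
    all_vote = majority_vote(list(MODELS.values()))
    print("selected subset:", chosen)
    print("accuracy, selected subset:", accuracy(Y_TRUE, selected_vote))
    print("accuracy, all models:", accuracy(Y_TRUE, all_vote))
```

On this toy data, the two TF-IDF models make identical errors, so adding both to the vote reinforces those errors, whereas the diversity-selected subset contains models whose errors fall on different samples. The numbers show only the mechanism and say nothing about the paper's reported results.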
Related papers
- Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
(2022-03-18T06:40:39Z)
- Speech Emotion Recognition Using Deep Sparse Auto-Encoder Extreme
Learning Machine with a New Weighting Scheme and Spectro-Temporal Features
Along with Classical Feature Selection and A New Quantum-Inspired Dimension
Reduction Method [3.8073142980733]
A system for speech emotion recognition (SER) based on the speech signal is proposed.
The system consists of three stages: feature extraction, feature selection, and finally feature classification.
A new weighting method has also been proposed to deal with class imbalance, which is more efficient than existing weighting methods.
(2021-11-13T11:09:38Z)
- Learning Debiased and Disentangled Representations for Semantic
Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
(2021-10-31T16:15:09Z)
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker
Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
(2021-06-11T13:03:33Z)
- A concise method for feature selection via normalized frequencies [0.0]
In this paper, a concise method is proposed for universal feature selection.
The proposed method uses a fusion of the filter method and the wrapper method, rather than a combination of them.
The evaluation results show that the proposed method outperformed several state-of-the-art related works in terms of accuracy, precision, recall, F-score and AUC.
(2021-06-10T15:29:54Z)
- A study of text representations in Hate Speech Detection [0.0]
Current EU and US legislation against hateful language has led to automatic tools being a necessary component of the Hate Speech detection task and pipeline.
In this study, we examine the performance of several, diverse text representation techniques paired with multiple classification algorithms, on the automatic Hate Speech detection task.
(2021-02-08T20:39:17Z)
- Does a Hybrid Neural Network based Feature Selection Model Improve Text
Classification? [9.23545668304066]
We propose a hybrid feature selection method for obtaining relevant features.
We then present three ways of implementing a feature selection and neural network pipeline.
We also observed a slight increase in accuracy on some datasets.
(2021-01-22T09:12:19Z)
- Integrating end-to-end neural and clustering-based diarization: Getting
the best of both worlds [71.36164750147827]
Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors.
End-to-end neural diarization (EEND) directly predicts diarization labels using a neural network.
We propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.
(2020-10-26T06:33:02Z)
- Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence
Modeling [61.351967629600594]
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq), non-parallel voice conversion approach.
In this approach, we combine a bottle-neck feature extractor (BNE) with a seq2seq synthesis module.
Objective and subjective evaluations show that the proposed any-to-many approach has superior voice conversion performance in terms of both naturalness and speaker similarity.
(2020-09-06T13:01:06Z)
- Evaluating the reliability of acoustic speech embeddings [10.5754802112615]
Speech embeddings are fixed-size acoustic representations of variable-length speech sequences.
Here, we systematically compare two popular metrics, ABX discrimination and Mean Average Precision (MAP), on 5 languages across 17 embedding methods.
We find that overall, ABX and MAP correlate with one another and with frequency estimation.
(2020-07-27T13:24:09Z)
- Rethinking Generative Zero-Shot Learning: An Ensemble Learning
Perspective for Recognising Visual Patches [52.67723703088284]
We propose a novel framework called multi-patch generative adversarial nets (MPGAN).
MPGAN synthesises local patch features and labels unseen classes with a novel weighted voting strategy.
MPGAN has significantly greater accuracy than state-of-the-art methods.
(2020-07-27T05:49:44Z)