"Define Your Terms" : Enhancing Efficient Offensive Speech
Classification with Definition
- URL: http://arxiv.org/abs/2402.03221v1
- Date: Mon, 5 Feb 2024 17:33:22 GMT
- Title: "Define Your Terms" : Enhancing Efficient Offensive Speech
Classification with Definition
- Authors: Huy Nghiem, Umang Gupta, Fred Morstatter
- Abstract summary: We propose a joint embedding architecture that incorporates the input's label and definition for classification via Prototypical Network.
Our model achieves at least 75% of the maximal F1-score while using less than 10% of the available training data across 4 datasets.
- Score: 12.641396816499162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The propagation of offensive content through social media channels has
garnered attention of the research community. Multiple works have proposed
various semantically related yet subtly distinct categories of offensive
speech. In this work, we explore meta-learning approaches to leverage the
diversity of offensive speech corpora to enhance their reliable and efficient
detection. We propose a joint embedding architecture that incorporates the
input's label and definition for classification via Prototypical Network. Our
model achieves at least 75% of the maximal F1-score while using less than 10%
of the available training data across 4 datasets. Our experimental findings
also provide a case study of training strategies valuable to combat resource
scarcity.
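The Prototypical Network classification step described above can be sketched as follows. This is a minimal illustration under assumed details (mean-pooled class prototypes, squared Euclidean distance), not the paper's exact joint embedding architecture, which additionally encodes each label's text and definition:

```python
# Sketch of Prototypical Network classification: each class prototype is
# the mean of its support-set embeddings, and a query embedding is
# assigned to the class of the nearest prototype.
import numpy as np

def prototypes(support_embeddings, support_labels, n_classes):
    """Mean embedding per class over the support set."""
    return np.stack([
        support_embeddings[support_labels == c].mean(axis=0)
        for c in range(n_classes)
    ])

def classify(query_embedding, protos):
    """Nearest prototype by squared Euclidean distance."""
    dists = ((protos - query_embedding) ** 2).sum(axis=1)
    return int(np.argmin(dists))

# Toy 2-D embeddings: class 0 clustered near the origin, class 1 near (10, 10).
support = np.array([[0.0, 0.1], [0.1, 0.0], [10.0, 9.9], [9.9, 10.0]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, n_classes=2)
print(classify(np.array([0.2, 0.2]), protos))  # → 0
```

In the paper's setting, the embeddings would come from a joint encoder over the input text and the label's definition; the nearest-prototype rule itself is unchanged.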
Related papers
- Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology [4.080686348274667]
We introduce a novel approach combining unsupervised contrastive learning with a unique augmentation-based technique.
Our method allows the neural network to train on unlabeled data sets, potentially improving performance in downstream tasks.
We present a speech augmentation-based unsupervised learning method that utilizes the similarity between the bottleneck layer feature and the audio reconstructing information.
arXiv Detail & Related papers (2024-08-31T05:40:37Z) - JointMatch: A Unified Approach for Diverse and Collaborative
Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning.
arXiv Detail & Related papers (2023-10-23T05:43:35Z) - Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning [10.897468059705238]
Supervised paraphrasers rely heavily on large quantities of labelled data to help preserve meaning and intent.
In this paper we aim to assist practitioners in developing usable paraphrasers by exploring In-Context Learning (ICL) with large language models (LLMs)
Our study focuses on key factors such as the number and order of demonstrations, the exclusion of prompt instructions, and the reduction in measured toxicity.
arXiv Detail & Related papers (2023-10-16T16:18:55Z) - Selective In-Context Data Augmentation for Intent Detection using
Pointwise V-Information [100.03188187735624]
We introduce a novel approach based on PLMs and pointwise V-information (PVI), a metric that can measure the usefulness of a datapoint for training a model.
Our method first fine-tunes a PLM on a small seed of training data and then synthesizes new datapoints - utterances that correspond to given intents.
Our method is thus able to leverage the expressive power of large language models to produce diverse training data.
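The PVI metric above measures how much easier the gold label becomes to predict once the model sees the input. A minimal sketch, assuming the standard formulation (the difference in log-probability of the gold label between an input-conditioned model and a null-input model); the probability values here are hypothetical:

```python
# Pointwise V-information (PVI) for one instance (x, y):
#   PVI(x -> y) = -log2 p_null(y) + log2 p_x(y)
# where p_null is the label probability from a model fine-tuned on a
# null input, and p_x from a model fine-tuned on the real inputs.
import math

def pvi(p_y_given_null, p_y_given_x):
    """PVI of an instance, given the gold-label probabilities
    assigned by the null model and the input-conditioned model."""
    return -math.log2(p_y_given_null) + math.log2(p_y_given_x)

# Seeing the input lifts the gold label from 25% to 90% probability,
# so this datapoint carries high PVI and is useful for training.
print(round(pvi(0.25, 0.9), 3))  # → 1.848
```

High-PVI synthesized utterances are the ones worth keeping as augmented training data; instances with PVI near or below zero add little beyond what the label prior already provides.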
arXiv Detail & Related papers (2023-02-10T07:37:49Z) - Disentangling Learnable and Memorizable Data via Contrastive Learning
for Semantic Communications [81.10703519117465]
A novel machine reasoning framework is proposed to disentangle source data so as to make it semantic-ready.
In particular, a novel contrastive learning framework is proposed, whereby instance and cluster discrimination are performed on the data.
Deep semantic clusters of highest confidence are considered learnable, semantic-rich data.
Our simulation results showcase the superiority of our contrastive learning approach in terms of semantic impact and minimalism.
arXiv Detail & Related papers (2022-12-18T12:00:12Z) - DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for
Open-world Detection [118.36746273425354]
This paper presents a paralleled visual-concept pre-training method for open-world detection by resorting to knowledge enrichment from a designed concept dictionary.
By enriching the concepts with their descriptions, we explicitly build the relationships among various concepts to facilitate the open-domain learning.
The proposed framework demonstrates strong zero-shot detection performances, e.g., on the LVIS dataset, our DetCLIP-T outperforms GLIP-T by 9.9% mAP and obtains a 13.5% improvement on rare categories.
arXiv Detail & Related papers (2022-09-20T02:01:01Z) - Leveraging Dependency Grammar for Fine-Grained Offensive Language
Detection using Graph Convolutional Networks [0.5457150493905063]
We address the problem of offensive language detection on Twitter.
We propose a novel approach called SyLSTM, which integrates syntactic features in the form of the dependency parse tree of a sentence.
Results show that the proposed approach significantly outperforms the state-of-the-art BERT model with orders of magnitude fewer parameters.
arXiv Detail & Related papers (2022-05-26T05:27:50Z) - On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z) - Dominant Set-based Active Learning for Text Classification and its
Application to Online Social Media [0.0]
We present a novel pool-based active learning method for the training of large unlabeled corpus with minimum annotation cost.
Our proposed method does not have any parameters to be tuned, making it dataset-independent.
Our method achieves a higher performance in comparison to the state-of-the-art active learning strategies.
arXiv Detail & Related papers (2022-01-28T19:19:03Z) - Dynamic Semantic Matching and Aggregation Network for Few-shot Intent
Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z) - Decoding Imagined Speech using Wavelet Features and Deep Neural Networks [2.4063592468412267]
This paper proposes a novel approach that uses deep neural networks for classifying imagined speech.
The proposed approach employs only the EEG channels over specific areas of the brain for classification, and derives distinct feature vectors from each of those channels.
The proposed architecture and the approach of treating the data have resulted in an average classification accuracy of 57.15%, which is an improvement of around 35% over the state-of-the-art results.
arXiv Detail & Related papers (2020-03-19T00:36:19Z)