Leveraging Dependency Grammar for Fine-Grained Offensive Language
Detection using Graph Convolutional Networks
- URL: http://arxiv.org/abs/2205.13164v1
- Date: Thu, 26 May 2022 05:27:50 GMT
- Title: Leveraging Dependency Grammar for Fine-Grained Offensive Language
Detection using Graph Convolutional Networks
- Authors: Divyam Goel, Raksha Sharma
- Abstract summary: We address the problem of offensive language detection on Twitter.
We propose a novel approach called SyLSTM, which integrates syntactic features in the form of the dependency parse tree of a sentence.
Results show that the proposed approach significantly outperforms the state-of-the-art BERT model with orders of magnitude fewer parameters.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The last few years have witnessed an exponential rise in the propagation of
offensive text on social media. Identification of this text with high precision
is crucial for the well-being of society. Most of the existing approaches tend
to give high toxicity scores to innocuous statements (e.g., "I am a gay man").
These false positives result from over-generalization on the training data
where specific terms in the statement may have been used in a pejorative sense
(e.g., "gay"). Emphasis on such words alone can lead to discrimination against
the classes these systems are designed to protect. In this paper, we address
the problem of offensive language detection on Twitter, while also detecting
the type and the target of the offence. We propose a novel approach called
SyLSTM, which integrates syntactic features in the form of the dependency parse
tree of a sentence and semantic features in the form of word embeddings into a
deep learning architecture using a Graph Convolutional Network. Results show
that the proposed approach significantly outperforms the state-of-the-art BERT
model with orders of magnitude fewer parameters.
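The paper's SyLSTM architecture is not reproduced here, but its core idea, propagating word-embedding features along dependency-parse edges with a graph convolution, can be sketched minimally in NumPy. The adjacency matrix, dimensions, and function name below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric normalization
    return np.maximum(norm_adj @ feats @ weight, 0.0)

# Toy 4-token sentence; edges connect each token to its syntactic head,
# encoded as a symmetric adjacency matrix (hypothetical parse).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
feats = np.random.default_rng(0).normal(size=(4, 8))   # stand-in word embeddings
weight = np.random.default_rng(1).normal(size=(8, 4))  # learnable projection
out = gcn_layer(adj, feats, weight)
print(out.shape)  # (4, 4): one syntax-aware vector per token
```

Each token's output vector mixes its own embedding with those of its dependency neighbors, which is how syntactic structure enters the representation before any downstream classifier.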
Related papers
- LGB: Language Model and Graph Neural Network-Driven Social Bot Detection [43.92522451274129]
Malicious social bots achieve their goals by spreading misinformation and inciting social public opinion.
We propose a novel social bot detection framework LGB, which consists of two main components: a language model (LM) and a graph neural network (GNN).
Experiments on two real-world datasets demonstrate that LGB consistently outperforms state-of-the-art baseline models by up to 10.95%.
arXiv Detail & Related papers (2024-06-13T02:47:38Z)
- ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG)
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
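The batch-wise contrastive objective that ConGraT borrows from CLIP can be sketched as a symmetric InfoNCE loss over paired text and node embeddings. This is a generic NumPy illustration of that objective, not ConGraT's actual implementation; the function name and temperature value are assumptions:

```python
import numpy as np

def clip_style_loss(text_emb, node_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th text should match the i-th node in the batch."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    n = node_emb / np.linalg.norm(node_emb, axis=1, keepdims=True)
    logits = t @ n.T / temperature                # pairwise cosine similarities
    labels = np.arange(len(t))                    # matching pairs on the diagonal

    def xent(l):                                  # row-wise softmax cross-entropy
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (xent(logits) + xent(logits.T))  # text->node and node->text

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 16))
loss_matched = clip_style_loss(batch, batch)         # perfectly aligned pairs
loss_shuffled = clip_style_loss(batch, batch[::-1])  # mismatched pairs
```

Aligned pairs yield a much lower loss than mismatched ones, which is the gradient signal that pulls the LM and GNN representations into a common latent space.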
arXiv Detail & Related papers (2023-05-23T17:53:30Z)
- Disentangling Learnable and Memorizable Data via Contrastive Learning for Semantic Communications [81.10703519117465]
A novel machine reasoning framework is proposed to disentangle source data so as to make it semantic-ready.
In particular, a novel contrastive learning framework is proposed, whereby instance and cluster discrimination are performed on the data.
Deep semantic clusters of highest confidence are considered learnable, semantic-rich data.
Our simulation results showcase the superiority of our contrastive learning approach in terms of semantic impact and minimalism.
arXiv Detail & Related papers (2022-12-18T12:00:12Z)
- Detecting Offensive Language on Social Networks: An End-to-end Detection Method based on Graph Attention Networks [7.723697303436006]
We propose an end-to-end method based on community structure and text features for offensive language detection (CT-OLD).
We add user opinion to the community structure to represent user features. User opinion is derived from users' historical behavior, which outperforms representations derived from text alone.
arXiv Detail & Related papers (2022-03-04T03:57:18Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
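The distance idea above, comparing the token distributions that a masked language model predicts at shared positions of two texts, can be sketched with hand-made distributions. This toy uses Jensen-Shannon divergence as a stand-in divergence; the actual paper's NDD formulation, function name, and distributions below are not taken from the source:

```python
import numpy as np

def distribution_distance(p_dists, q_dists):
    """Mean Jensen-Shannon divergence between the predicted token
    distributions of two texts at their shared (overlapping) positions."""
    def js(p, q):
        m = 0.5 * (p + q)
        kl = lambda a, b: np.sum(a * np.log(a / b))   # assumes strictly positive probs
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.mean([js(p, q) for p, q in zip(p_dists, q_dists)]))

# Toy MLM-style distributions over a 3-word vocabulary at two shared positions.
p = [np.array([0.8, 0.1, 0.1]), np.array([0.6, 0.2, 0.2])]
q = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.2, 0.7])]
d_same = distribution_distance(p, p)   # identical texts -> zero distance
d_diff = distribution_distance(p, q)   # diverging predictions -> positive distance
```

Because the divergence is computed position-by-position, heavily overlapping texts that differ in only one edit still register a nonzero distance at exactly the positions the edit affects.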
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity and overstability causing samples with high accuracies.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated promising results on text classification, a canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
- Visually Grounded Compound PCFGs [65.04669567781634]
Exploiting visual groundings for language understanding has recently been drawing much attention.
We study visually grounded grammar induction and learn a constituency parser from both unlabeled text and its visual captions.
arXiv Detail & Related papers (2020-09-25T19:07:00Z)
- Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network [10.489983726592303]
We investigate a novel relational graph attention network that integrates typed syntactic dependency information.
Results show that our method can effectively leverage label information for improving targeted sentiment classification performances.
arXiv Detail & Related papers (2020-02-22T11:17:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.