Studying Socially Unacceptable Discourse Classification (SUD) through
different eyes: "Are we on the same page ?"
- URL: http://arxiv.org/abs/2308.04180v1
- Date: Tue, 8 Aug 2023 10:42:33 GMT
- Title: Studying Socially Unacceptable Discourse Classification (SUD) through
different eyes: "Are we on the same page ?"
- Authors: Bruno Machado Carneiro, Michele Linardi, Julien Longhi
- Abstract summary: We first build and present a novel corpus that contains a large variety of manually annotated texts from different online sources.
This global context allows us to test the generalization ability of SUD classifiers.
From this perspective, we can analyze how (possibly) different annotation modalities influence SUD learning.
- Score: 4.87717454493713
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study Socially Unacceptable Discourse (SUD) characterization and detection
in online text. We first build and present a novel corpus that contains a large
variety of manually annotated texts from different online sources used so far
in state-of-the-art machine learning (ML) SUD detection solutions. This global
context allows us to test the generalization ability of SUD classifiers that
acquire knowledge around the same SUD categories, but from different contexts.
From this perspective, we analyze how (possibly) different annotation
modalities influence SUD learning, and we discuss the open challenges and open
research directions. We also provide several data insights that can support
domain experts in the annotation task.
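As a minimal illustration of the cross-context evaluation described above, the sketch below trains a simple SUD classifier on one annotated corpus and scores it on a corpus drawn from a different online source. The file names, columns, and model are assumptions for illustration only, not the paper's actual pipeline.

```python
# Illustrative sketch (not the paper's pipeline): train a simple SUD classifier
# on one annotated corpus and test it on a corpus from a different online source
# to probe cross-context generalization. File names and columns are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

train_df = pd.read_csv("corpus_a.csv")   # hypothetical source A (e.g., forum posts)
test_df = pd.read_csv("corpus_b.csv")    # hypothetical source B (e.g., tweets)

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
clf.fit(train_df["text"], train_df["label"])

# The gap between in-domain and cross-domain scores hints at how well the
# annotation context of corpus A transfers to corpus B.
print(classification_report(test_df["label"], clf.predict(test_df["text"])))
```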
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
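As a rough illustration of how hard negative test captions can be generated automatically, the toy sketch below swaps a single content word in a caption; the substitution vocabulary and captions are invented, and the paper's own procedure may differ.

```python
# Toy sketch: build "hard negative" captions by swapping one content word, so
# the negative stays close to the original except for a fine-grained detail.
SWAPS = {"opens": ["closes"], "red": ["blue"]}  # assumed substitution vocabulary

def hard_negatives(caption: str) -> list[str]:
    tokens = caption.split()
    negatives = []
    for i, tok in enumerate(tokens):
        for alt in SWAPS.get(tok, []):
            negatives.append(" ".join(tokens[:i] + [alt] + tokens[i + 1:]))
    return negatives

print(hard_negatives("a woman opens a red box"))
# ['a woman closes a red box', 'a woman opens a blue box']
```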
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
- Text Classification using Graph Convolutional Networks: A Comprehensive Survey [11.1080224302799]
Graph convolutional network (GCN)-based approaches have gained significant traction in this domain over the last decade.
This work aims to summarize and categorize various GCN-based Text Classification approaches with regard to the architecture and mode of supervision.
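For intuition, the sketch below implements the standard single graph-convolution update, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), on a tiny made-up graph; this is generic GCN propagation rather than any specific surveyed model.

```python
# Generic single GCN layer (Kipf & Welling-style propagation) on a toy graph,
# as commonly used for text classification over word/document graphs.
import numpy as np

def gcn_layer(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt       # symmetric normalization
    return np.maximum(norm_adj @ feats @ weight, 0)  # ReLU activation

rng = np.random.default_rng(0)
adj = np.array([[0., 1., 1.], [1., 0., 0.], [1., 0., 0.]])  # toy document-word graph
feats = rng.normal(size=(3, 4))                              # initial node features
print(gcn_layer(adj, feats, rng.normal(size=(4, 2))))        # 2-dimensional node embeddings
```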
arXiv Detail & Related papers (2024-10-12T07:03:42Z)
- Towards Few-Shot Learning in the Open World: A Review and Beyond [52.41344813375177]
Few-shot learning (FSL) aims to mimic human intelligence by enabling strong generalization and transferability.
This paper presents a review of recent advancements designed to adapt FSL for use in open-world settings.
We categorize existing methods into three distinct types of open-world few-shot learning: those involving varying instances, varying classes, and varying distributions.
arXiv Detail & Related papers (2024-08-19T06:23:21Z)
- Exploiting Adaptive Contextual Masking for Aspect-Based Sentiment Analysis [0.6827423171182154]
Aspect-Based Sentiment Analysis (ABSA) is a fine-grained linguistic problem that entails the extraction of multifaceted aspects, opinions, and sentiments from a given text.
We present adaptive masking methods that remove irrelevant tokens based on context to assist in Aspect Term Extraction and Aspect Sentiment Classification subtasks of ABSA.
arXiv Detail & Related papers (2024-02-21T11:33:09Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
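A toy sketch of the retrieval-augmented pattern follows: retrieve the passages most relevant to a question and prepend them to the prompt so a (hypothetical) LLM can ground its answer in external knowledge. The documents, query, and prompt template below are invented, and no actual LLM call is made.

```python
# Toy retrieval-augmented setup: retrieve the most relevant passage for a
# question and prepend it to the prompt, grounding the (hypothetical) LLM in
# external knowledge instead of relying only on its internal parameters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

DOCS = [  # stand-in for a knowledge base / text corpus
    "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911).",
    "The Eiffel Tower was completed in 1889 in Paris.",
]

def build_prompt(question: str, k: int = 1) -> str:
    vec = TfidfVectorizer().fit(DOCS + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(DOCS))[0]
    top = [DOCS[i] for i in sims.argsort()[::-1][:k]]   # top-k retrieved passages
    context = "\n".join(top)
    return f"Answer using the context below.\nContext:\n{context}\nQuestion: {question}"

print(build_prompt("In which fields did Marie Curie win Nobel Prizes?"))
```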
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating its feasibility in application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LLMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) from 2017 to date.
We postulate and study fairness disparities across multiple protected attributes of interest, including author gender, geography, and author and institutional prestige.
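For intuition only, the toy pandas sketch below shows the kind of group-level disparity check such a study might run; the column names and data are invented, and the paper's methodology is far more involved.

```python
# Toy disparity check: compare acceptance rates across a protected attribute.
import pandas as pd

reviews = pd.DataFrame({  # invented stand-in for the ICLR relational database
    "author_gender": ["f", "m", "f", "m", "m", "f"],
    "accepted":      [1,   0,   1,   1,   0,   0],
})
rates = reviews.groupby("author_gender")["accepted"].mean()
print(rates)                              # acceptance rate per group
print("gap:", rates.max() - rates.min())  # naive disparity measure
```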
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora [14.844685568451833]
We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings.
TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface.
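In the same spirit, the sketch below trains separate word embeddings on two tiny invented corpora and compares a term's nearest neighbors in each; TextEssence itself is a full web-based interface, so this is only a rough analogue of its neighbor-based analysis mode.

```python
# Toy neighbor-based comparison of a term's embedding across two corpora,
# in the spirit of comparative corpus analysis (not TextEssence's actual code).
from gensim.models import Word2Vec

corpus_2019 = [["virus", "infects", "computers"], ["antivirus", "software", "virus"]]
corpus_2021 = [["virus", "spreads", "between", "people"], ["vaccine", "against", "virus"]]

m_old = Word2Vec(corpus_2019, vector_size=20, min_count=1, seed=1)
m_new = Word2Vec(corpus_2021, vector_size=20, min_count=1, seed=1)

print(m_old.wv.most_similar("virus", topn=2))  # neighbors in the older corpus
print(m_new.wv.most_similar("virus", topn=2))  # neighbors in the newer corpus
```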
arXiv Detail & Related papers (2021-03-19T21:26:28Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- "To Target or Not to Target": Identification and Analysis of Abusive Text Using Ensemble of Classifiers [18.053219155702465]
We present an ensemble learning method to identify and analyze abusive and hateful content on social media platforms.
Our stacked ensemble comprises three machine learning models that capture different aspects of language and provide diverse and coherent insights into inappropriate language.
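A compact sketch of the general stacking idea over TF-IDF features is shown below; the base models, features, and toy data are assumptions, not the paper's exact configuration.

```python
# Generic stacking sketch for abusive-text classification: several base
# classifiers are combined by a meta-learner (not the paper's exact setup).
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

stack = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    StackingClassifier(
        estimators=[("nb", MultinomialNB()), ("svm", LinearSVC())],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=2,  # tiny toy dataset, so 2-fold stacking
    ),
)

texts = [
    "you are wonderful", "thanks for the help", "great point, well argued",
    "go away, you idiot", "nobody wants you here, idiot", "shut up, loser",
]
labels = [0, 0, 0, 1, 1, 1]  # 1 = abusive (toy labels)
stack.fit(texts, labels)
print(stack.predict(["what an idiot"]))
```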
arXiv Detail & Related papers (2020-06-05T06:59:22Z)
- Observations on Annotations [0.5175994976508882]
The paper approaches the topic of annotation from several angles, including Hypertext, Computational Linguistics and Language Technology, Artificial Intelligence, and Open Science.
In terms of complexity, annotations can range from trivial to highly sophisticated; in terms of maturity, from experimental to standardised.
Primary research data, e.g., text documents, can be annotated on several layers concurrently; these layers are independent of one another but can be exploited jointly through multi-layer querying.
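A tiny sketch of independent annotation layers over the same primary text, plus a cross-layer query, is shown below; the layer names and span format are invented for illustration.

```python
# Toy multi-layer annotation: independent layers over the same text, queried jointly.
text = "Marie Curie discovered polonium."

layers = {  # (start, end, label) character spans per layer -- assumed format
    "pos": [(0, 5, "PROPN"), (6, 11, "PROPN"), (12, 22, "VERB"), (23, 31, "NOUN")],
    "ner": [(0, 11, "PERSON"), (23, 31, "CHEMICAL")],
}

def tokens_with(pos_tag: str, ner_tag: str) -> list[str]:
    """Cross-layer query: spans tagged pos_tag that fall inside a ner_tag entity."""
    hits = []
    for ps, pe, pl in layers["pos"]:
        for ns, ne, nl in layers["ner"]:
            if pl == pos_tag and nl == ner_tag and ns <= ps and pe <= ne:
                hits.append(text[ps:pe])
    return hits

print(tokens_with("PROPN", "PERSON"))  # ['Marie', 'Curie']
```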
arXiv Detail & Related papers (2020-04-21T20:29:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this automatically generated content (including all information) and is not responsible for any consequences arising from its use.