Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page?"
- URL: http://arxiv.org/abs/2308.04180v1
- Date: Tue, 8 Aug 2023 10:42:33 GMT
- Title: Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page?"
- Authors: Bruno Machado Carneiro, Michele Linardi, Julien Longhi
- Abstract summary: We first build and present a novel corpus that contains a large variety of manually annotated texts from different online sources.
This global context allows us to test the generalization ability of SUD classifiers.
From this perspective, we can analyze how (possibly) different annotation modalities influence SUD learning.
- Score: 4.87717454493713
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study Socially Unacceptable Discourse (SUD) characterization and detection
in online text. We first build and present a novel corpus that contains a large
variety of manually annotated texts from different online sources used so far
in state-of-the-art Machine Learning (ML) SUD detection solutions. This global
context allows us to test the generalization ability of SUD classifiers that
acquire knowledge around the same SUD categories, but from different contexts.
From this perspective, we can analyze how (possibly) different annotation
modalities influence SUD learning by discussing open challenges and open
research directions. We also provide several data insights which can support
domain experts in the annotation task.
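The cross-context generalization test the abstract describes can be sketched in miniature: train a classifier on one annotated source, then evaluate it on a corpus drawn from a different online context. The corpora, labels, and the tiny word-count model below are invented placeholders, not the paper's actual data or method.

```python
# Minimal sketch of cross-corpus SUD evaluation (all data invented).
from collections import Counter

def train(texts, labels):
    """Learn per-class word counts (a tiny Naive-Bayes-style model)."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    """Score a text by how many of its words each class has seen."""
    scores = {c: sum(model[c][w] for w in text.lower().split()) for c in model}
    return max(scores, key=scores.get)

# Corpus A: in-context training data (1 = socially unacceptable).
corpus_a = ["you are worthless trash", "have a nice day",
            "nobody wants you here", "thanks for the help"]
labels_a = [1, 0, 1, 0]

# Corpus B: a differently-sourced test set probing generalization.
corpus_b = ["you are trash", "have a great day"]
labels_b = [1, 0]

model = train(corpus_a, labels_a)
preds = [predict(model, t) for t in corpus_b]
accuracy = sum(p == y for p, y in zip(preds, labels_b)) / len(labels_b)
print(accuracy)  # → 1.0
```

The gap between in-context and out-of-context accuracy is what such a protocol measures; a real study would replace the toy model with a trained SUD classifier.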
Related papers
- Exploiting Adaptive Contextual Masking for Aspect-Based Sentiment
Analysis [0.6827423171182154]
Aspect-Based Sentiment Analysis (ABSA) is a fine-grained linguistics problem that entails the extraction of multifaceted aspects, opinions, and sentiments from the given text.
We present adaptive masking methods that remove irrelevant tokens based on context to assist in Aspect Term Extraction and Aspect Sentiment Classification subtasks of ABSA.
arXiv Detail & Related papers (2024-02-21T11:33:09Z)
- Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Information extraction (IE) aims to extract structural knowledge from plain natural language texts.
Generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation.
LLMs offer viable solutions for IE tasks based on a generative paradigm.
arXiv Detail & Related papers (2023-12-29T14:25:22Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- To Revise or Not to Revise: Learning to Detect Improvable Claims for Argumentative Writing Support [20.905660642919052]
We explore the main challenges to identifying argumentative claims in need of specific revisions.
We propose a new sampling strategy based on revision distance.
We provide evidence that using contextual information and domain knowledge can further improve prediction results.
arXiv Detail & Related papers (2023-05-26T10:19:54Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating feasibility in real application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LLMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- Automated Audio Captioning: an Overview of Recent Progress and New Challenges [56.98522404673527]
Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips.
We present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets.
arXiv Detail & Related papers (2022-05-12T08:36:35Z)
- TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora [14.844685568451833]
We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings.
TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface.
arXiv Detail & Related papers (2021-03-19T21:26:28Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- "To Target or Not to Target": Identification and Analysis of Abusive Text Using Ensemble of Classifiers [18.053219155702465]
We present an ensemble learning method to identify and analyze abusive and hateful content on social media platforms.
Our stacked ensemble comprises three machine learning models that capture different aspects of language and provide diverse and coherent insights about inappropriate language.
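The ensemble idea above can be illustrated with a toy sketch: three simple detectors, each capturing a different aspect of language, combined by majority vote. This is a deliberate simplification of the paper's stacked ensemble (which uses three trained ML models and a meta-learner); the detectors, word list, and thresholds below are all invented.

```python
# Illustrative only: three toy rule-based detectors standing in for the
# paper's three ML models, combined by majority vote (invented rules/data).
ABUSIVE_TERMS = {"idiot", "trash", "loser"}

def lexicon_detector(text):
    """Flag texts containing a known abusive term."""
    return int(any(w in ABUSIVE_TERMS for w in text.lower().split()))

def shouting_detector(text):
    """Flag aggressive mostly-uppercase messages."""
    letters = [c for c in text if c.isalpha()]
    return int(bool(letters) and
               sum(c.isupper() for c in letters) / len(letters) > 0.7)

def second_person_attack_detector(text):
    """Flag 'you'-directed messages that also contain an abusive term."""
    words = set(text.lower().split())
    return int("you" in words and bool(words & ABUSIVE_TERMS))

def ensemble_predict(text):
    """Majority vote over the three detectors (1 = abusive)."""
    votes = [lexicon_detector(text), shouting_detector(text),
             second_person_attack_detector(text)]
    return int(sum(votes) >= 2)

print(ensemble_predict("you are a LOSER"))   # → 1
print(ensemble_predict("see you tomorrow"))  # → 0
```

Combining heterogeneous views of the text is the design point: each base model can err on cases the others handle, so the aggregate is more robust than any single detector.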
arXiv Detail & Related papers (2020-06-05T06:59:22Z)
- Observations on Annotations [0.5175994976508882]
The paper approaches the topic from several angles, including Hypertext, Computational Linguistics and Language Technology, Artificial Intelligence, and Open Science.
In terms of complexity, annotations can range from trivial to highly sophisticated; in terms of maturity, from experimental to standardised.
Primary research data, such as text documents, can be annotated on different layers concurrently; the layers are independent but can be exploited jointly through multi-layer querying.
arXiv Detail & Related papers (2020-04-21T20:29:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.