BiaSWE: An Expert Annotated Dataset for Misogyny Detection in Swedish
- URL: http://arxiv.org/abs/2502.07637v1
- Date: Tue, 11 Feb 2025 15:25:10 GMT
- Title: BiaSWE: An Expert Annotated Dataset for Misogyny Detection in Swedish
- Authors: Kätriin Kukk, Danila Petrelli, Judit Casademont, Eric J. W. Orlowski, Michał Dzieliński, Maria Jacobson,
- Abstract summary: BiaSWE is an expert-annotated dataset tailored for misogyny detection in the Swedish language.
Our interdisciplinary team developed a rigorous annotation process, incorporating both domain knowledge and language expertise.
The dataset, along with the annotation guidelines, is publicly available for further research.
- Abstract: In this study, we introduce the process for creating BiaSWE, an expert-annotated dataset tailored for misogyny detection in the Swedish language. To address the cultural and linguistic specificity of misogyny in Swedish, we collaborated with experts from the social sciences and humanities. Our interdisciplinary team developed a rigorous annotation process, incorporating both domain knowledge and language expertise, to capture the nuances of misogyny in a Swedish context. This methodology ensures that the dataset is not only culturally relevant but also aligned with broader efforts in bias detection for low-resource languages. The dataset, along with the annotation guidelines, is publicly available for further research.
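Since the dataset and guidelines are public, a natural first processing step is aggregating the expert annotations into training labels. The sketch below shows a generic majority-vote aggregation with an adjudication flag; the field names, label set, and agreement threshold are illustrative assumptions, not BiaSWE's actual schema or procedure.

```python
# A minimal sketch of majority-vote label aggregation with an
# adjudication flag. Field names, labels, and the agreement threshold
# are illustrative assumptions, not BiaSWE's actual schema or process.
from collections import Counter

annotated = [
    {"text": "example post 1", "labels": ["misogyny", "misogyny", "not_misogyny"]},
    {"text": "example post 2", "labels": ["not_misogyny"] * 3},
]

def aggregate(example, min_agreement=2 / 3):
    counts = Counter(example["labels"])
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(example["labels"])
    return {
        "text": example["text"],
        "label": label,
        "agreement": round(agreement, 2),
        # Low-agreement items go back to the expert annotators.
        "needs_adjudication": agreement < min_agreement,
    }

for ex in annotated:
    print(aggregate(ex))
```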
Related papers
- BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
We present BRIGHTER, a collection of emotion-annotated datasets in 28 different languages.
We describe the data collection and annotation processes and the challenges of building these datasets.
We show that BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition.
arXiv Detail & Related papers (2025-02-17T15:39:50Z)
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT, a benchmark for Gender-Inclusive Machine Translation with Ambiguous attitude words.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
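The paper's actual EAS formula is not reproduced in this summary; the toy sketch below only illustrates the general idea of scoring the attitude words a translation contains and comparing the scores across gender variants of the same sentence. The lexicon and example sentences are invented.

```python
# Toy illustration only, not the paper's EAS formula: score a
# translation by the net attitude of lexicon words it contains, then
# compare gender variants. The lexicon and sentences are invented.
POSITIVE = {"assertive", "confident", "ambitious"}
NEGATIVE = {"bossy", "pushy", "shrill"}

def attitude_score(tokens):
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

she = "she is assertive but colleagues call her bossy".split()
he = "he is assertive and colleagues call him confident".split()
# A persistent gap between variants suggests gendered attitude bias.
print(attitude_score(she), attitude_score(he))
```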
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- A multitask learning framework for leveraging subjectivity of annotators to identify misogyny
We propose a multitask learning approach to enhance the performance of misogyny identification systems.
We incorporated diverse perspectives from annotators in our model design, considering gender and age across six profile groups.
This research advances content moderation and highlights the importance of embracing diverse perspectives to build effective online moderation systems.
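One common way to realise such a design is a shared text encoder with one classification head per annotator profile group. The sketch below is a generic version of that pattern; the stand-in encoder and all dimensions are assumptions, not the paper's actual architecture.

```python
# A generic sketch: shared encoder, one classification head per
# annotator profile group (six groups here). The stand-in encoder and
# dimensions are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class MultiHeadMisogynyClassifier(nn.Module):
    def __init__(self, encoder_dim=768, num_groups=6, num_labels=2):
        super().__init__()
        # Stand-in for the pooled output of a pretrained transformer.
        self.encoder = nn.Linear(encoder_dim, encoder_dim)
        self.heads = nn.ModuleList(
            [nn.Linear(encoder_dim, num_labels) for _ in range(num_groups)]
        )

    def forward(self, pooled, group_id):
        shared = torch.tanh(self.encoder(pooled))
        return self.heads[group_id](shared)

model = MultiHeadMisogynyClassifier()
logits = model(torch.randn(1, 768), group_id=2)  # head for profile group 2
print(logits.shape)  # torch.Size([1, 2])
```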
arXiv Detail & Related papers (2024-06-22T15:06:08Z)
- Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology
Existing research in measuring and mitigating gender bias predominantly centers on English.
This paper presents the first comprehensive study delving into the nuanced landscape of gender bias in Hindi.
arXiv Detail & Related papers (2024-05-10T09:26:12Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset
The Biasly dataset is built in collaboration with multi-disciplinary experts and with the annotators themselves.
The dataset can be used for a range of NLP tasks, including classification, severity score regression, and text generation for rewrites.
arXiv Detail & Related papers (2023-11-15T23:27:19Z)
- SexWEs: Domain-Aware Word Embeddings via Cross-lingual Semantic Specialisation for Chinese Sexism Detection in Social Media
We develop a cross-lingual domain-aware semantic specialisation system for sexism detection.
We leverage semantic resources for sexism from a high-resource language (English) to specialise pre-trained word vectors in the target language (Chinese) to inject domain knowledge.
Compared with other specialisation approaches and Chinese baseline word vectors, SexWEs achieves average score improvements of 0.033 and 0.064 in intrinsic and extrinsic evaluations, respectively.
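Semantic specialisation of pretrained vectors is commonly implemented in the attract-repel family of methods: vectors of related word pairs are pulled together while contrasting pairs are pushed apart. The toy sketch below illustrates that general mechanism; the word pairs, vectors, and update rule are stand-ins, not SexWEs' actual method.

```python
# Toy attract-repel-style specialisation. The pairs, vectors, and
# update rule are invented stand-ins; in SexWEs the constraints come
# from English sexism resources projected into Chinese.
import numpy as np

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=8) for w in ("sexist", "misogynist", "neutral")}
attract = [("sexist", "misogynist")]  # pull these pairs together
repel = [("sexist", "neutral")]       # push these pairs apart

def specialise(vecs, attract, repel, lr=0.1, steps=25):
    for _ in range(steps):
        for a, b in attract:
            delta = vecs[a] - vecs[b]
            vecs[a] = vecs[a] - lr * delta
            vecs[b] = vecs[b] + lr * delta
        for a, b in repel:
            delta = vecs[a] - vecs[b]
            vecs[a] = vecs[a] + lr * delta
            vecs[b] = vecs[b] - lr * delta
        for w in vecs:  # renormalise so repelled vectors do not blow up
            vecs[w] = vecs[w] / np.linalg.norm(vecs[w])
    return vecs

specialise(vecs, attract, repel)
cos = lambda a, b: float(vecs[a] @ vecs[b])
print(cos("sexist", "misogynist"), cos("sexist", "neutral"))
```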
arXiv Detail & Related papers (2022-11-15T19:00:20Z)
- O-Dang! The Ontology of Dangerous Speech Messages
We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
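Encoding both gold and single-annotator labels can be pictured as triples in a graph, as in the minimal sketch below. The predicate and node names are invented for illustration, not O-Dang!'s actual vocabulary.

```python
# A minimal sketch of storing a gold label alongside per-annotator
# labels as graph triples. Predicate and node names are invented,
# not O-Dang!'s actual vocabulary.
triples = [
    ("msg:42", "odang:hasGoldLabel", "label:dangerous"),
    ("msg:42", "odang:hasAnnotation", "ann:42-a1"),
    ("ann:42-a1", "odang:annotator", "annotator:a1"),
    ("ann:42-a1", "odang:label", "label:not_dangerous"),
]

def labels_for(message, triples):
    """Collect the gold label and every single-annotator label for a message."""
    gold = [o for s, p, o in triples if s == message and p == "odang:hasGoldLabel"]
    anns = [o for s, p, o in triples if s == message and p == "odang:hasAnnotation"]
    individual = [o for s, p, o in triples if s in anns and p == "odang:label"]
    return gold, individual

print(labels_for("msg:42", triples))
```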
arXiv Detail & Related papers (2022-07-13T11:50:05Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language
We introduce LeT-Mi, an Arabic Levantine Twitter dataset for misogynistic language, as the first benchmark dataset for Arabic misogyny.
Let-Mi was used as an evaluation dataset for binary, multi-class, and target classification tasks with several state-of-the-art machine learning systems.
arXiv Detail & Related papers (2021-03-18T12:01:13Z)
- Mitigating Gender Bias in Machine Learning Data Sets
Gender bias has been identified in the context of employment advertising and recruitment tools.
This paper proposes a framework for the identification of gender bias in training data for machine learning.
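In the spirit of such a framework, a minimal audit might count gendered terms in a corpus before training, as in the sketch below. The term list and parity check are illustrative only, not the paper's method.

```python
# A simple corpus audit sketch: measure the skew of gendered terms in
# training text. The term list and threshold are illustrative, not the
# paper's framework.
GENDERED = {"he": "m", "him": "m", "his": "m", "she": "f", "her": "f", "hers": "f"}

def gender_skew(corpus):
    counts = {"m": 0, "f": 0}
    for doc in corpus:
        for tok in doc.lower().split():
            if tok in GENDERED:
                counts[GENDERED[tok]] += 1
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.items()}

corpus = [
    "He led the team and his results impressed everyone.",
    "She presented her findings.",
]
print(gender_skew(corpus))  # flag the dataset if the ratio is far from parity
```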
arXiv Detail & Related papers (2020-05-14T12:06:02Z)