Creating a Multimodal Dataset of Images and Text to Study Abusive
Language
- URL: http://arxiv.org/abs/2005.02235v1
- Date: Tue, 5 May 2020 14:31:47 GMT
- Title: Creating a Multimodal Dataset of Images and Text to Study Abusive
Language
- Authors: Alessio Palmero Aprosio, Stefano Menini, Sara Tonelli
- Abstract summary: CREENDER is an annotation tool that has been used in school classes to create a multimodal dataset of images and abusive comments.
The corpus, with Italian comments, has been analysed from different perspectives to investigate whether the subject of the images plays a role in triggering a comment.
We find that users judge the same images in different ways, although the presence of a person in the picture increases the probability of receiving an offensive comment.
- Score: 2.2688530041645856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to study online hate speech, the availability of datasets
containing the linguistic phenomena of interest is of crucial importance. However, when
it comes to specific target groups, for example teenagers, collecting such data
may be problematic due to issues with consent and privacy restrictions.
Furthermore, while text-only datasets of this kind have been widely used,
limitations set by image-based social media platforms like Instagram make it
difficult for researchers to experiment with multimodal hate speech data. We
therefore developed CREENDER, an annotation tool that has been used in school
classes to create a multimodal dataset of images and abusive comments, which we
make freely available under the Apache 2.0 license. The corpus, with Italian
comments, has been analysed from different perspectives to investigate whether
the subject of the images plays a role in triggering a comment. We find that
users judge the same images in different ways, although the presence of a
person in the picture increases the probability of receiving an offensive comment.
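To make the analysis described above concrete, here is a minimal sketch of how one could compute the share of offensive comments per image subject and by whether a person appears in the picture. The file name and column names (subject, has_person, is_offensive) are hypothetical placeholders, not the published schema of the CREENDER corpus.

```python
# Sketch of the kind of analysis described in the abstract: does the image
# subject affect how often it receives an offensive comment?
# File and column names are hypothetical, not the actual CREENDER schema.
import pandas as pd

annotations = pd.read_csv("creender_annotations.csv")  # hypothetical export

# Share of offensive comments (is_offensive coded as 0/1) per subject category
by_subject = annotations.groupby("subject")["is_offensive"].mean().sort_values(ascending=False)
print(by_subject)

# Compare images that do / do not show a person
by_person = annotations.groupby("has_person")["is_offensive"].agg(["mean", "count"])
print(by_person)
```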
Related papers
- Vision-Language Models under Cultural and Inclusive Considerations [53.614528867159706]
Large vision-language models (VLMs) can assist visually impaired people by describing images from their daily lives.
Current evaluation datasets may not reflect diverse cultural user backgrounds or the situational context of this use case.
We create a survey to determine caption preferences and propose a culture-centric evaluation benchmark by filtering VizWiz, an existing dataset with images taken by people who are blind.
We then evaluate several VLMs, investigating their reliability as visual assistants in a culturally diverse setting.
arXiv Detail & Related papers (2024-07-08T17:50:00Z)
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text [112.60163342249682]
We introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset.
Our dataset is 15 times larger than its counterparts while maintaining good data quality.
We hope this could provide a solid data foundation for future multimodal model research.
arXiv Detail & Related papers (2024-06-12T17:01:04Z)
- Multilingual Diversity Improves Vision-Language Representations [66.41030381363244]
Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet.
On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa.
arXiv Detail & Related papers (2024-05-27T08:08:51Z)
- An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance [53.974497865647336]
We take a first step towards translating images to make them culturally relevant.
We build three pipelines comprising state-of-the-art generative models to perform this task.
We conduct a human evaluation of translated images to assess cultural relevance and meaning preservation.
arXiv Detail & Related papers (2024-04-01T17:08:50Z)
- C-CLIP: Contrastive Image-Text Encoders to Close the Descriptive-Commentative Gap [0.5439020425819]
The interplay between the image and the comment on a social media post is highly important for understanding its overall message.
Recent strides in multimodal embedding models, namely CLIP, have provided an avenue forward in relating images and text.
The current training regime for CLIP models is insufficient for matching content found on social media, regardless of site or language.
We show that training contrastive image-text encoders on explicitly commentative pairs results in large improvements in retrieval results.
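As a rough illustration of the contrastive image-text training this entry builds on, the sketch below computes a CLIP-style symmetric InfoNCE loss over a batch of (image, comment) embedding pairs. The encoders, embedding size, and temperature are placeholders; this is not the C-CLIP training code.

```python
# Sketch of a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of
# (image, comment) embedding pairs; assumes embeddings come from any pair of
# encoders and does not reproduce the actual C-CLIP implementation.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    # L2-normalise so the dot product is a cosine similarity
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares image i with comment j
    logits = image_emb @ text_emb.T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Matched pairs sit on the diagonal; penalise both retrieval directions
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.T, targets)
    return (loss_i2t + loss_t2i) / 2

# Usage with random embeddings standing in for encoder outputs
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```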
arXiv Detail & Related papers (2023-09-06T19:03:49Z)
- Improving Multimodal Datasets with Image Captioning [65.74736570293622]
We study how generated captions can increase the utility of web-scraped datapoints with nondescript text.
Our experiments using generated captions at DataComp's large scale (1.28B image-text pairs) offer insights into the limitations of synthetic text.
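The sketch below illustrates one simple way generated captions can be used with nondescript web text: keep the raw alt-text when it already matches the image, otherwise substitute the synthetic caption. The threshold and field names are hypothetical and do not reproduce DataComp's actual recipe.

```python
# Illustrative sketch only: choosing between raw web alt-text and a generated
# caption based on a precomputed image-text similarity score. Threshold and
# field names are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Sample:
    raw_caption: str
    generated_caption: str
    raw_clip_score: float  # similarity of the raw caption to the image

def choose_caption(sample: Sample, threshold: float = 0.28) -> str:
    # Keep the original text when it already describes the image well,
    # otherwise fall back to the synthetic caption.
    if sample.raw_clip_score >= threshold:
        return sample.raw_caption
    return sample.generated_caption

example = Sample("IMG_1234.jpg", "a dog running on a beach", raw_clip_score=0.11)
print(choose_caption(example))  # nondescript alt-text is replaced
```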
arXiv Detail & Related papers (2023-07-19T17:47:12Z)
- Uncurated Image-Text Datasets: Shedding Light on Demographic Bias [21.421722941901123]
Even small but manually annotated datasets, such as MSCOCO, are affected by societal bias.
Our first contribution is to annotate part of the Google Conceptual Captions dataset, widely used for training vision-and-language models.
Our second contribution is to conduct a comprehensive analysis of the annotations, focusing on how different demographic groups are represented.
Our third contribution is to evaluate three prevailing vision-and-language tasks, showing that societal bias is a persistent problem in all of them.
arXiv Detail & Related papers (2023-04-06T02:33:51Z)
- Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification [5.960550152906609]
We capture hinting features from user comments, which are retrieved by jointly leveraging visual and linguistic similarity.
The classification tasks are explored via self-training in a teacher-student framework, motivated by the typically limited amount of labeled data.
The results show that our method further advances the performance of previous state-of-the-art models.
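A minimal sketch of the generic teacher-student self-training loop this entry refers to, with TF-IDF and logistic regression standing in for the paper's multimodal, comment-aware encoders; the toy data and confidence threshold are illustrative only.

```python
# Generic teacher-student self-training sketch: train a teacher on labeled
# data, pseudo-label unlabeled data, keep confident predictions, retrain a
# student. Illustration only, not the paper's multimodal pipeline.
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_texts = ["great post!", "this is disgusting", "love it", "awful people"]
labels = np.array([0, 1, 0, 1])          # 0 = benign, 1 = abusive (toy data)
unlabeled_texts = ["so lovely", "you are all terrible", "nice picture"]

vec = TfidfVectorizer()
X_lab = vec.fit_transform(labeled_texts)
X_unl = vec.transform(unlabeled_texts)

# Teacher: trained on the small labeled set
teacher = LogisticRegression().fit(X_lab, labels)

# Keep only confident pseudo-labels for the unlabeled pool
probs = teacher.predict_proba(X_unl)
confident = probs.max(axis=1) >= 0.8      # typical confidence threshold
pseudo_labels = probs.argmax(axis=1)[confident]

# Student: trained on labeled plus confidently pseudo-labeled data
X_student = sp.vstack([X_lab, X_unl[confident]])
y_student = np.concatenate([labels, pseudo_labels])
student = LogisticRegression().fit(X_student, y_student)
```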
arXiv Detail & Related papers (2023-03-27T08:59:55Z)
- Assessing the impact of contextual information in hate speech detection [0.48369513656026514]
We provide a novel corpus for contextualized hate speech detection based on user responses to news posts from media outlets on Twitter.
This corpus was collected in the Rioplatense dialectal variety of Spanish and focuses on hate speech associated with the COVID-19 pandemic.
arXiv Detail & Related papers (2022-10-02T09:04:47Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate to hate examples often leads to low model performance.
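The sketch below illustrates this transfer setup in miniature: sentences are embedded by averaging word vectors from a shared cross-lingual space, a classifier is trained on the source language only, and class weighting is one common way to counter the label imbalance mentioned above. The tiny embedding dictionary is a hypothetical stand-in for real aligned embeddings such as MUSE or aligned fastText vectors.

```python
# Cross-lingual transfer sketch: average aligned word vectors per sentence,
# train on the source language (English), apply to the target (Spanish).
# The 3-dimensional embedding dictionary below is a toy placeholder.
import numpy as np
from sklearn.linear_model import LogisticRegression

aligned_vectors = {  # toy shared embedding space
    "hate": np.array([0.9, 0.1, 0.0]),   "love": np.array([0.0, 0.9, 0.1]),
    "odio": np.array([0.88, 0.12, 0.0]), "amo": np.array([0.02, 0.9, 0.08]),
    "them": np.array([0.1, 0.1, 0.8]),   "los": np.array([0.12, 0.1, 0.78]),
}

def embed(sentence: str) -> np.ndarray:
    vecs = [aligned_vectors[w] for w in sentence.lower().split() if w in aligned_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

# Train on English (source), evaluate on Spanish (target)
X_src = np.stack([embed(s) for s in ["hate them", "love them"]])
y_src = np.array([1, 0])  # 1 = hate, 0 = non-hate

# class_weight="balanced" is one common remedy for the label imbalance noted
# above (far more non-hate than hate examples in realistic data).
clf = LogisticRegression(class_weight="balanced").fit(X_src, y_src)
print(clf.predict(np.stack([embed("los odio"), embed("los amo")])))
```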
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Transfer Learning for Hate Speech Detection in Social Media [14.759208309842178]
This paper uses a transfer learning technique to leverage two independent datasets jointly.
We build an interpretable two-dimensional visualization tool of the constructed hate speech representation -- dubbed the Map of Hate.
We show that the joint representation boosts prediction performances when only a limited amount of supervision is available.
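As a generic illustration (not the authors' Map of Hate tool), the sketch below projects a stand-in joint representation to two dimensions and colours points by label, in the spirit of the visualization referenced above.

```python
# Generic sketch of projecting a learned hate speech representation to two
# dimensions for inspection; random vectors stand in for real model embeddings.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 128))   # placeholder joint representation
labels = rng.integers(0, 2, size=200)      # 0 = non-hate, 1 = hate (toy)

coords = PCA(n_components=2).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="coolwarm", s=12)
plt.title("2D projection of a joint hate speech representation (sketch)")
plt.savefig("map_of_hate_sketch.png")
```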
arXiv Detail & Related papers (2019-06-10T08:00:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.