Related papers: A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild

URL: http://arxiv.org/abs/2506.10117v1
Date: Wed, 11 Jun 2025 18:55:54 GMT
Title: A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild
Authors: Klim Kireev, Ana-Maria Creţu, Raphael Meier, Sarah Adel Bargal, Elissa Redmiles, Carmela Troncoso
Abstract summary: We release an image-caption dataset aimed at benchmarking tools that detect depictions of minors. ICCWD contains 10,000 image-caption pairs manually labeled to indicate the presence or absence of a child in the image. Our results suggest that child detection is a challenging task, with the best method achieving a 75.3% true positive rate.
Score: 12.25468403574749
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Platforms and the law regulate digital content depicting minors (defined as individuals under 18 years of age) differently from other types of content. Given the sheer amount of content that needs to be assessed, machine learning-based automation tools are commonly used to detect content depicting minors. To our knowledge, no dataset or benchmark currently exists for detecting these identification methods in a multi-modal environment. To fill this gap, we release the Image-Caption Children in the Wild Dataset (ICCWD), an image-caption dataset aimed at benchmarking tools that detect depictions of minors. Our dataset is richer than previous child image datasets, containing images of children in a variety of contexts, including fictional depictions and partially visible bodies. ICCWD contains 10,000 image-caption pairs manually labeled to indicate the presence or absence of a child in the image. To demonstrate the possible utility of our dataset, we use it to benchmark three different detectors, including a commercial age estimation system applied to images. Our results suggest that child detection is a challenging task, with the best method achieving a 75.3% true positive rate. We hope the release of our dataset will aid in the design of better minor detection methods in a wide range of scenarios.
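
For readers unfamiliar with the metric quoted in the abstract, the true positive rate is the fraction of items labeled as containing a child that a detector also flags. The sketch below is only an illustration of that definition: the function name `true_positive_rate` and the toy labels are hypothetical and are not taken from the paper or from ICCWD.

```python
from typing import Sequence


def true_positive_rate(labels: Sequence[bool], predictions: Sequence[bool]) -> float:
    """Return the share of positive-labeled items that were predicted positive."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    # Count items whose ground-truth label is positive (child present).
    positives = sum(1 for y in labels if y)
    if positives == 0:
        raise ValueError("true positive rate is undefined without positive labels")
    # Count positive-labeled items that the detector also flagged.
    hits = sum(1 for y, p in zip(labels, predictions) if y and p)
    return hits / positives


# Toy data (hypothetical): True means "child present" / "detector flagged".
labels = [True, True, True, True, False, False]
predictions = [True, True, True, False, True, False]

tpr = true_positive_rate(labels, predictions)
print(f"TPR = {tpr:.3f}")
```

Note that the true positive rate ignores false alarms on images without children, so it is usually read alongside a false positive rate or precision figure.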

Related papers

Neglected Risks: The Disturbing Reality of Children's Images in Datasets and the Urgent Call for Accountability [6.366871989491978]
Including children's images in datasets has raised ethical concerns. These datasets can expose children to risks such as exploitation, profiling, and tracking. We propose a pipeline to detect and remove such images.
arXiv Detail & Related papers (2025-04-20T01:36:07Z)
Efficient Curation of Invertebrate Image Datasets Using Feature Embeddings and Automatic Size Comparison [5.480305055542485]
We present a method for curating large-scale image datasets of invertebrates. Our approach is based on extracting feature embeddings with pretrained deep neural networks. Also, we show that a simple area-based size comparison approach is able to find a lot of common erroneous images.
arXiv Detail & Related papers (2024-12-20T12:35:41Z)
Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context. We introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals that is an order of magnitude larger than existing relevant datasets and where rich semantic ground-truth annotations are readily available. We derive a simple yet efficient, personalized text-to-image baseline that does not require test-time fine-tuning for each subject and which sets quantitatively and in human trials a new SoTA.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
Content Bias in Deep Learning Image Age Approximation: A new Approach Towards better Explainability [4.088355251010862]
In temporal image forensics, content bias can be exploited by a neural network. A novel approach is proposed that evaluates the influence of image content. It is shown that a deep learning approach proposed in the context of age classification is most likely highly dependent on the image content.
arXiv Detail & Related papers (2023-10-03T14:09:27Z)
Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner. Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text image. We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification. Our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z)
Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as scene graph. We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graph.
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures. Our experiments demonstrate that the method is able to represent the image in low dimensional space. Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce. Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data. We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations. We present a method that uses the attributes in this "textual scene graph" to train object detectors. We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
A Method for Curation of Web-Scraped Face Image Datasets [13.893682217746816]
A variety of issues occur when collecting a dataset in-the-wild. With the number of images being in the millions, a manual cleaning procedure is not feasible. We propose a semi-automated method, where the goal is to have a clean dataset for testing face recognition methods.
arXiv Detail & Related papers (2020-04-07T01:57:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.