Leveraging Human-Machine Interactions for Computer Vision Dataset
Quality Enhancement
- URL: http://arxiv.org/abs/2401.17736v1
- Date: Wed, 31 Jan 2024 10:57:07 GMT
- Title: Leveraging Human-Machine Interactions for Computer Vision Dataset
Quality Enhancement
- Authors: Esla Timothy Anzaku (1,2,3), Hyesoo Hong (1), Jin-Woo Park (1), Wonjun
Yang (1), Kangmin Kim (1), JongBum Won (1), Deshika Vinoshani Kumari Herath
(6), Arnout Van Messem (5) and Wesley De Neve (1,2,3)
- Abstract summary: Large-scale datasets for single-label multi-class classification, such as ImageNet-1k, have been instrumental in advancing deep learning and computer vision.
We introduce a lightweight, user-friendly, and scalable framework that synergizes human and machine intelligence for efficient dataset validation and quality enhancement.
By using Multilabelfy on the ImageNetV2 dataset, we found that approximately 47.88% of the images contained at least two labels.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale datasets for single-label multi-class classification, such as
\emph{ImageNet-1k}, have been instrumental in advancing deep learning and
computer vision. However, a critical and often understudied aspect is the
comprehensive quality assessment of these datasets, especially regarding
potential multi-label annotation errors. In this paper, we introduce a
lightweight, user-friendly, and scalable framework that synergizes human and
machine intelligence for efficient dataset validation and quality enhancement.
We term this novel framework \emph{Multilabelfy}. Central to Multilabelfy is an
adaptable web-based platform that systematically guides annotators through the
re-evaluation process, effectively leveraging human-machine interactions to
enhance dataset quality. By using Multilabelfy on the ImageNetV2 dataset, we
found that approximately $47.88\%$ of the images contained at least two labels,
underscoring the need for more rigorous assessments of such influential
datasets. Furthermore, our analysis showed a negative correlation between the
number of potential labels per image and model top-1 accuracy, illuminating a
crucial factor in model evaluation and selection. Our open-source framework,
Multilabelfy, offers a convenient, lightweight solution for dataset
enhancement, emphasizing multi-label proportions. This study tackles major
challenges in dataset integrity and provides key insights into model
performance evaluation. Moreover, it underscores the advantages of integrating
human expertise with machine capabilities to produce more robust models and
trustworthy data development. The source code for Multilabelfy will be
available at https://github.com/esla/Multilabelfy.
Keywords: Computer Vision, Dataset Quality Enhancement, Dataset Validation, Human-Computer Interaction, Multi-label Annotation.
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection [15.264492633817774]
We address the real-world, challenging task of out-of-context misinformation detection, where a real image is paired with an incorrect caption to create fake news.
Existing approaches for this task assume the availability of large amounts of labeled data, which is often impractical in real-world settings.
We propose a semi-supervised protocol, where the model has access to a limited number of labeled image-text pairs and a large corpus of unlabeled pairs.
arXiv Detail & Related papers (2024-10-06T09:45:20Z)
- SeeBel: Seeing is Believing [0.9790236766474201]
We propose three visualizations that enable users to compare dataset statistics and AI performance for segmenting all images.
Our project tries to further increase the interpretability of the trained AI model for segmentation by visualizing its image attention weights.
We propose to conduct surveys on real users to study the efficacy of our visualization tool in computer vision and AI domain.
arXiv Detail & Related papers (2023-12-18T05:11:00Z)
- A Benchmark Generative Probabilistic Model for Weak Supervised Learning [2.0257616108612373]
Weak Supervised Learning approaches have been developed to alleviate the annotation burden.
We show that probabilistic latent variable models (PLVMs) achieve state-of-the-art performance across four datasets.
arXiv Detail & Related papers (2023-03-31T07:06:24Z)
- Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data [44.64190826937705]
We present scalable algorithms for detecting label errors and outlier data based on the relational graph structure of data.
We also introduce a visualization tool that provides contextual information of a data point in the feature-embedded space.
Our approach achieves state-of-the-art detection performance on all tasks considered and demonstrates its effectiveness in large-scale real-world datasets.
arXiv Detail & Related papers (2023-01-29T02:09:13Z)
- Generative Modeling Helps Weak Supervision (and Vice Versa) [87.62271390571837]
We propose a model fusing weak supervision and generative adversarial networks.
It captures discrete variables in the data alongside the weak supervision derived label estimate.
It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels.
arXiv Detail & Related papers (2022-03-22T20:24:21Z)
- Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK).
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
arXiv Detail & Related papers (2021-11-01T15:09:41Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria to quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- End-to-End Learning from Noisy Crowd to Supervised Machine Learning Models [6.278267504352446]
We advocate using hybrid intelligence, i.e., combining deep models and human experts, to design an end-to-end learning framework from noisy crowd-sourced data.
We show how label aggregation can benefit from estimating the annotators' confusion matrix to improve the learning process.
We demonstrate the effectiveness of our strategies on several image datasets, using SVM and deep neural networks.
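The confusion-matrix-based label aggregation mentioned above can be sketched as a simple posterior update in the style of Dawid-Skene. This is an illustrative sketch, not the paper's actual method; the confusion matrices and votes below are made up.

```python
# Illustrative, Dawid-Skene-style sketch of aggregating noisy crowd labels
# using per-annotator confusion matrices; all numbers here are hypothetical.
K = 3  # number of classes

# confusions[a][true][reported] = P(annotator a reports `reported` | `true`).
confusions = [
    [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.10, 0.10, 0.80]],  # reliable
    [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30], [0.30, 0.30, 0.40]],  # noisy
]

def aggregate(votes):
    """Posterior over the true label given (annotator, reported_label) votes."""
    post = [1.0 / K] * K  # uniform prior over true labels
    for annot, reported in votes:
        # Multiply by the likelihood of this report under each candidate label.
        post = [p * confusions[annot][t][reported] for t, p in enumerate(post)]
    z = sum(post)
    return [p / z for p in post]

# Annotator 0 (reliable) reports class 0; annotator 1 (noisy) reports class 2.
posterior = aggregate([(0, 0), (1, 2)])
print([round(p, 3) for p in posterior])  # the reliable annotator dominates
```

In a full Dawid-Skene setup the confusion matrices would themselves be estimated from the crowd data via EM rather than fixed by hand.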
arXiv Detail & Related papers (2020-11-13T09:48:30Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.