Related papers: Sanitizing Manufacturing Dataset Labels Using Vision-Language Models

Sanitizing Manufacturing Dataset Labels Using Vision-Language Models

URL: http://arxiv.org/abs/2506.23465v1
Date: Mon, 30 Jun 2025 02:13:09 GMT
Title: Sanitizing Manufacturing Dataset Labels Using Vision-Language Models
Authors: Nazanin Mahjourian, Vinh Nguyen
Abstract summary: This paper introduces Vision-Language Sanitization and Refinement (VLSR), which is a vision-language-based framework for label sanitization and refinement. The method embeds both images and their associated textual labels into a shared semantic space leveraging the CLIP vision-language model. Experimental results demonstrate that the VLSR framework successfully identifies problematic labels and improves label consistency.
Score: 1.0819408603463427
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The success of machine learning models in industrial applications is heavily dependent on the quality of the datasets used to train the models. However, large-scale datasets, especially those constructed from crowd-sourcing and web-scraping, often suffer from label noise, inconsistencies, and errors. This problem is particularly pronounced in manufacturing domains, where obtaining high-quality labels is costly and time-consuming. This paper introduces Vision-Language Sanitization and Refinement (VLSR), which is a vision-language-based framework for label sanitization and refinement in multi-label manufacturing image datasets. This method embeds both images and their associated textual labels into a shared semantic space leveraging the CLIP vision-language model. Then two key tasks are addressed in this process by computing the cosine similarity between embeddings. First, label sanitization is performed to identify irrelevant, misspelled, or semantically weak labels, and surface the most semantically aligned label for each image by comparing image-label pairs using cosine similarity between image and label embeddings. Second, the method applies density-based clustering on text embeddings, followed by iterative cluster merging, to group semantically similar labels into unified label groups. The Factorynet dataset, which includes noisy labels from both human annotations and web-scraped sources, is employed to evaluate the effectiveness of the proposed framework. Experimental results demonstrate that the VLSR framework successfully identifies problematic labels and improves label consistency. This method enables a significant reduction in label vocabulary through clustering, which ultimately enhances the dataset's quality for training robust machine learning models in industrial applications with minimal human intervention.
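
The label sanitization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the image and label embeddings have already been computed (for example by a CLIP model), and it uses small hand-written vectors as stand-ins. The function names, the example labels, and the 0.5 threshold are all hypothetical choices made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no alignment
    return dot / (norm_u * norm_v)


def sanitize_labels(image_embedding, label_embeddings, threshold=0.5):
    """Score every candidate label against one image embedding.

    Returns (best_label, kept, flagged), where `flagged` holds labels whose
    similarity falls below `threshold` (a hypothetical cutoff, not a value
    taken from the paper). Expects at least one candidate label.
    """
    scores = {
        label: cosine_similarity(image_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    kept = [label for label, s in scores.items() if s >= threshold]
    flagged = [label for label, s in scores.items() if s < threshold]
    return best_label, kept, flagged


# Toy 3-dimensional vectors standing in for real image/text embeddings.
image_embedding = [1.0, 0.0, 0.0]
label_embeddings = {
    "cnc lathe": [0.9, 0.1, 0.0],      # hypothetical well-aligned label
    "lathe machine": [0.8, 0.2, 0.1],  # hypothetical near-synonym
    "asdf123": [0.0, 0.0, 1.0],        # hypothetical junk label
}

best_label, kept, flagged = sanitize_labels(image_embedding, label_embeddings)
print(best_label, kept, flagged)
```

In this toy setup the junk label is orthogonal to the image vector, so it falls below the cutoff and is flagged, while the two lathe labels are kept. With real embeddings, an appropriate threshold would need to be chosen empirically.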

Related papers

Zero-Shot Pseudo Labels Generation Using SAM and CLIP for Semi-Supervised Semantic Segmentation [0.0]
We propose a method to train a semantic segmentation model using images with annotated labels and pseudo labels. The accuracy of the model depends on the quality of the pseudo labels and the amount of data with annotated labels. The effectiveness of the proposed method is demonstrated through experiments using the public datasets PASCAL and MS COCO.
arXiv Detail & Related papers (2025-05-26T11:31:13Z)
When VLMs Meet Image Classification: Test Sets Renovation via Missing Label Identification [11.49089004019603]
We present a comprehensive framework named REVEAL to address both noisy labels and missing labels in image classification test sets. REVEAL detects potential noisy labels and omissions, aggregates predictions from various methods, and refines label accuracy through confidence-informed predictions and consensus-based filtering. Our method effectively reveals missing labels from public datasets and provides soft-labeled results with likelihoods.
arXiv Detail & Related papers (2025-05-22T02:47:36Z)
Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning [8.387189407144403]
We motivate weakly supervised learning as an effective learning paradigm for problems where curating perfectly annotated datasets is expensive. We focus on Partial Label Learning (PLL), a weakly-supervised learning paradigm where each training instance is paired with a set of candidate labels. We present a framework that initially assigns pseudo-labels to images by exploiting the noisy partial labels through a weighted nearest neighbour algorithm.
arXiv Detail & Related papers (2024-02-07T13:32:47Z)
Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
We incorporate Self-Supervised Learning (SSL) in the model learning process and design a novel self-supervised Relation of Relation (R2) classification task. We propose a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets. External knowledge from WordNet is used to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification [85.76130799062379]
We study how false negative labels affect the model's explanation. We propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.
arXiv Detail & Related papers (2023-04-04T14:00:59Z)
Group is better than individual: Exploiting Label Topologies and Label Relations for Joint Multiple Intent Detection and Slot Filling [39.76268402567324]
We construct a Heterogeneous Label Graph (HLG) containing two kinds of topologies. Label correlations are leveraged to enhance semantic-label interactions. We also propose the label-aware inter-dependent decoding mechanism to further exploit the label correlations for decoding.
arXiv Detail & Related papers (2022-10-19T08:21:43Z)
Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images. The proposed DSRB consistently outperforms current state-of-the-art algorithms on all proportion label settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z)
Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework [28.898240725099782]
We build an entity recognition model requiring only a few shots of annotated document images. We develop a novel label-aware seq2seq framework, LASER. Experiments on two benchmark datasets demonstrate the superiority of LASER under the few-shot setting.
arXiv Detail & Related papers (2022-03-30T18:30:42Z)
Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [86.17081952197788]
We propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels. Experiments on the MS-COCO, Visual Genome, Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors.
arXiv Detail & Related papers (2022-03-04T07:56:16Z)
Structured Semantic Transfer for Multi-Label Recognition with Partial Labels [85.6967666661044]
We propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels. The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations. Experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms.
arXiv Detail & Related papers (2021-12-21T02:15:01Z)
Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network [61.94394163309688]
We propose a Label-enhanced Task-Adaptive Projection Network (L-TapNet) based on the state-of-the-art few-shot classification model -- TapNet. Experimental results show that our model significantly outperforms the strongest few-shot learning baseline by 14.64 F1 scores in the one-shot setting.
arXiv Detail & Related papers (2020-06-10T07:50:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.