Related papers: Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration

URL: http://arxiv.org/abs/2501.12332v1
Date: Tue, 21 Jan 2025 18:06:54 GMT
Title: Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration
Authors: Thomas Walshe, Sae Young Moon, Chunyang Xiao, Yawwani Gunawardana, Fran Silavong
Abstract summary: We explore effectively leveraging open-source models for automatic labelling. We propose Retrieval Augmented Classification (RAC), in which the LLM performs inference for one label at a time. We show that our method, which dynamically integrates label descriptions, leads to performance improvements in labelling tasks.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Acquiring labelled training data that meets quantity and quality requirements remains costly in real-world machine learning projects. Recently, Large Language Models (LLMs), notably GPT-4, have shown great promise in labelling data with high accuracy. However, privacy and cost concerns prevent the ubiquitous use of GPT-4. In this work, we explore effectively leveraging open-source models for automatic labelling. We identify integrating the label schema as a promising technology, but find that naively using the label description for classification leads to poor performance on high-cardinality tasks. To address this, we propose Retrieval Augmented Classification (RAC), in which the LLM performs inference for one label at a time using the corresponding label schema; we start with the most related label and iterate until a label is chosen by the LLM. We show that our method, which dynamically integrates label descriptions, leads to performance improvements in labelling tasks. We further show that by focusing only on the most promising labels, RAC can trade off between label quality and coverage, a property we leverage to automatically label our internal datasets.
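
The loop described in the abstract (rank labels by relatedness, ask about one label at a time, stop at the first accepted label) can be sketched as below. This is a minimal illustration, not the authors' implementation: the word-overlap retriever, the `keyword_judge` stand-in for the LLM's yes/no decision, the example schema, and the `max_labels` cut-off are all assumptions introduced here for demonstration.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical judge signature: (text, label, description) -> accept this label?
Judge = Callable[[str, str, str], bool]


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenizer shared by the toy retriever and judge."""
    return set(text.lower().split())


def rank_labels(text: str, schema: Dict[str, str]) -> List[str]:
    """Toy retriever: order labels by word overlap with their description."""
    words = tokenize(text)
    # Descending overlap first, then label name so that ties are deterministic.
    return sorted(
        schema,
        key=lambda label: (-len(words & tokenize(schema[label])), label),
    )


def keyword_judge(text: str, label: str, description: str) -> bool:
    """Stand-in for the LLM decision: accept when two or more words are shared."""
    return len(tokenize(text) & tokenize(description)) >= 2


def rac_label(
    text: str,
    schema: Dict[str, str],
    judge: Judge,
    max_labels: Optional[int] = None,
) -> Optional[str]:
    """Try labels one at a time, most related first; return the first accepted.

    Returning None means the text is left unlabelled. A smaller max_labels
    restricts the search to the most promising labels, lowering coverage.
    """
    candidates = rank_labels(text, schema)
    if max_labels is not None:
        candidates = candidates[:max_labels]
    for label in candidates:
        if judge(text, label, schema[label]):
            return label
    return None


# Assumed example schema: label name -> short label description.
schema = {
    "billing": "invoice payment refund charge",
    "shipping": "delivery package courier tracking",
    "account": "login password username profile",
}

print(rac_label("my refund payment never arrived", schema, keyword_judge))  # billing
print(rac_label("where is my package", schema, keyword_judge))  # None
```

In a real setting, the judge would wrap a prompt to an open-source LLM containing the text and a single label description, and the retriever would typically use embedding similarity. Items for which no label is accepted within the cut-off stay unlabelled, which is how the quality versus coverage trade-off shows up in this sketch.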

Related papers

Mixed Blessing: Class-Wise Embedding guided Instance-Dependent Partial Label Learning [53.64180787439527]
In partial label learning (PLL), every sample is associated with a candidate label set comprising the ground-truth label and several noisy labels. For the first time, we create class-wise embeddings for each sample, which allow us to explore the relationship of instance-dependent noisy labels. To reduce the high label ambiguity, we introduce the concept of class prototypes containing global feature information.
arXiv Detail & Related papers (2024-12-06T13:25:39Z)
Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification [11.19022605804112]
This paper introduces RR2QC, a novel Retrieval Reranking method To multi-label Question Classification. It uses label semantics and meta-label refinement to enhance personalized learning and resource recommendation. Experimental results demonstrate that RR2QC outperforms existing classification methods in Precision@k and F1 scores.
arXiv Detail & Related papers (2024-11-04T06:27:14Z)
Exploiting Conjugate Label Information for Multi-Instance Partial-Label Learning [61.00359941983515]
Multi-instance partial-label learning (MIPL) addresses scenarios where each training sample is represented as a multi-instance bag associated with a candidate label set containing one true label and several false positives. ELIMIPL exploits the conjugate label information to improve the disambiguation performance.
arXiv Detail & Related papers (2024-08-26T15:49:31Z)
Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations [91.67511167969934]
Imprecise label learning (ILL) is a framework for the unification of learning with various imprecise label configurations. We demonstrate that ILL can seamlessly adapt to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, a mixture of these settings.
arXiv Detail & Related papers (2023-05-22T04:50:28Z)
Contrastive Label Enhancement [13.628665406039609]
We propose Contrastive Label Enhancement (ConLE) to generate high-level features through a contrastive learning strategy. We leverage the obtained high-level features to gain label distributions through a well-designed training strategy.
arXiv Detail & Related papers (2023-05-16T14:53:07Z)
Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging. Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations. We advocate remedying the deficiency of label supervision for MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts. Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to large amounts of unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
Ground Truth Inference for Weakly Supervised Entity Matching [76.6732856489872]
We propose a simple but powerful labeling model for weak supervision tasks. We then tailor the labeling model specifically to the task of entity matching. We show that our labeling model results in a 9% higher F1 score on average than the best existing method.
arXiv Detail & Related papers (2022-11-13T17:57:07Z)
Eliciting and Learning with Soft Labels from Every Annotator [31.10635260890126]
We focus on efficiently eliciting soft labels from individual annotators. We demonstrate that learning with our labels achieves comparable model performance to prior approaches.
arXiv Detail & Related papers (2022-07-02T12:03:00Z)
Group-aware Label Transfer for Domain Adaptive Person Re-identification [179.816105255584]
Unsupervised Domain Adaptive (UDA) person re-identification (ReID) aims at adapting the model trained on a labeled source-domain dataset to a target-domain dataset without any further annotations. Most successful UDA-ReID approaches combine clustering-based pseudo-label prediction with representation learning and perform the two steps in an alternating fashion. We propose a Group-aware Label Transfer (GLT) algorithm, which enables the online interaction and mutual promotion of pseudo-label prediction and representation learning.
arXiv Detail & Related papers (2021-03-23T07:57:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.