Large Language Models Do Multi-Label Classification Differently
- URL: http://arxiv.org/abs/2505.17510v2
- Date: Mon, 22 Sep 2025 20:30:41 GMT
- Title: Large Language Models Do Multi-Label Classification Differently
- Authors: Marcus Ma, Georgios Chochlakis, Niyantha Maruthu Pandiyan, Jesse Thomason, Shrikanth Narayanan
- Abstract summary: Multi-label classification is prevalent in real-world settings, but the behavior of Large Language Models (LLMs) in this setting is understudied. We investigate how autoregressive LLMs perform multi-label classification, focusing on subjective tasks. We find that the initial probability distribution for the first label often does not reflect the eventual final output.
- Score: 41.60681320369492
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-label classification is prevalent in real-world settings, but the behavior of Large Language Models (LLMs) in this setting is understudied. We investigate how autoregressive LLMs perform multi-label classification, focusing on subjective tasks, by analyzing the output distributions of the models at each label generation step. We find that the initial probability distribution for the first label often does not reflect the eventual final output, even in terms of relative order, and that LLMs tend to suppress all but one label at each generation step. We further observe that as model scale increases, the token distributions exhibit lower entropy and higher single-label confidence, but the internal relative ranking of the labels improves. Finetuning methods such as supervised finetuning and reinforcement learning amplify this phenomenon. We introduce the task of distribution alignment for multi-label settings: aligning LLM-derived label distributions with empirical distributions estimated from annotator responses in subjective tasks. We propose both zero-shot and supervised methods which improve both alignment and predictive performance over existing approaches. We find that one method -- taking the max probability over all label generation distributions instead of using only the initial probability distribution -- improves both distribution alignment and overall F1 classification performance without adding any additional computation.
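The max-over-steps idea from the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names and the toy per-step distributions below are invented for the example. The point is that a label suppressed at the first generation step can still score highly if it peaks at a later step.

```python
def aggregate_label_scores(step_distributions, labels):
    """Score each label by the maximum probability it receives across
    all label-generation steps, rather than only the first step.

    step_distributions: list of dicts mapping label -> probability,
    one dict per generation step."""
    return {
        label: max(dist.get(label, 0.0) for dist in step_distributions)
        for label in labels
    }

# Toy example: "anger" is nearly suppressed at the first step (the model
# commits to "joy"), but dominates the second step's distribution.
steps = [
    {"joy": 0.55, "anger": 0.05, "sadness": 0.30},  # first-label step
    {"joy": 0.02, "anger": 0.60, "sadness": 0.25},  # second-label step
]
scores = aggregate_label_scores(steps, ["joy", "anger", "sadness"])
```

Under a first-step-only reading, "anger" would rank last; the max aggregation recovers it at no extra inference cost, since the per-step distributions are already produced during generation.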
Related papers
- Learning Semantic-Aware Threshold for Multi-Label Image Recognition with Partial Labels [12.477433449244543]
Multi-label image recognition with partial labels (MLR-PL) is designed to train models using a mix of known and unknown labels. Traditional methods rely on semantic or feature correlations to create pseudo-labels for unidentified labels. In our study, we introduce the Semantic-Aware Threshold Learning (SATL) algorithm.
arXiv Detail & Related papers (2025-07-31T05:54:10Z) - Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning [12.052388861361937]
Recent studies have overemphasized co-occurrence relationships among labels, leading to suboptimal models. We propose the Multi-Label Visual Prompt Tuning framework to balance correlative and discriminative relationships among labels. Our proposed approach achieves competitive results and outperforms SOTA methods on multiple pre-trained models.
arXiv Detail & Related papers (2025-04-14T08:52:50Z) - Distribution-Consistency-Guided Multi-modal Hashing [24.945074615208]
We propose a novel Distribution-Consistency-Guided Multi-modal Hashing (DCGMH) to enhance retrieval performance. The proposed method first randomly initializes several category centers, which are used to compute the high-low distribution of similarity scores. Extensive experiments on three widely used datasets demonstrate the superiority of the proposed method compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-12-15T15:13:14Z) - Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z) - Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z) - Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper. Motivated by this perspective, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z) - Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision [75.1860418333995]
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently.
The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources as labeling functions.
Existing statistical label models typically rely only on the outputs of the LFs, ignoring the instance features when modeling the underlying generative process.
arXiv Detail & Related papers (2022-10-06T07:28:53Z) - A Deep Model for Partial Multi-Label Image Classification with Curriculum Based Disambiguation [42.0958430465578]
We study the partial multi-label (PML) image classification problem.
Existing PML methods typically design a disambiguation strategy to filter out noisy labels.
We propose a deep model for PML to enhance the representation and discrimination ability.
arXiv Detail & Related papers (2022-07-06T02:49:02Z) - Self-Adaptive Label Augmentation for Semi-supervised Few-shot Classification [121.63992191386502]
Few-shot classification aims to learn a model that can generalize well to new tasks when only a few labeled samples are available.
We propose Self-Adaptive Label Augmentation (SALA), a semi-supervised few-shot classification method that assigns an appropriate label to each unlabeled sample. A major novelty of SALA is its task-adaptive metric, which is learned end-to-end rather than manually defined, adapting to different tasks.
arXiv Detail & Related papers (2022-06-16T13:14:03Z) - Evolving Multi-Label Fuzzy Classifier [5.53329677986653]
Multi-label classification has attracted much attention in the machine learning community as it addresses the problem of assigning a single sample to more than one class at the same time.
We propose an evolving multi-label fuzzy classifier (EFC-ML) which is able to self-adapt and self-evolve its structure with new incoming multi-label samples in an incremental, single-pass manner.
arXiv Detail & Related papers (2022-03-29T08:01:03Z) - Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning [80.05441565830726]
This paper addresses imbalanced semi-supervised learning, where heavily biased pseudo-labels can harm the model performance.
Motivated by this observation, we propose a general pseudo-labeling framework to address the bias.
We term the novel pseudo-labeling framework for imbalanced SSL as Distribution-Aware Semantics-Oriented (DASO) Pseudo-label.
arXiv Detail & Related papers (2021-06-10T11:58:25Z) - PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which leverages the class imbalance ratio during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z) - Capturing Label Distribution: A Case Study in NLI [19.869498599986006]
Post-hoc smoothing of the predicted label distribution to match the expected label entropy is very effective.
We introduce a small number of examples with multiple references into training.
arXiv Detail & Related papers (2021-02-13T04:14:31Z) - Probabilistic Decoupling of Labels in Classification [4.865747672937677]
We develop a principled, probabilistic, unified approach to non-standard classification tasks.
We train a classifier on the given labels to predict the label-distribution.
We then infer the underlying class-distributions by variationally optimizing a model of label-class transitions.
arXiv Detail & Related papers (2020-06-16T10:07:50Z) - UniT: Unified Knowledge Transfer for Any-shot Object Detection and
Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.