Online hierarchical partitioning of the output space in extreme multi-label data stream
- URL: http://arxiv.org/abs/2507.20894v1
- Date: Mon, 28 Jul 2025 14:47:13 GMT
- Title: Online hierarchical partitioning of the output space in extreme multi-label data stream
- Authors: Lara Neves, Afonso Lourenço, Alberto Cano, Goreti Marreiros
- Abstract summary: This work introduces iHOMER, an online multi-label learning framework that incrementally partitions the label space into disjoint, correlated clusters without relying on predefined hierarchies. Experiments across 23 real-world datasets show iHOMER outperforms 5 state-of-the-art global baselines, such as MLHAT, MLHT of Pruned Sets and iSOUPT, by 23%, and 12 local baselines, such as binary relevance transformations of kNN, EFDT, ARF, and ADWIN bagging/boosting ensembles, by 32%, establishing its robustness for online multi-label classification.
- Score: 2.474908349649168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mining data streams with multi-label outputs poses significant challenges due to evolving distributions, high-dimensional label spaces, sparse label occurrences, and complex label dependencies. Moreover, concept drift affects not only input distributions but also label correlations and imbalance ratios over time, complicating model adaptation. To address these challenges, structured learners are categorized into local and global methods. Local methods break down the task into simpler components, while global methods adapt the algorithm to the full output space, potentially yielding better predictions by exploiting label correlations. This work introduces iHOMER (Incremental Hierarchy Of Multi-label Classifiers), an online multi-label learning framework that incrementally partitions the label space into disjoint, correlated clusters without relying on predefined hierarchies. iHOMER leverages online divisive-agglomerative clustering based on Jaccard similarity and a global tree-based learner driven by a multivariate Bernoulli process to guide instance partitioning. To address non-stationarity, it integrates drift detection mechanisms at both global and local levels, enabling dynamic restructuring of label partitions and subtrees. Experiments across 23 real-world datasets show iHOMER outperforms 5 state-of-the-art global baselines, such as MLHAT, MLHT of Pruned Sets and iSOUPT, by 23%, and 12 local baselines, such as binary relevance transformations of kNN, EFDT, ARF, and ADWIN bagging/boosting ensembles, by 32%, establishing its robustness for online multi-label classification.
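The abstract says the label clustering is based on Jaccard similarity computed online. As a rough illustration of that one ingredient only, the sketch below keeps running occurrence and co-occurrence counts for labels and derives pairwise Jaccard similarity from them. It is not the authors' implementation: the class name `OnlineLabelJaccard`, its methods, and the toy stream are all hypothetical, and the clustering, tree learner, and drift detection described in the abstract are not covered.

```python
from collections import defaultdict
from itertools import combinations


class OnlineLabelJaccard:
    """Running Jaccard similarity between labels over a stream of label sets.

    Hypothetical helper for illustration; not taken from the iHOMER paper.
    """

    def __init__(self):
        self.count = defaultdict(int)     # label -> instances carrying it
        self.co_count = defaultdict(int)  # sorted (a, b) pair -> co-occurrences

    def update(self, labels):
        """Consume the active labels of one incoming instance."""
        active = sorted(set(labels))
        for label in active:
            self.count[label] += 1
        for a, b in combinations(active, 2):
            self.co_count[(a, b)] += 1

    def similarity(self, a, b):
        """Return |A and B| / |A or B| over the instances seen so far."""
        if a == b:
            return 1.0 if self.count.get(a, 0) else 0.0
        key = (a, b) if a <= b else (b, a)
        both = self.co_count.get(key, 0)
        union = self.count.get(a, 0) + self.count.get(b, 0) - both
        return both / union if union else 0.0


# Toy stream of label sets (made-up data, one set per instance).
stream = [
    {"sports", "football"},
    {"sports", "tennis"},
    {"sports", "football"},
    {"politics"},
]

tracker = OnlineLabelJaccard()
for labels in stream:
    tracker.update(labels)

# "sports" and "football" co-occur twice; three instances carry either label.
print(tracker.similarity("sports", "football"))
```

Each update costs time quadratic in the number of active labels of one instance, which stays small when label sets are sparse. A learner that must track drifting label correlations would likely also need some form of forgetting, such as windowed or decayed counts, which this sketch omits.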
Related papers
- Semi-Supervised Multi-Label Feature Selection with Consistent Sparse Graph Learning [13.401566810844368]
Existing multi-label methods fail to evaluate the label correlations without enough labeled samples. The similarity graph structure directly derived from the original feature space is suboptimal for multi-label problems. We propose a consistent sparse graph learning method for multi-label semi-supervised feature selection.
arXiv Detail & Related papers (2025-05-23T13:25:41Z)
- Label Cluster Chains for Multi-Label Classification [2.072831155509228]
Multi-label classification is a type of supervised machine learning that can simultaneously assign multiple labels to an instance.
We propose a method to chain disjoint correlated label clusters obtained by applying a partition method in the label space.
Our proposal shows that learning and chaining disjoint correlated label clusters can better explore and learn label correlations.
arXiv Detail & Related papers (2024-11-01T11:16:37Z)
- GLA-DA: Global-Local Alignment Domain Adaptation for Multivariate Time Series [37.736876308352954]
GLA-DA aims to preserve differences among data with distinct labels by aligning the samples with the same class labels together.
We implemented GLA-DA in both UDA and SSDA scenarios, showcasing its superiority over state-of-the-art methods.
arXiv Detail & Related papers (2024-10-09T08:27:26Z)
- Inaccurate Label Distribution Learning with Dependency Noise [52.08553913094809]
We introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning.
We show that DN-ILDL effectively addresses the ILDL problem and outperforms existing LDL methods.
arXiv Detail & Related papers (2024-05-26T07:58:07Z)
- Scalable Label Distribution Learning for Multi-Label Classification [43.52928088881866]
Multi-label classification (MLC) refers to the problem of tagging a given instance with a set of relevant labels.
Most existing MLC methods are based on the assumption that the correlation of two labels in each label pair is symmetric.
Most existing methods design learning processes associated with the number of labels, which makes their computational complexity a bottleneck when scaling up to large-scale output space.
arXiv Detail & Related papers (2023-11-28T06:52:53Z)
- Disambiguated Attention Embedding for Multi-Instance Partial-Label Learning [68.56193228008466]
In many real-world tasks, the concerned objects can be represented as a multi-instance bag associated with a candidate label set.
The existing MIPL approach follows the instance-space paradigm by assigning augmented candidate label sets of bags to each instance and aggregating bag-level labels from instance-level labels.
We propose an intuitive algorithm named DEMIPL, i.e., Disambiguated attention Embedding for Multi-Instance Partial-Label learning.
arXiv Detail & Related papers (2023-05-26T13:25:17Z)
- FedNoiL: A Simple Two-Level Sampling Method for Federated Learning with Noisy Labels [49.47228898303909]
Federated learning (FL) aims at training a global model on the server side while the training data are collected and located at the local devices.
Local training on noisy labels can easily result in overfitting to noisy labels, which is devastating to the global model through aggregation.
We develop a simple two-level sampling method "FedNoiL" that selects clients for more robust global aggregation on the server.
arXiv Detail & Related papers (2022-05-20T12:06:39Z)
- Evolving Multi-Label Fuzzy Classifier [5.53329677986653]
Multi-label classification has attracted much attention in the machine learning community to address the problem of assigning single samples to more than one class at the same time.
We propose an evolving multi-label fuzzy classifier (EFC-ML) which is able to self-adapt and self-evolve its structure with new incoming multi-label samples in an incremental, single-pass manner.
arXiv Detail & Related papers (2022-03-29T08:01:03Z)
- Dual-Refinement: Joint Label and Feature Refinement for Unsupervised Domain Adaptive Person Re-Identification [51.98150752331922]
Unsupervised domain adaptive (UDA) person re-identification (re-ID) is a challenging task due to the lack of labels for the target domain data.
We propose a novel approach, called Dual-Refinement, that jointly refines pseudo labels at the off-line clustering phase and features at the on-line training phase.
Our method outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-12-26T07:35:35Z)
- Instance-Aware Graph Convolutional Network for Multi-Label Classification [55.131166957803345]
Graph convolutional neural network (GCN) has effectively boosted the multi-label image recognition task.
We propose an instance-aware graph convolutional neural network (IA-GCN) framework for multi-label classification.
arXiv Detail & Related papers (2020-08-19T12:49:28Z)
- Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification [64.37745443119942]
This paper jointly enforces visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.
Experimental results on three large-scale ReID datasets demonstrate the superiority of the proposed method in both unsupervised and unsupervised domain adaptive ReID tasks.
arXiv Detail & Related papers (2020-07-21T14:31:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.