Multilabel Classification by Hierarchical Partitioning and
Data-dependent Grouping
- URL: http://arxiv.org/abs/2006.14084v2
- Date: Sat, 31 Oct 2020 19:11:10 GMT
- Title: Multilabel Classification by Hierarchical Partitioning and
Data-dependent Grouping
- Authors: Shashanka Ubaru, Sanjeeb Dash, Arya Mazumdar, Oktay Gunluk
- Abstract summary: We exploit the sparsity of label vectors and the hierarchical structure to embed them in a low-dimensional space.
We present a novel data-dependent grouping approach, where we use a group construction based on a low-rank Nonnegative Matrix Factorization.
We then present a hierarchical partitioning approach that exploits the label hierarchy in large-scale problems to divide up the large label space and create smaller sub-problems.
- Score: 33.48217977134427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In modern multilabel classification problems, each data instance belongs to a
small number of classes from a large set of classes. In other words, these
problems involve learning very sparse binary label vectors. Moreover, in
large-scale problems, the labels typically have a certain (unknown) hierarchy. In
this paper, we exploit the sparsity of label vectors and the hierarchical
structure to embed them in a low-dimensional space using label groupings.
Consequently, we solve the classification problem in a much lower-dimensional
space and then obtain labels in the original space using an appropriately
defined lifting. Our method builds on the work of Ubaru & Mazumdar (2017),
where the idea of group testing was also explored for multilabel
classification. We first present a novel data-dependent grouping approach,
where we use a group construction based on a low-rank Nonnegative Matrix
Factorization (NMF) of the label matrix of training instances. The construction
also allows us, using recent results, to develop a fast prediction algorithm
that has a logarithmic runtime in the number of labels. We then present a
hierarchical partitioning approach that exploits the label hierarchy in
large-scale problems to divide up the large label space and create smaller
sub-problems, which can then be solved independently via the grouping approach.
Numerical results on many benchmark datasets illustrate that, compared to other
popular methods, our proposed methods achieve competitive accuracy with
significantly lower computational costs.
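
The grouping idea in the abstract (reduce sparse label vectors to a few group indicators, classify in that smaller space, then lift back) can be illustrated with a minimal sketch. This is not the authors' implementation: the group matrix `A` below is a fixed combinatorial design chosen for readability, standing in for the NMF-based construction, and the names `pair_design`, `reduce_labels`, `lift`, and the `tol` parameter are hypothetical. The classifiers are also omitted; the reduced labels are fed straight to the lifting step.

```python
from itertools import combinations

import numpy as np


def pair_design(m):
    """Hypothetical group matrix: each label belongs to one distinct pair of
    groups, giving m rows (groups) and m*(m-1)/2 columns (labels)."""
    pairs = list(combinations(range(m), 2))
    A = np.zeros((m, len(pairs)), dtype=int)
    for j, (a, b) in enumerate(pairs):
        A[a, j] = 1
        A[b, j] = 1
    return A


def reduce_labels(A, Y):
    """Boolean OR reduction: Z[i, g] = 1 iff sample i has a label in group g."""
    return (Y @ A.T > 0).astype(int)


def lift(A, Z, tol=0):
    """Predict label j when at most `tol` of the groups containing j are 0."""
    misses = (1 - Z) @ A  # misses[i, j] = groups containing j that are negative
    return (misses <= tol).astype(int)


A = pair_design(4)  # 4 groups cover 6 labels
Y = np.vstack([np.eye(6, dtype=int),            # every single-label vector
               np.zeros((1, 6), dtype=int)])    # plus the empty label vector
Z = reduce_labels(A, Y)  # in practice: predictions of 4 binary classifiers
Y_hat = lift(A, Z)
print(np.array_equal(Y_hat, Y))  # prints True
```

This toy design only recovers label vectors with at most one active label exactly; with two active labels it can already produce false positives, which is why the choice of group construction matters for sparser-but-not-trivial label vectors.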
Related papers
- Semi-Supervised Hierarchical Multi-Label Classifier Based on Local Information [1.6574413179773761]
Semi-supervised hierarchical multi-label classifier based on local information (SSHMC-BLI)
SSHMC-BLI builds pseudo-labels for each unlabeled instance from the paths of labels of its labeled neighbors.
Experiments on 12 challenging datasets from functional genomics show that using unlabeled data along with labeled data can improve the performance of a supervised hierarchical classifier trained only on labeled data.
(2024-04-30T20:16:40Z)
- Active Generalized Category Discovery [60.69060965936214]
Generalized Category Discovery (GCD) endeavors to cluster unlabeled samples from both novel and old classes.
We take the spirit of active learning and propose a new setting called Active Generalized Category Discovery (AGCD).
Our method achieves state-of-the-art performance on both generic and fine-grained datasets.
(2024-03-07T07:12:24Z)
- Towards Imbalanced Large Scale Multi-label Classification with Partially Annotated Labels [8.977819892091]
Multi-label classification is a widely encountered problem in daily life, where an instance can be associated with multiple classes.
In this work, we address the issue of label imbalance and investigate how to train neural networks using partial labels.
(2023-07-31T21:50:48Z)
- Making Binary Classification from Multiple Unlabeled Datasets Almost Free of Supervision [128.6645627461981]
We propose a new problem setting, i.e., binary classification from multiple unlabeled datasets with only one pairwise numerical relationship of class priors.
In MU-OPPO, we do not need the class priors for all unlabeled datasets.
We show that our framework brings smaller estimation errors of class priors and better performance of binary classification.
(2023-06-12T11:33:46Z)
- Adopting the Multi-answer Questioning Task with an Auxiliary Metric for Extreme Multi-label Text Classification Utilizing the Label Hierarchy [10.87653109398961]
This paper adopts the multi-answer questioning task for extreme multi-label classification.
This study applies the proposed method and the evaluation metric to the legal domain.
(2023-03-02T08:40:31Z)
- Complementary to Multiple Labels: A Correlation-Aware Correction Approach [65.59584909436259]
We show theoretically how the estimated transition matrix in multi-class CLL could be distorted in multi-labeled cases.
We propose a two-step method to estimate the transition matrix from candidate labels.
(2023-02-25T04:48:48Z)
- Review of Extreme Multilabel Classification [1.888738346075831]
Extreme multilabel classification, or XML, is an active area of interest in machine learning.
The community has come up with a useful set of metrics to correctly identify the prediction for head or tail labels.
(2023-02-12T18:29:20Z)
- Multi-Instance Partial-Label Learning: Towards Exploiting Dual Inexact Supervision [53.530957567507365]
In some real-world tasks, each training sample is associated with a candidate label set that contains one ground-truth label and some false positive labels.
In this paper, we formalize such problems as multi-instance partial-label learning (MIPL).
Existing multi-instance learning algorithms and partial-label learning algorithms are suboptimal for solving MIPL problems.
(2022-12-18T03:28:51Z)
- An Effective Approach for Multi-label Classification with Missing Labels [8.470008570115146]
We propose a pseudo-label based approach to reduce the cost of annotation without bringing additional complexity to the classification networks.
By designing a novel loss function, we are able to relax the requirement that each instance must contain at least one positive label.
We show that our method can handle the imbalance between positive labels and negative labels, while still outperforming existing missing-label learning approaches.
(2022-10-24T23:13:57Z)
- Label Disentanglement in Partition-based Extreme Multilabel Classification [111.25321342479491]
We show that the label assignment problem in partition-based XMC can be formulated as an optimization problem.
We show that our method can successfully disentangle multi-modal labels, leading to state-of-the-art (SOTA) results on four XMC benchmarks.
(2021-06-24T03:24:18Z)
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
(2021-02-15T05:23:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.