Exploring space efficiency in a tree-based linear model for extreme multi-label classification
- URL: http://arxiv.org/abs/2410.09554v1
- Date: Sat, 12 Oct 2024 15:02:40 GMT
- Title: Exploring space efficiency in a tree-based linear model for extreme multi-label classification
- Authors: He-Zhe Lin, Cheng-Hung Liu, Chih-Jen Lin
- Abstract summary: Extreme multi-label classification (XMC) aims to identify relevant subsets from numerous labels.
Among the various approaches for XMC, tree-based linear models are effective due to their superior efficiency and simplicity.
In this work, we conduct both theoretical and empirical analyses of the space required to store a tree model under the assumption of sparse data.
- Score: 11.18858602369985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extreme multi-label classification (XMC) aims to identify relevant subsets from numerous labels. Among the various approaches for XMC, tree-based linear models are effective due to their superior efficiency and simplicity. However, the space complexity of tree-based methods is not well studied. Many past works assume that storing the model is not affordable and apply techniques such as pruning to save space, which may lead to performance loss. In this work, we conduct both theoretical and empirical analyses of the space required to store a tree model under the assumption of sparse data, a condition frequently met in text data. We found that some features may be unused when training binary classifiers in a tree method, resulting in zero values in the weight vectors. Hence, storing only the non-zero elements can greatly save space. Our experimental results indicate that tree models can achieve up to a 95% reduction in storage space compared to the standard one-vs-rest method for multi-label text classification. Our research provides a simple procedure to estimate the size of a tree model before training any classifier in the tree nodes. Then, if the estimated size is already acceptable, this approach helps avoid modifying the model through weight pruning or other techniques.
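To make the counting idea concrete, here is a minimal sketch (my own illustration, not the authors' code) of estimating a tree model's size before any classifier is trained: a node's binary classifiers can only place non-zero weights on features that actually appear in the node's training instances. The names `X`, `Y`, `label_tree`, and `estimate_tree_nonzeros` are assumptions for illustration.

```python
import numpy as np
import scipy.sparse as sp

def estimate_tree_nonzeros(X, Y, label_tree):
    """X: sparse (n_instances, n_features) feature matrix (CSR);
    Y: sparse (n_instances, n_labels) label matrix;
    label_tree: iterable of (labels_under_node, n_child_classifiers) pairs."""
    total = 0
    for labels_under_node, n_children in label_tree:
        # Instances relevant to this node: those carrying at least one label under it.
        rows = np.unique(Y[:, labels_under_node].nonzero()[0])
        # Features appearing in at least one relevant instance; only these can
        # receive non-zero weights in the node's binary classifiers.
        used_features = np.unique(X[rows].nonzero()[1])
        total += used_features.size * n_children
    return total
```

Comparing this estimate against the dense one-vs-rest footprint (number of features times number of labels) gives a rough idea of the attainable saving before resorting to weight pruning or other modifications.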
Related papers
- Learning on Model Weights using Tree Experts [39.90685550999956]
We show how to train neural networks that use other networks as input.
ProbeX is the first probing method specifically designed to learn from the weights of a single model layer.
We demonstrate the effectiveness of ProbeX by predicting the categories in a model's training dataset based only on its weights.
arXiv Detail & Related papers (2024-10-17T17:17:09Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
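As a quick illustration of the recommended variant (a minimal sketch in plain NumPy, not the authors' implementation), the snippet below greedily merges the pair of clusters with the largest average pairwise dot product at each step.

```python
import numpy as np

def dot_product_agglomeration(X):
    """Merge clusters greedily by maximum average pairwise dot product.
    Returns the merge history as (cluster_id_a, cluster_id_b) pairs."""
    G = X @ X.T                               # all pairwise dot products
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > 1:
        ids = list(clusters)
        best_pair, best_score = None, -np.inf
        for k, a in enumerate(ids):
            for b in ids[k + 1:]:
                # Average dot product between members of the two clusters.
                score = G[np.ix_(clusters[a], clusters[b])].mean()
                if score > best_score:
                    best_pair, best_score = (a, b), score
        a, b = best_pair
        clusters[a].extend(clusters.pop(b))   # merge cluster b into cluster a
        merges.append((a, b))
    return merges
```

The brute-force pair search is quadratic per merge, which is fine for an illustration but not how one would implement it at scale.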
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Discrete Tree Flows via Tree-Structured Permutations [5.929956715430168]
Discrete flow-based models cannot be straightforwardly optimized with conventional deep learning methods because gradients of discrete functions are undefined or zero.
Our approach seeks to reduce computational burden and remove the need for pseudo-gradients by developing a discrete flow based on decision trees.
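A toy sketch of the kind of building block this suggests, under my own simplifying assumptions (a single depth-one split and a hypothetical `DepthOneTreePermutation` class): each branch of a decision split applies its own permutation to the categories of a discrete variable, so the transformation is invertible by construction and needs no gradients.

```python
import numpy as np

class DepthOneTreePermutation:
    """One split on a context feature; each branch applies its own permutation
    to the categories of the discrete target variable."""
    def __init__(self, split_feature, threshold, perm_left, perm_right):
        self.f, self.t = split_feature, threshold
        self.perms = [np.asarray(perm_left), np.asarray(perm_right)]
        self.inv_perms = [np.argsort(p) for p in self.perms]  # permutations invert exactly

    def forward(self, x_context, x_target):
        branch = (x_context[:, self.f] > self.t).astype(int)
        return np.array([self.perms[b][v] for b, v in zip(branch, x_target)])

    def inverse(self, x_context, z):
        branch = (x_context[:, self.f] > self.t).astype(int)
        return np.array([self.inv_perms[b][v] for b, v in zip(branch, z)])
```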
arXiv Detail & Related papers (2022-07-04T23:11:04Z)
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) requires a model to keep learning new classes, ideally within a limited memory budget.
We show that when the model size is counted into the total budget and methods are compared at an aligned memory cost, saving extra models does not consistently improve performance.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
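A back-of-the-envelope sketch of the aligned-budget accounting (illustrative assumptions only: float32 parameters and 32x32 RGB exemplars; this is not MEMO itself):

```python
def total_memory_mb(n_model_params, n_exemplars,
                    bytes_per_param=4, bytes_per_exemplar=3 * 32 * 32):
    """Total memory of a class-incremental learner: parameters plus stored exemplars."""
    model_mb = n_model_params * bytes_per_param / 2**20
    exemplar_mb = n_exemplars * bytes_per_exemplar / 2**20
    return model_mb + exemplar_mb

# Two methods are only comparable when these totals are aligned: a method that keeps
# extra model copies should be granted correspondingly fewer exemplars, and vice versa.
```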
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
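A minimal sketch of a single prune-then-grow step on one weight tensor (my illustration; the actual method schedules grow and prune phases over the network during training without pre-training a dense model, which this single step does not capture):

```python
import numpy as np

def prune_then_grow(W, sparsity=0.8, grow_fraction=0.1, rng=None):
    """Return a binary mask over W: prune the smallest-magnitude weights down to the
    target sparsity, then randomly re-activate a fraction of them for regrowth."""
    rng = rng or np.random.default_rng(0)
    n_prune = int(sparsity * W.size)
    order = np.argsort(np.abs(W), axis=None)   # indices sorted by |weight|, ascending
    mask = np.ones(W.size, dtype=bool)
    mask[order[:n_prune]] = False              # prune step: drop the smallest weights
    n_grow = int(grow_fraction * n_prune)
    regrow = rng.choice(order[:n_prune], size=n_grow, replace=False)
    mask[regrow] = True                        # grow step: these weights get retrained
    return mask.reshape(W.shape)
```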
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Dive into Decision Trees and Forests: A Theoretical Demonstration [0.0]
Decision trees use a "divide-and-conquer" strategy to break the complex problem of modeling the dependency between input features and labels into smaller sub-problems.
Recent advances have greatly improved their performance in computational advertising, recommender systems, information retrieval, etc.
arXiv Detail & Related papers (2021-01-20T16:47:59Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3].
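A minimal sketch of soft routing in a depth-2 probabilistic tree (illustrative names and shapes, not the paper's architecture): each internal node emits a routing probability through a sigmoid, and the prediction is the path-probability-weighted mixture of the leaf class distributions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, node_w, node_b, leaf_dists):
    """x: (d,) input; node_w/node_b: parameters of 3 internal nodes (root, left, right);
    leaf_dists: (4, n_classes) class distributions at the 4 leaves."""
    p_root = sigmoid(node_w[0] @ x + node_b[0])    # probability of routing right at the root
    p_left = sigmoid(node_w[1] @ x + node_b[1])    # routing probability at the left child
    p_right = sigmoid(node_w[2] @ x + node_b[2])   # routing probability at the right child
    # Path probabilities of the four leaves (LL, LR, RL, RR).
    path = np.array([(1 - p_root) * (1 - p_left),
                     (1 - p_root) * p_left,
                     p_root * (1 - p_right),
                     p_root * p_right])
    return path @ leaf_dists                       # mixture over leaf class distributions
```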
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- Robust Optimal Classification Trees under Noisy Labels [1.5039745292757671]
We propose a novel methodology to construct Optimal Classification Trees that takes into account that noisy labels may occur in the training sample.
Our approach rests on two main elements: (1) the splitting rules of the classification trees are designed to maximize the separation margin between classes by applying the SVM paradigm; and (2) some labels of the training sample are allowed to be changed during tree construction in order to detect label noise.
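As a simplified illustration of ingredient (1) only, the sketch below fits a node's splitting hyperplane with an off-the-shelf soft-margin linear SVM; the paper's optimal-tree formulation and the relabeling mechanism of ingredient (2) are not reproduced here.

```python
import numpy as np
from sklearn.svm import LinearSVC

def margin_split(X, y, C=1.0):
    """Learn a node's splitting hyperplane as a soft-margin linear SVM and route
    samples to the left or right child by the sign of the decision function."""
    svm = LinearSVC(C=C).fit(X, y)
    goes_right = svm.decision_function(X) >= 0
    return svm, goes_right
```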
arXiv Detail & Related papers (2020-12-15T19:12:29Z)
- Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation [75.93960390191262]
We exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes.
We propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution.
Our method, termed Forest R-CNN, can serve as a plug-and-play module applied to most object recognition models.
arXiv Detail & Related papers (2020-08-13T03:52:37Z)
- Oblique Predictive Clustering Trees [6.317966126631351]
Predictive clustering trees (PCTs) can be used to solve a variety of predictive modeling tasks, including structured output prediction.
We propose oblique predictive clustering trees, which split on linear combinations of features and are thus capable of addressing the limitations of standard axis-parallel splits.
We experimentally evaluate the proposed methods on 60 benchmark datasets for 6 predictive modeling tasks.
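The contrast between the two split tests, in a minimal sketch with illustrative names (how the oblique weights `w` are learned in the paper is not shown here):

```python
import numpy as np

def axis_parallel_test(x, feature, threshold):
    """Standard PCT split: compare a single feature to a threshold."""
    return x[feature] <= threshold

def oblique_test(x, w, b):
    """Oblique split: threshold a learned linear combination of all features."""
    return float(np.dot(w, x) + b) <= 0.0
```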
arXiv Detail & Related papers (2020-07-27T14:58:23Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
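A toy sketch of the dynamic-programming flavor, vastly simplified relative to the paper's algorithm (illustrative function names; binary features and labels; none of the search enhancements): the minimum number of misclassifications for a depth-limited tree is computed by memoized recursion over instance subsets.

```python
from functools import lru_cache

def min_tree_errors(X, y, max_depth):
    """X: tuple of equal-length tuples of 0/1 features; y: tuple of 0/1 labels.
    Returns the minimum misclassifications achievable by a tree of the given depth."""
    n_features = len(X[0]) if X else 0

    @lru_cache(maxsize=None)
    def solve(indices, depth):
        labels = [y[i] for i in indices]
        leaf_error = min(labels.count(0), labels.count(1))  # best constant prediction
        if depth == 0 or leaf_error == 0:
            return leaf_error
        best = leaf_error
        for f in range(n_features):
            left = tuple(i for i in indices if X[i][f] == 0)
            right = tuple(i for i in indices if X[i][f] == 1)
            if left and right:
                best = min(best, solve(left, depth - 1) + solve(right, depth - 1))
        return best

    return solve(tuple(range(len(y))), max_depth)
```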
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.