Modeling Text with Decision Forests using Categorical-Set Splits
- URL: http://arxiv.org/abs/2009.09991v3
- Date: Fri, 5 Feb 2021 11:44:02 GMT
- Title: Modeling Text with Decision Forests using Categorical-Set Splits
- Authors: Mathieu Guillame-Bert, Sebastian Bruch, Petr Mitrichev, Petr Mikheev, Jan Pfeifer
- Abstract summary: In axis-aligned decision forests, the "decision" to route an input example is the result of the evaluation of a condition on a single dimension in the feature space.
We define a condition that is specific to categorical-set features and present an algorithm to learn it.
Our algorithm is efficient during training and the resulting conditions are fast to evaluate with our extension of the QuickScorer inference algorithm.
- Score: 2.434796198711328
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision forest algorithms typically model data by learning a binary tree
structure recursively where every node splits the feature space into two
sub-regions, sending examples into the left or right branch as a result. In
axis-aligned decision forests, the "decision" to route an input example is the
result of the evaluation of a condition on a single dimension in the feature
space. Such conditions are learned using efficient, often greedy algorithms
that optimize a local loss function. For example, a node's condition may be a
threshold function applied to a numerical feature, and its parameter may be
learned by sweeping over the set of values available at that node and choosing
a threshold that maximizes some measure of purity. Crucially, whether an
algorithm exists to learn and evaluate conditions for a feature type determines
whether a decision forest algorithm can model that feature type at all. For
example, decision forests today cannot consume textual features directly --
such features must be transformed to summary statistics instead. In this work,
we set out to bridge that gap. We define a condition that is specific to
categorical-set features -- defined as an unordered set of categorical
variables -- and present an algorithm to learn it, thereby equipping decision
forests with the ability to directly model text, albeit without preserving
sequential order. Our algorithm is efficient during training and the resulting
conditions are fast to evaluate with our extension of the QuickScorer inference
algorithm. Experiments on benchmark text classification datasets demonstrate
the utility and effectiveness of our proposal.
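The abstract above describes learning a numerical split by sweeping over the feature values available at a node and keeping the threshold that maximizes a measure of purity. The following is a minimal Python sketch of that idea, assuming Gini impurity as the purity measure; the function names and the midpoint candidate thresholds are illustrative choices, not details taken from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_numerical_threshold(values, labels):
    """Sweep midpoints between consecutive sorted feature values and return
    the threshold (and its Gini gain) for the split `value <= threshold`."""
    parent = gini(labels)
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_threshold, best_gain = None, 0.0
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # identical feature values cannot be separated by a threshold
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain

# Example: best_numerical_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
# returns (2.5, 0.5), the threshold that perfectly separates the two classes.
```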
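The paper's contribution is a condition defined directly over categorical-set features (unordered sets of tokens). As an illustration only, the sketch below assumes one natural form of such a condition, routing an example to the positive branch when its token set intersects a learned mask of items, and grows the mask greedily using the same Gini gain as above. The names `evaluate_condition` and `learn_mask` and the greedy loop are assumptions made for exposition; they are not the authors' training algorithm or their QuickScorer extension.

```python
from collections import Counter

def gini(labels):
    """Gini impurity (same helper as in the threshold sketch above)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def evaluate_condition(token_set, mask):
    """Positive branch iff the example's categorical set intersects the mask."""
    return not mask.isdisjoint(token_set)

def learn_mask(examples, labels, vocabulary):
    """Greedily grow a mask of items that maximizes the Gini gain of the split
    'token set intersects mask'. `examples` is a sequence of sets of tokens."""
    parent = gini(labels)
    n = len(labels)

    def split_gain(mask):
        left = [y for x, y in zip(examples, labels) if not mask.isdisjoint(x)]
        right = [y for x, y in zip(examples, labels) if mask.isdisjoint(x)]
        return parent - (len(left) * gini(left) + len(right) * gini(right)) / n

    mask, best_gain = set(), 0.0
    while True:
        best_item, best_item_gain = None, best_gain
        for item in vocabulary:
            if item in mask:
                continue
            g = split_gain(mask | {item})
            if g > best_item_gain:
                best_item, best_item_gain = item, g
        if best_item is None:
            break  # no single additional item improves the split
        mask.add(best_item)
        best_gain = best_item_gain
    return mask, best_gain

# Example: short texts as bags of tokens (order is discarded, as in the paper).
examples = [{"good", "movie"}, {"great", "film"}, {"bad", "movie"}, {"awful", "film"}]
labels = ["pos", "pos", "neg", "neg"]
vocabulary = {t for x in examples for t in x}
mask, gain = learn_mask(examples, labels, vocabulary)
# mask might be {"good", "great"}; evaluate_condition({"good", "plot"}, mask) -> True
```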
Related papers
- Utilising Explainable Techniques for Quality Prediction in a Complex Textiles Manufacturing Use Case [0.0]
This paper develops an approach to classify instances of product failure in a complex textiles manufacturing dataset using explainable techniques.
In investigating the trade-off between accuracy and explainability, three different tree-based classification algorithms were evaluated.
arXiv Detail & Related papers (2024-07-26T06:50:17Z)
- Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a new framework based on large language models (LLMs) and decision tree reasoning (OCTree).
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
arXiv Detail & Related papers (2024-06-12T08:31:34Z)
- Simplification of Forest Classifiers and Regressors [1.8275108630751844]
We study the problem of sharing as many branching conditions of a given forest classifier or regressor as possible.
We propose an algorithm for the original problem that uses, as a subroutine, an algorithm solving this problem efficiently.
The effectiveness of our method is demonstrated through comprehensive experiments using 21 datasets.
arXiv Detail & Related papers (2022-12-14T08:49:02Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Explaining random forest prediction through diverse rulesets [0.0]
Local Tree eXtractor (LTreeX) is able to explain the forest prediction for a given test instance with a few diverse rules.
We show that our proposed approach substantially outperforms other explainable methods in terms of predictive performance.
arXiv Detail & Related papers (2022-03-29T12:54:57Z)
- Nonparametric Feature Selection by Random Forests and Deep Neural Networks [4.232614032390374]
We propose a nonparametric feature selection algorithm that incorporates random forests and deep neural networks.
Although the algorithm is proposed using standard random forests, it can be widely adapted to other machine learning algorithms.
arXiv Detail & Related papers (2022-01-18T08:45:33Z)
- Making CNNs Interpretable by Building Dynamic Sequential Decision Forests with Top-down Hierarchy Learning [62.82046926149371]
We propose a generic model transfer scheme to make Convolutional Neural Networks (CNNs) interpretable.
We achieve this by building a differentiable decision forest on top of CNNs.
We name the transferred model deep Dynamic Sequential Decision Forest (dDSDF).
arXiv Detail & Related papers (2021-06-05T07:41:18Z)
- E2E-FS: An End-to-End Feature Selection Method for Neural Networks [0.3222802562733786]
We present a novel feature selection algorithm, called End-to-End Feature Selection (E2E-FS).
Our algorithm, similar to the lasso approach, is solved with gradient descent techniques.
Despite its hard restrictions, experimental results show that this algorithm can be used with any learning model.
arXiv Detail & Related papers (2020-12-14T16:19:25Z)
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)