Related papers: Uncovering Feature Interdependencies in High-Noise Environments with Stepwise Lookahead Decision Forests

URL: http://arxiv.org/abs/2009.14572v5
Date: Wed, 31 Mar 2021 14:24:26 GMT
Title: Uncovering Feature Interdependencies in High-Noise Environments with Stepwise Lookahead Decision Forests
Authors: Delilah Donick and Sandro Claudio Lera
Abstract summary: A "stepwise lookahead" variation of the random forest algorithm is presented for its ability to better uncover binary feature interdependencies. A long-short trading strategy for copper futures is then backtested by training both greedy and lookahead random forests to predict the signs of daily price returns.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conventionally, random forests are built from "greedy" decision trees which each consider only one split at a time during their construction. The sub-optimality of greedy implementation has been well-known, yet mainstream adoption of more sophisticated tree building algorithms has been lacking. We examine under what circumstances an implementation of less greedy decision trees actually yields outperformance. To this end, a "stepwise lookahead" variation of the random forest algorithm is presented for its ability to better uncover binary feature interdependencies. In contrast to the greedy approach, the decision trees included in this random forest algorithm each simultaneously consider three split nodes in tiers of depth two. It is demonstrated on synthetic data and financial price time series that the lookahead version significantly outperforms the greedy one when (a) certain non-linear relationships between feature-pairs are present and (b) the signal-to-noise ratio is particularly low. A long-short trading strategy for copper futures is then backtested by training both greedy and stepwise lookahead random forests to predict the signs of daily price returns. The resulting superior performance of the lookahead algorithm is at least partially explained by the presence of "XOR-like" relationships between long-term and short-term technical indicators. More generally, across all examined datasets, when no such relationships between features are present, performance across random forests is similar. Given its enhanced ability to understand the feature-interdependencies present in complex systems, this lookahead variation is a useful extension to the toolkit of data scientists, in particular for financial machine learning, where conditions (a) and (b) are typically met.
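
The contrast between a greedy split and a depth-two tier can be illustrated with a small sketch. This is not the authors' implementation: it assumes binary features, Gini impurity as the split criterion, and an exhaustive search over one tier only (one root and its two children). The function names (`greedy_root`, `lookahead_tier`) and the synthetic data (a noisy XOR of features 0 and 1, a weakly informative feature 2, and a pure-noise feature 3) are hypothetical and chosen for illustration.

```python
import itertools
import numpy as np


def gini(y):
    """Gini impurity of a binary label vector (0 for an empty node)."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2.0 * p * (1.0 - p)


def split_impurity(y, mask):
    """Weighted Gini impurity after splitting y into y[mask] and y[~mask]."""
    n = len(y)
    if n == 0:
        return 0.0
    left, right = y[mask], y[~mask]
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def greedy_root(X, y):
    """Greedy choice: the feature that minimizes impurity after ONE split."""
    scores = [split_impurity(y, X[:, j] == 1) for j in range(X.shape[1])]
    return int(np.argmin(scores)), float(np.min(scores))


def lookahead_tier(X, y):
    """Lookahead choice (illustrative): pick the root and both child
    features jointly by scoring the four leaves of a depth-two tier."""
    n, d = X.shape
    best, best_score = None, np.inf
    for root, left, right in itertools.product(range(d), repeat=3):
        m = X[:, root] == 1
        score = (m.sum() * split_impurity(y[m], X[m, left] == 1)
                 + (~m).sum() * split_impurity(y[~m], X[~m, right] == 1)) / n
        if score < best_score:
            best, best_score = (root, left, right), float(score)
    return best, best_score


# Synthetic data (assumed for illustration): label = noisy XOR of x0 and x1.
rng = np.random.default_rng(0)
n = 4000
x0 = rng.integers(0, 2, size=n)
x1 = rng.integers(0, 2, size=n)
y = (x0 ^ x1) ^ (rng.random(n) < 0.2).astype(int)   # 20% label noise
x2 = y ^ (rng.random(n) < 0.4).astype(int)          # weak marginal signal
x3 = rng.integers(0, 2, size=n)                     # pure noise
X = np.column_stack([x0, x1, x2, x3])

greedy_feature, greedy_score = greedy_root(X, y)
tier, tier_score = lookahead_tier(X, y)
print("greedy root:", greedy_feature, "impurity:", round(greedy_score, 3))
print("lookahead tier (root, left, right):", tier, "impurity:", round(tier_score, 3))
```

In this setup neither XOR feature reduces impurity on its own, so a single-split criterion tends to prefer the weakly informative feature, while the joint search over three split nodes can recover the feature pair. The exhaustive search costs on the order of d^3 impurity evaluations per tier for d binary features, which is one reason such a sketch should be read as a toy rather than a practical tree builder.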

Related papers

Decision Tree Induction Through LLMs via Semantically-Aware Evolution [53.0367886783772]
We propose an evolutionary optimization method for decision tree induction based on genetic programming (GP). Our key innovation is the integration of semantic priors and domain-specific knowledge about the search space into the algorithm. This is operationalized through novel genetic operators that work with structured natural language prompts.
arXiv Detail & Related papers (2025-03-18T12:52:03Z)
Learning Decision Trees as Amortized Structure Inference [59.65621207449269]
We propose a hybrid amortized structure inference approach to learn predictive decision tree ensembles given data. We show that our approach, DT-GFN, outperforms state-of-the-art decision tree and deep learning methods on standard classification benchmarks.
arXiv Detail & Related papers (2025-03-10T07:05:07Z)
Soft Hoeffding Tree: A Transparent and Differentiable Model on Data Streams [2.6524539020042663]
Stream mining algorithms such as Hoeffding trees grow based on the incoming data stream. We propose soft Hoeffding trees (SoHoT) as a new differentiable and transparent model for possibly infinite and changing data streams.
arXiv Detail & Related papers (2024-11-07T15:49:53Z)
Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees. We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
Lookback for Learning to Branch [77.32867454769936]
Bipartite Graph Neural Networks (GNNs) have been shown to be an important component of deep learning based Mixed-Integer Linear Program (MILP) solvers. Recent works have demonstrated the effectiveness of such GNNs in replacing the branching (variable selection) in branch-and-bound (B&B) solvers.
arXiv Detail & Related papers (2022-06-30T02:33:32Z)
Social Interpretable Tree for Pedestrian Trajectory Prediction [75.81745697967608]
We propose a tree-based method, termed Social Interpretable Tree (SIT), to address this multi-modal prediction task. A path in the tree from the root to a leaf represents an individual possible future trajectory. Despite the hand-crafted tree, the experimental results on ETH-UCY and Stanford Drone datasets demonstrate that our method is capable of matching or exceeding the performance of state-of-the-art methods.
arXiv Detail & Related papers (2022-05-26T12:18:44Z)
Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure. HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques. All code and models are released in a full-fledged package available on Github.
arXiv Detail & Related papers (2022-02-02T02:43:23Z)
Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events. There is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine. We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z)
Data-driven advice for interpreting local and global model predictions in bioinformatics problems [17.685881417954782]
Conditional feature contributions (CFCs) provide local, case-by-case explanations of a prediction. We compare the explanations computed by both methods on a set of 164 publicly available classification problems. For random forests, we find extremely high similarities and correlations of both local and global SHAP values and CFC scores.
arXiv Detail & Related papers (2021-08-13T12:41:39Z)
Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects. We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions. Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve better or comparable performance than [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search. Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
Estimation and Inference with Trees and Forests in High Dimensions [23.732259124656903]
Shallow trees built greedily via the CART empirical MSE criterion achieve MSE rates that depend only logarithmically on the ambient dimension $d$. For strongly relevant features, we also show that fully grown forests achieve fast MSE rates and their predictions are also asymptotically normal.
arXiv Detail & Related papers (2020-07-07T05:45:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.