FastForest: Increasing Random Forest Processing Speed While Maintaining
Accuracy
- URL: http://arxiv.org/abs/2004.02423v1
- Date: Mon, 6 Apr 2020 06:37:03 GMT
- Title: FastForest: Increasing Random Forest Processing Speed While Maintaining
Accuracy
- Authors: Darren Yates and Md Zahidul Islam
- Abstract summary: Our proposed FastForest algorithm delivers an average 24% increase in processing speed compared with Random Forest.
It maintains (and frequently exceeds) Random Forest's classification accuracy over tests involving 45 datasets.
Detailed testing of Subbagging sizes found an optimal scalar that delivers a good balance of processing performance and accuracy.
- Score: 2.6118176084782836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random Forest remains one of Data Mining's most enduring ensemble algorithms,
achieving well-documented levels of accuracy and processing speed, as well as
regularly appearing in new research. However, with data mining now reaching the
domain of hardware-constrained devices such as smartphones and Internet of
Things (IoT) devices, there is continued need for further research into
algorithm efficiency to deliver greater processing speed without sacrificing
accuracy. Our proposed FastForest algorithm delivers an average 24% increase in
processing speed compared with Random Forest whilst maintaining (and frequently
exceeding) its classification accuracy over tests involving 45 datasets.
FastForest achieves this result through a combination of three optimising
components - Subsample Aggregating ('Subbagging'), Logarithmic Split-Point
Sampling and Dynamic Restricted Subspacing. Moreover, detailed testing of
Subbagging sizes has found an optimal scalar delivering a positive mix of
processing performance and accuracy.
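The listing above names the three components but gives no implementation detail. As an illustration only, the following is a minimal, hedged sketch of the Subbagging idea (not the authors' FastForest code) using scikit-learn: each tree is trained on a subsample drawn without replacement, in contrast to Random Forest's bootstrap sampling with replacement. The 0.5 subsample fraction is a placeholder, not the optimal scalar reported in the paper.

```python
# Minimal sketch of Subsample Aggregating ('Subbagging') with scikit-learn.
# Illustrative only -- not the FastForest implementation. Each tree trains on a
# subsample drawn WITHOUT replacement, unlike Random Forest's bootstrap.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

subbagged_forest = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt"),  # random feature subset per split
    n_estimators=100,
    max_samples=0.5,   # subsample fraction per tree (placeholder, not the paper's tuned scalar)
    bootstrap=False,   # sample without replacement -> Subbagging
    random_state=0,
)

scores = cross_val_score(subbagged_forest, X, y, cv=5)
print(f"Mean 5-fold accuracy: {scores.mean():.3f}")
```

Because each tree sees only a fraction of the training set, training cost drops roughly in proportion to the subsample size, which is the processing-speed lever the abstract describes.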
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- Optimized Speculative Sampling for GPU Hardware Accelerators [14.681982904792763]
We optimize speculative sampling for parallel hardware accelerators to improve sampling speed.
We distribute the workload across multiple GPU threads, enabling simultaneous operations on matrix segments within thread blocks.
We conduct extensive experiments on both automatic speech recognition and summarization tasks to validate our methods.
arXiv Detail & Related papers (2024-06-16T17:19:23Z)
- Edge-Enabled Real-time Railway Track Segmentation [0.0]
We propose an edge-enabled real-time railway track segmentation algorithm.
It is made suitable for edge applications by optimizing the network structure and quantizing the model after training.
Experimental results demonstrate that our enhanced algorithm achieves an accuracy level of 83.3%.
arXiv Detail & Related papers (2024-01-21T13:45:52Z)
- Improved Sparse Ising Optimization [0.0]
This report presents new data demonstrating significantly higher performance on some longstanding benchmark problems with up to 20,000 variables.
Relative to leading reported combinations of speed and accuracy, a proof-of-concept implementation reached targets 2-4 orders of magnitude faster.
The data suggest exciting possibilities for pushing the sparse Ising performance frontier to potentially strengthen algorithm portfolios, AI toolkits and decision-making systems.
arXiv Detail & Related papers (2023-11-15T17:59:06Z)
- Fast Bayesian Optimization of Needle-in-a-Haystack Problems using Zooming Memory-Based Initialization [73.96101108943986]
A Needle-in-a-Haystack problem arises when there is an extreme imbalance of optimum conditions relative to the size of the dataset.
We present a Zooming Memory-Based Initialization algorithm that builds on conventional Bayesian optimization principles.
arXiv Detail & Related papers (2022-08-26T23:57:41Z)
- Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25 times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network (a minimal sketch of the DB function appears after this list).
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- SwiftLane: Towards Fast and Efficient Lane Detection [0.8972186395640678]
We propose SwiftLane: a light-weight, end-to-end deep learning based framework, coupled with the row-wise classification formulation for fast and efficient lane detection.
Our method achieves an inference speed of 411 frames per second, surpassing the state of the art in speed while achieving comparable accuracy on the popular CULane benchmark dataset.
arXiv Detail & Related papers (2021-10-22T13:35:05Z)
- GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and Hardware [8.15489210461058]
We propose a unified programming model for mainstream sampling algorithms, termed GNNSampler.
We explore the data locality among nodes and their neighbors in real-world datasets for alleviating the irregular memory access in sampling.
Our method applies universally to mainstream sampling algorithms and reduces GNN training time.
arXiv Detail & Related papers (2021-08-26T04:13:52Z)
- HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose Hardware-Aware Network Transformation (HANT).
HANT replaces inefficient operations with more efficient alternatives using a neural architecture search-like approach.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a 0.4% drop in top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z)
- Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how the summaries found help steer this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
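As referenced in the Differentiable Binarization entry above, a hard threshold on the probability map has no useful gradient, so the DB paper approximates it with a scaled sigmoid of the probability map minus a learned threshold map. The sketch below is a minimal illustration of that approximation only, not the paper's full segmentation pipeline; the amplifying factor k=50 follows the value reported in the DB paper, and the toy maps are placeholders.

```python
import numpy as np

def differentiable_binarization(prob_map: np.ndarray,
                                threshold_map: np.ndarray,
                                k: float = 50.0) -> np.ndarray:
    """Scaled-sigmoid approximation of the hard binarization step.

    A hard threshold (prob_map > threshold_map) is not differentiable, so DB
    replaces it with 1 / (1 + exp(-k * (P - T))), which approaches the step
    function as k grows while keeping gradients for end-to-end training.
    """
    return 1.0 / (1.0 + np.exp(-k * (prob_map - threshold_map)))

# Toy usage: a 2x2 probability map against a uniform threshold map of 0.3.
P = np.array([[0.9, 0.2], [0.4, 0.05]])
T = np.full_like(P, 0.3)
print(differentiable_binarization(P, T).round(3))  # values near 1 where P > T
```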
This list is automatically generated from the titles and abstracts of the papers in this site.