Optimization of Decision Tree Evaluation Using SIMD Instructions
- URL: http://arxiv.org/abs/2205.07307v1
- Date: Sun, 15 May 2022 15:12:40 GMT
- Title: Optimization of Decision Tree Evaluation Using SIMD Instructions
- Authors: Alexey Mironov, Ilnur Khuziev
- Abstract summary: We explore MatrixNet, the ancestor of the popular CatBoost library.
This paper investigates the opportunities given by the AVX instruction set to evaluate models more efficiently.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision forest (decision tree ensemble) is one of the most popular machine
learning algorithms. To use large models on big data, like document scoring
with learning-to-rank models, we need to evaluate these models efficiently. In
this paper, we explore MatrixNet, the ancestor of the popular CatBoost library.
Both libraries use the SSE instruction set for scoring on CPU. This paper
investigates the opportunities given by the AVX instruction set to evaluate
models more efficiently. We achieved a 35% speedup on the binarization stage
(node condition comparison) and a 20% speedup on the tree apply stage of the
ranking model.
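To make the two stages concrete, below is a minimal C++ sketch of an AVX binarization step and an oblivious-tree apply step, assuming the oblivious-tree layout used by MatrixNet/CatBoost (every node on a tree level shares the same condition). The function names, data layout, and fixed depth are illustrative assumptions, not the libraries' actual code.

    // Illustrative sketch only: names and data layout are assumptions.
    // Compile with AVX enabled (e.g. -mavx).
    #include <immintrin.h>
    #include <cstdint>

    // Binarization stage: compare one feature of 8 documents against a node
    // border with a single AVX comparison; each document yields one bit.
    inline uint8_t binarize8(const float* feature_values /* 8 docs */, float border) {
        __m256 values = _mm256_loadu_ps(feature_values);
        __m256 mask   = _mm256_cmp_ps(values, _mm256_set1_ps(border), _CMP_GT_OQ);
        return static_cast<uint8_t>(_mm256_movemask_ps(mask)); // 1 bit per document
    }

    // Tree apply stage: in an oblivious tree of depth `depth`, the binary
    // features selected by the tree form the leaf index directly.
    inline double apply_oblivious_tree(const uint8_t* doc_binary_features, // 0/1 per condition
                                       const int* condition_ids,           // condition used at each level
                                       const double* leaf_values,          // 2^depth leaf values
                                       int depth) {
        unsigned leaf = 0;
        for (int level = 0; level < depth; ++level)
            leaf |= static_cast<unsigned>(doc_binary_features[condition_ids[level]]) << level;
        return leaf_values[leaf];
    }

In a real scorer the 8-bit comparison masks would be scattered into per-document binary-feature arrays between the two stages, and the apply loop can itself be vectorized over documents; the sketch shows only the scalar per-document form.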
Related papers
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train, which we show performs better than dense matrices for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z) - Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment [56.44025052765861]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks.
We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs.
We show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x.
arXiv Detail & Related papers (2024-05-06T16:03:32Z) - Register Your Forests: Decision Tree Ensemble Optimization by Explicit CPU Register Allocation [3.737361598712633]
We present a code generation approach for decision tree ensembles, which produces machine assembly code within a single conversion step.
The results show that the performance of decision tree ensemble inference can be significantly improved.
arXiv Detail & Related papers (2024-04-10T09:17:22Z) - Towards Constituting Mathematical Structures for Learning to Optimize [101.80359461134087]
Learning to Optimize (L2O), which uses machine learning to learn an optimization algorithm automatically from data, has attracted growing attention in recent years.
A generic L2O approach parameterizes the iterative update rule and learns the update direction as a black-box network.
While the generic approach is widely applicable, the learned model can overfit and may not generalize well to out-of-distribution test sets.
We propose a novel L2O model with a mathematics-inspired structure that is broadly applicable and generalizes well to out-of-distribution problems.
arXiv Detail & Related papers (2023-05-29T19:37:28Z) - Fast Inference of Tree Ensembles on ARM Devices [6.995377781193234]
We convert the popular QuickScorer algorithm and its siblings from Intel's AVX to ARM's NEON instruction set.
We also investigate the effects of using fixed-point quantization in Random Forests.
arXiv Detail & Related papers (2023-05-15T12:05:03Z) - Improving Dual-Encoder Training through Dynamic Indexes for Negative
Mining [61.09807522366773]
We introduce an algorithm that approximates the softmax with provable bounds and that dynamically maintains the tree.
On datasets with more than twenty million targets, our approach halves the error relative to oracle brute-force negative mining.
arXiv Detail & Related papers (2023-03-27T15:18:32Z) - Towards a learning-based performance modeling for accelerating Deep
Neural Networks [1.1549572298362785]
We investigate predictive models based on machine learning techniques to optimize Convolutional Neural Networks (CNNs).
Preliminary experiments on a Midgard-based ARM Mali GPU show that our predictive model outperforms all the convolution operators manually selected by the library.
arXiv Detail & Related papers (2022-12-09T18:28:07Z) - Using Model-Based Trees with Boosting to Fit Low-Order Functional ANOVA
Models [5.131758478675364]
Low-order functional ANOVA models have been rediscovered in the machine learning (ML) community under the guise of inherently interpretable machine learning.
We propose a new algorithm, called GAMI-Tree, that is similar to EBM, but has a number of features that lead to better performance.
We use simulated and real datasets to compare the performance and interpretability of GAMI-Tree with EBM and GAMI-Net.
arXiv Detail & Related papers (2022-07-14T14:23:14Z) - Cortex: A Compiler for Recursive Deep Learning Models [12.307249556836375]
We present Cortex, a compiler-based approach to generate highly-efficient code for deep learning models.
Our compiler approach and low reliance on vendor libraries enables us to perform end-to-end optimizations, leading to up to 14X lower inference latencies over past work.
arXiv Detail & Related papers (2020-11-02T23:35:14Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - Heuristic Semi-Supervised Learning for Graph Generation Inspired by
Electoral College [80.67842220664231]
We propose a novel pre-processing technique, namely ELectoral COllege (ELCO), which automatically expands new nodes and edges to refine the label similarity within a dense subgraph.
In all setups tested, our method boosts the average score of base models by a large margin of 4.7 points and consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2020-06-10T14:48:48Z)