Gradient Boosting Trees and Large Language Models for Tabular Data Few-Shot Learning
- URL: http://arxiv.org/abs/2411.04324v1
- Date: Wed, 06 Nov 2024 23:54:09 GMT
- Title: Gradient Boosting Trees and Large Language Models for Tabular Data Few-Shot Learning
- Authors: Carlos Huertas
- Abstract summary: Large Language Models (LLM) have brought numerous new applications to Machine Learning (ML).
In this work we demonstrate that although LLMs are a viable alternative, the evidence suggests that baselines used to gauge performance can be improved.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLM) have brought numerous new applications to Machine Learning (ML). In the context of tabular data (TD), recent studies show that TabLLM is a very powerful mechanism for few-shot learning (FSL) applications, even though gradient boosted decision trees (GBDT) have historically dominated the TD field. In this work we demonstrate that although LLMs are a viable alternative, the evidence suggests that the baselines used to gauge performance can be improved. We replicated public benchmarks, and our methodology improves LightGBM by 290%; this gain is mainly driven by forcing node splitting with few samples, a critical step in FSL with GBDT. Our results show an advantage for TabLLM with 8 or fewer shots, but as the number of samples increases, GBDT provides competitive performance at a fraction of the runtime. For other real-life applications with a vast number of samples, we found FSL still useful for improving model diversity; when combined with ExtraTrees it provides strong resilience to overfitting, and our proposal was validated in an ML competition setting, ranking first place.
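The baseline fix called out in the abstract, forcing LightGBM to keep splitting nodes even when only a handful of labelled rows are available, maps directly onto a few standard LightGBM parameters. The sketch below is only illustrative: the specific hyperparameter values, synthetic data, and 8-shot setup are assumptions, not the paper's exact configuration.

```python
# A minimal sketch of few-shot GBDT training, assuming an 8-shot binary
# classification task with 5 numeric features (synthetic stand-in data).
import numpy as np
from lightgbm import LGBMClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 5))      # 8 "shots"
y_train = rng.integers(0, 2, size=8)

# With LightGBM defaults (min_child_samples=20, min_data_in_bin=3) no split
# is possible on 8 rows and the model collapses to the class prior.
# Relaxing these limits forces node splitting with few samples, the step
# the abstract identifies as critical for FSL with GBDT.
model = LGBMClassifier(
    n_estimators=50,
    learning_rate=0.1,
    min_child_samples=1,   # allow leaves containing a single sample
    min_data_in_bin=1,     # allow histogram bins with a single sample
    verbose=-1,
)
model.fit(X_train, y_train)

X_test = rng.normal(size=(4, 5))
print(model.predict_proba(X_test))
```

The abstract also notes that, at larger sample sizes, such few-shot-style GBDT models remain useful for ensemble diversity and pair well with ExtraTrees; swapping in sklearn.ensemble.ExtraTreesClassifier on the same data follows the identical fit/predict pattern.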
Related papers
- Active Few-Shot Learning for Text Classification [43.58047311582709]
The rise of Large Language Models (LLMs) has boosted the use of Few-Shot Learning (FSL) methods in natural language processing.
We propose an active learning-based instance selection mechanism that identifies effective support instances from the unlabeled pool.
arXiv Detail & Related papers (2025-02-26T03:30:13Z) - Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes [135.68092471784516]
We propose a simple and lightweight approach for fusing large language models and gradient-boosted decision trees (a generic sketch of this kind of fusion is given after this list).
We name our fusion methods LLM-Boost and PFN-Boost, respectively.
We demonstrate state-of-the-art performance against numerous baselines and ensembling algorithms.
arXiv Detail & Related papers (2025-02-04T19:30:41Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning [35.03338699349037]
We propose a novel in-context learning framework, FeatLLM, which employs Large Language Models as feature engineers.
FeatLLM generates high-quality rules, significantly (10% on average) outperforming alternatives such as TabLLM and STUNT.
arXiv Detail & Related papers (2024-04-15T06:26:08Z) - Scaling Sentence Embeddings with Large Language Models [43.19994568210206]
In this work, we propose an in-context learning-based method aimed at improving sentence embeddings performance.
Our approach involves adapting the previous prompt-based representation method for autoregressive models.
When scaling model size, we find that going beyond tens of billions of parameters harms performance on semantic textual similarity tasks.
arXiv Detail & Related papers (2023-07-31T13:26:03Z) - Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z) - Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a Simple method named Self-Contrastive Learning (SSCL) to alleviate this issue.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
arXiv Detail & Related papers (2023-05-09T11:00:02Z) - Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z) - Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning [7.906608953906889]
We introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning to increase performance without additional computation by the neural network.
We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model; a sketch of the GBDT-head idea is given after this list.
arXiv Detail & Related papers (2021-05-08T22:31:51Z) - Improving Semi-supervised Federated Learning by Reducing the Gradient Diversity of Models [67.66144604972052]
Federated learning (FL) is a promising way to use the computing power of mobile devices while maintaining privacy of users.
We show that a critical issue that affects the test accuracy is the large gradient diversity of the models from different users.
We propose a novel grouping-based model averaging method to replace the FedAvg averaging method.
arXiv Detail & Related papers (2020-08-26T03:36:07Z) - TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification [50.358839666165764]
We show that the Task-Adaptive Feature Sub-Space Learning (TAFSSL) can significantly boost the performance in Few-Shot Learning scenarios.
Specifically, we show that on the challenging miniImageNet and tieredImageNet benchmarks, TAFSSL can improve the current state-of-the-art in both transductive and semi-supervised FSL settings by more than 5%.
arXiv Detail & Related papers (2020-03-14T16:59:17Z) - Revisiting Training Strategies and Generalization Performance in Deep Metric Learning [28.54755295856929]
We revisit the most widely used DML objective functions and conduct a study of the crucial parameter choices.
Under consistent comparison, DML objectives show much higher saturation than indicated by the literature.
Exploiting these insights, we propose a simple, yet effective, training regularization to reliably boost the performance of ranking-based DML models.
arXiv Detail & Related papers (2020-02-19T22:16:12Z)
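For the "Transformers Boost the Performance of Decision Trees on Tabular Data" entry, this page does not describe how LLM-Boost or PFN-Boost actually fuse the two model families. A common, much simpler fusion pattern is to append the LLM's predicted class probabilities to the tabular features before boosting; the sketch below shows only that generic stacking idea, with a hypothetical helper name and random stand-in data, and should not be read as the published method.

```python
# Generic LLM + GBDT stacking sketch (hypothetical helper, synthetic data).
import numpy as np
from lightgbm import LGBMClassifier

def fuse_llm_with_gbdt(X, llm_proba, y, **gbdt_params):
    """Append LLM class probabilities to the tabular features and train a GBDT.

    X         : (n_samples, n_features) raw tabular features
    llm_proba : (n_samples, n_classes) probabilities from an LLM prompted on a
                text serialization of each row (assumed computed elsewhere)
    """
    X_aug = np.hstack([X, llm_proba])
    model = LGBMClassifier(min_child_samples=1, verbose=-1, **gbdt_params)
    model.fit(X_aug, y)
    return model

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 6))
y = rng.integers(0, 2, size=32)
llm_proba = rng.dirichlet([1.0, 1.0], size=32)   # stand-in for real LLM scores
model = fuse_llm_with_gbdt(X, llm_proba, y)
```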
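For the FreeGBDT entry, the summary describes fitting a GBDT head on features the network already computes during fine-tuning. Below is a minimal sketch of that general idea, assuming a Hugging Face encoder's [CLS] representation as the feature vector and LightGBM as the head; the checkpoint name, toy NLI pairs, and hyperparameters are illustrative assumptions rather than details from the paper.

```python
# Sketch: GBDT head on encoder features (illustrative checkpoint and data).
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from lightgbm import LGBMClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def cls_features(premises, hypotheses):
    """Encode sentence pairs and return the [CLS] hidden states, i.e. the kind
    of features the network already produces during fine-tuning."""
    batch = tokenizer(premises, hypotheses, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0, :].numpy()

# Toy NLI-style pairs (0 = entailment, 1 = contradiction).
premises = ["A man is playing a guitar.", "A dog sleeps on the sofa.",
            "Two kids are kicking a ball.", "A chef is cooking pasta."]
hypotheses = ["Someone is making music.", "The dog is running outside.",
              "Children are playing soccer.", "The kitchen is empty."]
labels = np.array([0, 1, 0, 1])

# The GBDT head is trained on the frozen features, adding no extra
# computation for the neural network itself.
head = LGBMClassifier(min_child_samples=1, verbose=-1)
head.fit(cls_features(premises, hypotheses), labels)
```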
This list is automatically generated from the titles and abstracts of the papers in this site.