Online Gradient Boosting Decision Tree: In-Place Updates for Efficient Adding/Deleting Data
- URL: http://arxiv.org/abs/2502.01634v1
- Date: Mon, 03 Feb 2025 18:59:04 GMT
- Title: Online Gradient Boosting Decision Tree: In-Place Updates for Efficient Adding/Deleting Data
- Authors: Huawei Lin, Jun Woo Chung, Yingjie Lao, Weijie Zhao
- Abstract summary: We propose an efficient online learning framework for GBDT supporting both incremental and decremental learning.
To reduce the learning cost, we present a collection of optimizations for our framework, so that it can add or delete a small fraction of data on the fly.
Backdoor attack results show that our framework can successfully inject and remove a backdoor in a well-trained model.
- Score: 18.21562008536426
- License:
- Abstract: Gradient Boosting Decision Tree (GBDT) is one of the most popular machine learning models in various applications. However, in traditional settings, all data must be accessed simultaneously during training: the model does not allow adding or deleting any data instances after training. In this paper, we propose an efficient online learning framework for GBDT supporting both incremental and decremental learning. To the best of our knowledge, this is the first work to consider in-place, unified incremental and decremental learning on GBDT. To reduce the learning cost, we present a collection of optimizations for our framework, so that it can add or delete a small fraction of data on the fly. We theoretically show the relationship between the hyper-parameters of the proposed optimizations, which enables trading off accuracy and cost in incremental and decremental learning. Backdoor attack results show that our framework can successfully inject and remove a backdoor in a well-trained model using incremental and decremental learning, and empirical results on public datasets confirm the effectiveness and efficiency of the proposed online learning framework and optimizations.
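The abstract describes adding and deleting individual instances in place, without retraining the trees. As a rough illustration of the underlying idea (a minimal sketch, not the authors' implementation; the names `OnlineStump`, `add`, and `delete` are assumptions), the snippet below keeps per-leaf gradient and hessian sums as sufficient statistics for one boosting round, so a sample can be inserted or removed by updating those sums and re-deriving the leaf value in place.

```python
# Minimal sketch, assuming a depth-1 tree and XGBoost-style leaf weights.
# Names (OnlineStump, add, delete) are illustrative, not the paper's API.
import numpy as np

class OnlineStump:
    """One boosting round kept as per-leaf sufficient statistics (gradient sum G
    and hessian sum H), so single instances can be added or deleted in place;
    the leaf value -G / (H + lambda) is recomputed on demand."""

    def __init__(self, feature: int, threshold: float, lam: float = 1.0):
        self.feature, self.threshold, self.lam = feature, threshold, lam
        self.grad_sum = np.zeros(2)  # leaf 0: x[feature] <= threshold, leaf 1: otherwise
        self.hess_sum = np.zeros(2)

    def _leaf(self, x):
        return int(x[self.feature] > self.threshold)

    def add(self, x, grad, hess):      # incremental learning: insert one instance
        i = self._leaf(x)
        self.grad_sum[i] += grad
        self.hess_sum[i] += hess

    def delete(self, x, grad, hess):   # decremental learning: remove one instance
        i = self._leaf(x)
        self.grad_sum[i] -= grad
        self.hess_sum[i] -= hess

    def predict(self, x):
        i = self._leaf(x)
        return -self.grad_sum[i] / (self.hess_sum[i] + self.lam)
```

A full framework would also have to decide when a split itself must be rebuilt after many updates; the sketch covers only the leaf statistics, which are the part that can always be refreshed in place.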
Related papers
- Online-BLS: An Accurate and Efficient Online Broad Learning System for Data Stream Classification [52.251569042852815]
We introduce an online broad learning system framework with closed-form solutions for each online update.
We design an effective weight estimation algorithm and an efficient online updating strategy.
Our framework extends naturally to data-stream scenarios with concept drift and exceeds state-of-the-art baselines.
arXiv Detail & Related papers (2025-01-28T13:21:59Z)
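Online-BLS reports closed-form solutions for each online update. One standard way to obtain such closed forms is a recursive ridge-regression (Sherman-Morrison) update; the sketch below shows that generic recipe only, not the paper's actual weight-estimation algorithm, and the class name `RecursiveRidge` is an assumption.

```python
# Generic recursive ridge regression: a closed-form per-sample update via the
# Sherman-Morrison identity. Illustrative only; not Online-BLS itself.
import numpy as np

class RecursiveRidge:
    def __init__(self, dim: int, lam: float = 1.0):
        self.P = np.eye(dim) / lam     # running (X^T X + lam * I)^{-1}
        self.w = np.zeros(dim)         # current output weights

    def update(self, x: np.ndarray, y: float) -> np.ndarray:
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)          # gain vector for this sample
        self.w += gain * (y - x @ self.w)   # closed-form correction, no retraining
        self.P -= np.outer(gain, Px)        # rank-1 downdate of the inverse
        return self.w
```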
- Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning [35.681853074122735]
We introduce an exact unlearning framework, Sequence-aware Sharded Sliced Training (S3T).
S3T is designed to enhance the deletion capabilities of an exact unlearning system while minimizing the impact on the model's performance.
We demonstrate that S3T attains superior deletion capabilities and enhanced performance compared to baselines across a wide range of settings.
arXiv Detail & Related papers (2024-06-24T01:45:13Z)
- Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping [53.454408491386886]
Bootstrapping self-alignment markedly surpasses the single-round approach.
We propose Step-On-Feet Tuning (SOFT), which leverages the model's continuously enhanced few-shot ability to boost zero-shot and one-shot performance.
Based on an easy-to-hard training recipe, we propose SOFT+, which further boosts self-alignment performance.
arXiv Detail & Related papers (2024-02-12T12:30:42Z)
- Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
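PGU updates the model with projected gradients so that forgetting interferes as little as possible with the retained data. The sketch below shows one common realization of that idea, projecting the forget-set gradient onto the orthogonal complement of a subspace spanned by retained-data features; it is a generic illustration, and the helper names (`retained_subspace`, `project_out`) and the SVD-based subspace construction are assumptions rather than the paper's exact procedure.

```python
# Generic gradient projection for unlearning: strip from the forget-set gradient
# the components lying in a subspace important to the retained data.
import numpy as np

def retained_subspace(retained_feats: np.ndarray, energy: float = 0.95) -> np.ndarray:
    """Orthonormal basis capturing `energy` of the retained features' variance."""
    _, s, vt = np.linalg.svd(retained_feats, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1
    return vt[:k].T                                   # shape: (dim, k)

def project_out(grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project the unlearning gradient onto the complement of the retained subspace."""
    return grad - basis @ (basis.T @ grad)

# Usage (one unlearning step): ascend the loss on the forget data, but only along
# directions outside the retained subspace, e.g.:
# theta += lr * project_out(forget_loss_grad, retained_subspace(retained_feats))
```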
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an unlearning framework that can efficiently update LLMs without retraining the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)
- Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning [66.20311762506702]
Dataset pruning (DP) has emerged as an effective way to improve data efficiency.
We propose two new DP methods, label mapping and feature mapping, for supervised and self-supervised pretraining settings.
We show that source data classes can be pruned by up to 40%-80% without sacrificing downstream performance.
arXiv Detail & Related papers (2023-10-13T00:07:49Z)
- Recommendation Unlearning via Influence Function [42.4931807753579]
We propose a new Influence Function-based Recommendation Unlearning (IFRU) framework, which efficiently updates the model without retraining.
IFRU achieves a speedup of more than 250x over retraining-based methods, with recommendation performance comparable to full retraining.
arXiv Detail & Related papers (2023-07-05T09:42:51Z)
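IFRU updates the model without retraining by estimating the influence of the data to be removed. A textbook influence-function correction looks like the sketch below: a single Newton-style step using the Hessian on the remaining data. This is the generic recipe, not IFRU's specific estimator, and `influence_unlearn` and its arguments are illustrative names.

```python
# Generic influence-function unlearning step: approximate the retrained-from-scratch
# parameters by a single Newton-style correction. Illustrative only, not IFRU itself.
import numpy as np

def influence_unlearn(theta, grad_fn, hess_fn, removed, remaining, damping=1e-3):
    """theta:     current parameters, shape (d,), assumed near the full-data optimum
       grad_fn:   grad_fn(z, theta) -> per-sample loss gradient, shape (d,)
       hess_fn:   hess_fn(data, theta) -> Hessian of the summed loss on data, (d, d)
       removed:   iterable of samples to delete
       remaining: samples that stay in the training set
    """
    # Total gradient contributed by the removed samples.
    g_removed = sum(grad_fn(z, theta) for z in removed)
    # Hessian of the loss on the remaining data (damped for invertibility).
    H = hess_fn(remaining, theta) + damping * np.eye(theta.shape[0])
    # To first order, removing the samples shifts the optimum by +H^{-1} g_removed.
    return theta + np.linalg.solve(H, g_removed)
```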
- CLIP: Train Faster with Less Data [3.2575001434344286]
Deep learning models require an enormous amount of data for training.
Recently, there has been a shift in machine learning from model-centric to data-centric approaches.
We propose CLIP, i.e., Curriculum Learning with Iterative data Pruning.
arXiv Detail & Related papers (2022-12-02T21:29:48Z)
- Training Efficiency and Robustness in Deep Learning [2.6451769337566406]
We study approaches to improve the training efficiency and robustness of deep learning models.
We find that prioritizing learning on more informative training data increases convergence speed and improves generalization performance on test data.
We show that a redundancy-aware modification to the sampling of training data improves the training speed, and we develop an efficient method for detecting the diversity of the training signal.
arXiv Detail & Related papers (2021-12-02T17:11:33Z)
- Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning [11.138005656807968]
We introduce EfficientCL, a memory-efficient continual pretraining method.
For data augmentation, we stack two types of operations sequentially: cutoff and PCA jittering.
As pretraining proceeds, we apply curriculum learning by incrementing the augmentation degree at each difficulty step.
arXiv Detail & Related papers (2021-09-10T05:49:55Z)
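EfficientCL's two augmentations are simple to state: cutoff zeroes out a contiguous span of the input, and PCA jittering perturbs embeddings along principal components with strength tied to their variance; the curriculum then raises the augmentation degree step by step. The sketch below is a generic rendering of those operations on a matrix of token embeddings, with `degree` as the curriculum knob; the function names and scaling details are assumptions, not the paper's exact implementation.

```python
# Generic cutoff + PCA jittering on a (seq_len, dim) matrix of token embeddings,
# with a single `degree` knob that a curriculum can raise over training.
import numpy as np

def cutoff(emb: np.ndarray, degree: float, rng: np.random.Generator) -> np.ndarray:
    """Zero out a contiguous span of tokens; span length grows with `degree`."""
    out = emb.copy()
    seq_len = emb.shape[0]
    span = max(1, int(degree * seq_len))
    start = rng.integers(0, seq_len - span + 1)
    out[start:start + span] = 0.0
    return out

def pca_jitter(emb: np.ndarray, degree: float, rng: np.random.Generator) -> np.ndarray:
    """Add noise along principal components, scaled by their singular values."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    alphas = rng.normal(0.0, degree, size=s.shape)   # one coefficient per component
    return emb + (alphas * s) @ vt / emb.shape[0]

# Curriculum: raise `degree` at each difficulty step, e.g. 0.05 -> 0.10 -> 0.15.
```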