Related papers: Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com

Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com

URL: http://arxiv.org/abs/2405.13692v1
Date: Wed, 22 May 2024 14:38:48 GMT
Title: Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com
Authors: Sergei Krutikov, Bulat Khaertdinov, Rodion Kiriukhin, Shubham Agrawal, Kees Jan De Vries,
Abstract summary: Transformer-based neural networks, empowered by Self-Supervised Learning (SSL), have demonstrated unprecedented performance across various domains. In this paper, we aim to challenge GBDTs with tabular Transformers on a typical task faced in e-commerce, namely fraud detection. Our methodology leverages the capabilities of Transformers to learn transferable representations using all available data by means of SSL. The proposed approach outperforms heavily tuned GBDTs by a considerable margin of the Average Precision (AP) score.
Score: 1.6702285371066043
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer-based neural networks, empowered by Self-Supervised Learning (SSL), have demonstrated unprecedented performance across various domains. However, related literature suggests that tabular Transformers may struggle to outperform classical Machine Learning algorithms, such as Gradient Boosted Decision Trees (GBDT). In this paper, we aim to challenge GBDTs with tabular Transformers on a typical task faced in e-commerce, namely fraud detection. Our study is additionally motivated by the problem of selection bias, often occurring in real-life fraud detection systems. It is caused by the production system affecting which subset of traffic becomes labeled. This issue is typically addressed by sampling randomly a small part of the whole production data, referred to as a Control Group. This subset follows a target distribution of production data and therefore is usually preferred for training classification models with standard ML algorithms. Our methodology leverages the capabilities of Transformers to learn transferable representations using all available data by means of SSL, giving it an advantage over classical methods. Furthermore, we conduct large-scale experiments, pre-training tabular Transformers on vast amounts of data instances and fine-tuning them on smaller target datasets. The proposed approach outperforms heavily tuned GBDTs by a considerable margin of the Average Precision (AP) score. Pre-trained models show more consistent performance than the ones trained from scratch when fine-tuning data is limited. Moreover, they require noticeably less labeled data for reaching performance comparable to their GBDT competitor that utilizes the whole dataset.

Related papers

Transformer Learns Optimal Variable Selection in Group-Sparse Classification [14.760685658938787]
We give a case study of how transformers can be trained to learn a classic statistical model with "group sparsity"<n>We theoretically demonstrate that, a one-layer transformer trained by gradient descent can correctly leverage the attention mechanism to select variables.<n>We also demonstrate that a well-pretrained one-layer transformer can be adapted to new downstream tasks to achieve good prediction accuracy with a limited number of samples.
arXiv Detail & Related papers (2025-04-11T15:39:44Z)
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms as low-rank computation have impressive performance for learning Transformer-based adaption. We analyze how magnitude-based models affect generalization while improving adaption. We conclude that proper magnitude-based has a slight on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
Boosting Transformer's Robustness and Efficacy in PPG Signal Artifact Detection with Self-Supervised Learning [0.0]
This study addresses the underutilization of abundant unlabeled data by employing self-supervised learning (SSL) to extract latent features from this data. Our experiments demonstrate that SSL significantly enhances the Transformer model's ability to learn representations. This approach holds promise for broader applications in PICU environments, where annotated data is often limited.
arXiv Detail & Related papers (2024-01-02T04:00:48Z)
Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches. This is the first time that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order. In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
Remote Sensing Change Detection With Transformers Trained from Scratch [62.96911491252686]
transformer-based change detection (CD) approaches either employ a pre-trained model trained on large-scale image classification ImageNet dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark. We develop an end-to-end CD approach with transformers that is trained from scratch and yet achieves state-of-the-art performance on four public benchmarks.
arXiv Detail & Related papers (2023-04-13T17:57:54Z)
Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss [15.304831835680847]
We propose a novel token-based training strategy that improves Transformer-Transducer (T-T) based speaker change detection (SCD) performance. Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy.
arXiv Detail & Related papers (2022-11-11T21:09:58Z)
Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience. We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
Towards Data-Efficient Detection Transformers [77.43470797296906]
We show most detection transformers suffer from significant performance drops on small-size datasets. We empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR. We introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.
arXiv Detail & Related papers (2022-03-17T17:56:34Z)
Discriminative and Generative Transformer-based Models For Situation Entity Classification [8.029049649310211]
We re-examine the situation entity (SE) classification task with varying amounts of available training data. We exploit a Transformer-based variational autoencoder to encode sentences into a lower dimensional latent space.
arXiv Detail & Related papers (2021-09-15T17:07:07Z)
Understanding Transformers for Bot Detection in Twitter [0.0]
We focus on bot detection in Twitter, a key task to mitigate and counteract the automatic spreading of disinformation and bias in social media. We investigate the use of pre-trained language models to tackle the detection of tweets generated by a bot or a human account based exclusively on its content. We observe that fine-tuning generative transformers on a bot detection task produces higher accuracies.
arXiv Detail & Related papers (2021-04-13T13:32:55Z)
Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex. This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results. ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance. We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED) TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model. Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.