TREASURE: A Transformer-Based Foundation Model for High-Volume Transaction Understanding
- URL: http://arxiv.org/abs/2511.19693v2
- Date: Wed, 26 Nov 2025 17:43:31 GMT
- Title: TREASURE: A Transformer-Based Foundation Model for High-Volume Transaction Understanding
- Authors: Chin-Chia Michael Yeh, Uday Singh Saini, Xin Dai, Xiran Fan, Shubham Jain, Yujie Fan, Jiarui Sun, Junpeng Wang, Menghai Pan, Yingtong Dou, Yuzhong Chen, Vineeth Rakesh, Liang Wang, Yan Zheng, Mahashweta Das,
- Abstract summary: We present TREASURE, a multipurpose transformer-based foundation model specifically designed for transaction data. The model simultaneously captures both consumer behavior and payment network signals, providing comprehensive information necessary for applications like accurate recommendation systems and abnormal behavior detection. We present key insights from extensive ablation studies, benchmarks against production models, and case studies, highlighting valuable knowledge gained from developing TREASURE.
- Score: 33.669519944170816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Payment networks form the backbone of modern commerce, generating high volumes of transaction records from daily activities. Properly modeling this data can enable applications such as abnormal behavior detection and consumer-level insights for hyper-personalized experiences, ultimately improving people's lives. In this paper, we present TREASURE, TRansformer Engine As Scalable Universal transaction Representation Encoder, a multipurpose transformer-based foundation model specifically designed for transaction data. The model simultaneously captures both consumer behavior and payment network signals (such as response codes and system flags), providing comprehensive information necessary for applications like accurate recommendation systems and abnormal behavior detection. Verified with industry-grade datasets, TREASURE features three key capabilities: 1) an input module with dedicated sub-modules for static and dynamic attributes, enabling more efficient training and inference; 2) an efficient and effective training paradigm for predicting high-cardinality categorical attributes; and 3) demonstrated effectiveness as both a standalone model that increases abnormal behavior detection performance by 111% over production systems and an embedding provider that enhances recommendation models by 104%. We present key insights from extensive ablation studies, benchmarks against production models, and case studies, highlighting valuable knowledge gained from developing TREASURE.
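The abstract does not say how TREASURE trains against high-cardinality categorical attributes. As a rough illustration of why such targets call for a dedicated training paradigm, the sketch below uses sampled softmax, a common general-purpose technique that is swapped in here and is not claimed to be the paper's method. The function name, the uniform negative sampler, and all sizes are assumptions made for illustration only.

```python
import math
import random


def sampled_softmax_loss(hidden, item_emb, target, num_negatives, rng):
    """Cross-entropy over the true category plus a few sampled negatives.

    hidden:   list of floats, the model output vector for one position.
    item_emb: list of embedding vectors, one per category value.
    target:   index of the true category.
    Assumption: negatives are drawn uniformly and no sampling-bias
    correction is applied, so this only approximates the full softmax loss.
    """
    vocab_size = len(item_emb)
    if num_negatives > vocab_size - 1:
        raise ValueError("not enough distinct negatives in the vocabulary")

    # Rejection-sample distinct negatives without scanning the vocabulary.
    negatives = set()
    while len(negatives) < num_negatives:
        j = rng.randrange(vocab_size)
        if j != target:
            negatives.add(j)
    ids = [target] + sorted(negatives)

    # Score only the sampled rows instead of every category.
    logits = [sum(h * e for h, e in zip(hidden, item_emb[i])) for i in ids]

    # Numerically stable log-sum-exp; the target sits at position 0.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


# Hypothetical sizes: 10,000 category values, 8-dimensional embeddings.
rng = random.Random(0)
dim, vocab_size = 8, 10_000
item_emb = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
hidden = [rng.gauss(0, 1) for _ in range(dim)]
loss = sampled_softmax_loss(hidden, item_emb, target=42, num_negatives=20, rng=rng)
print(loss)
```

Each call scores 21 rows rather than all 10,000, which is the cost saving that makes this family of methods attractive when an attribute has a very large number of distinct values.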
Related papers
- EST: Towards Efficient Scaling Laws in Click-Through Rate Prediction via Unified Modeling [13.693397814262681]
Efficiently scaling industrial Click-Through Rate (CTR) prediction has recently attracted significant research attention.
We propose the Efficiently Scalable Transformer (EST), which achieves fully unified modeling by processing all raw inputs in a single sequence without lossy aggregation.
EST significantly outperforms production baselines, delivering a 3.27% RPM (Revenue Per Mille) increase and a 1.22% CTR lift.
arXiv Detail & Related papers (2026-02-11T12:51:54Z)
- CTR Prediction on Alibaba's Taobao Advertising Dataset Using Traditional and Deep Learning Models [14.51041016589099]
We explore how to model click-through rates more effectively using a large-scale Taobao dataset released by Alibaba.
To better model user intent, we combined behavioral data from hundreds of millions of interactions over a 22-day period.
Our research provides a roadmap for advancing click-through rate predictions and extending their value beyond e-commerce.
arXiv Detail & Related papers (2025-11-26T22:51:02Z)
- Learning More with Less: A Generalizable, Self-Supervised Framework for Privacy-Preserving Capacity Estimation with EV Charging Data [84.37348569981307]
We propose a first-of-its-kind capacity estimation model based on self-supervised pre-training.
Our model consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-10-05T08:58:35Z)
- Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents [3.344730946122235]
We propose an innovative pipeline utilizing Large Language Model (LLM) agents to automate data-driven modeling and analysis.
We evaluate two LLM-agent frameworks: a multi-agent system featuring specialized collaborative agents, and a single-agent system based on the Reasoning and Acting (ReAct) paradigm.
arXiv Detail & Related papers (2025-10-01T19:28:35Z)
- Sequential Data Augmentation for Generative Recommendation [54.765568804267645]
Generative recommendation plays a crucial role in personalized systems, predicting users' future interactions from their historical behavior sequences.
Data augmentation, the process of constructing training data from user interaction histories, is a critical yet underexplored factor in training these models.
We propose GenPAS, a principled framework that models augmentation as a sampling process and enables flexible control of the resulting training distribution.
Our experiments on benchmark and industrial datasets demonstrate that GenPAS yields superior accuracy, data efficiency, and parameter efficiency compared to existing strategies.
arXiv Detail & Related papers (2025-09-17T02:53:25Z)
- EPR-GAIL: An EPR-Enhanced Hierarchical Imitation Learning Framework to Simulate Complex User Consumption Behaviors [13.436303786475348]
We propose to enhance the fidelity and trustworthiness of the data-driven Generative Adversarial Imitation Learning (GAIL) method by blending it with the Exploration and Preferential Return (EPR) model.
The core idea of our EPR-GAIL framework is to model user consumption behaviors as a complex EPR decision process.
Experiments on two real-world datasets of user consumption behaviors on an online platform demonstrate that the EPR-GAIL framework outperforms the best state-of-the-art baseline by over 19% in terms of data fidelity.
arXiv Detail & Related papers (2025-03-09T01:56:42Z)
- External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation [58.49335224405165]
Ads recommendation is a prominent service of online advertising systems and has been actively studied.
Recent studies indicate that scaling-up and advanced design of the recommendation model can bring significant performance improvement.
However, with a larger model scale, such prior studies have a significantly increasing gap from industry as they often neglect two fundamental challenges in industrial-scale applications.
arXiv Detail & Related papers (2025-02-20T22:35:52Z)
- Multi-task CNN Behavioral Embedding Model For Transaction Fraud Detection [6.153407718616422]
Deep learning methods have become integral to embedding behavior sequence data in fraud detection.
We introduce the multitask CNN behavioral Embedding Model for Transaction Fraud Detection.
Our contributions include 1) introducing a single-layer CNN design featuring multirange kernels which outperform LSTM and Transformer models in terms of scalability and domain-focused inductive bias.
arXiv Detail & Related papers (2024-11-29T03:58:11Z)
- A Utility-Mining-Driven Active Learning Approach for Analyzing Clickstream Sequences [21.38368444137596]
This study introduces the High-Utility Sequential Pattern Mining using SHAP values (HUSPM-SHAP) model.
Our findings demonstrate the model's capability to refine e-commerce data processing, steering towards more streamlined, cost-effective prediction modeling.
arXiv Detail & Related papers (2024-10-09T10:44:02Z)
- Consumer Transactions Simulation through Generative Adversarial Networks [0.07373617024876725]
This paper presents an innovative application of Generative Adversarial Networks (GANs) to generate synthetic retail transaction data.
We diverge from conventional methodologies by integrating SKU data into our GAN architecture and using more sophisticated embedding methods.
Preliminary results demonstrate enhanced realism in simulated transactions measured by comparing generated items with real ones.
arXiv Detail & Related papers (2024-08-07T09:45:24Z)
- Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a new sandbox suite tailored for integrated data-model co-development.
This sandbox provides a feedback-driven experimental platform, enabling cost-effective and guided refinement of both data and models.
arXiv Detail & Related papers (2024-07-16T14:40:07Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.