Self-Attention Mechanism in Multimodal Context for Banking Transaction Flow
- URL: http://arxiv.org/abs/2410.08243v1
- Date: Thu, 10 Oct 2024 08:13:39 GMT
- Authors: Cyrile Delestre, Yoann Sola
- Abstract summary: Banking Transaction Flow (BTF) is sequential data composed of three modalities: a date, a numerical value and a wording.
We trained two general models on a large number of BTFs in a self-supervised way.
The performance of these two models was evaluated on two banking downstream tasks: a transaction categorization task and a credit risk task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Banking Transaction Flow (BTF) is sequential data found in a number of banking activities such as marketing, credit risk or banking fraud. It is multimodal data composed of three modalities: a date, a numerical value and a wording. In this work we propose an application of the self-attention mechanism to the processing of BTFs. We trained two general models on a large number of BTFs in a self-supervised way: one RNN-based model and one Transformer-based model. We designed a specific tokenization in order to be able to process BTFs. The performance of these two models was evaluated on two banking downstream tasks: a transaction categorization task and a credit risk task. The results show that fine-tuning these two pre-trained models allows them to outperform state-of-the-art approaches on both tasks.
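The core engineering step in the abstract is a tokenization that turns each (date, numerical value, wording) triple into a flat token sequence a sequence model can consume. The paper does not detail its scheme here, so the sketch below is only an illustrative assumption: the bucket edges, special-token names, and whitespace splitting of the wording are all hypothetical choices, not the authors' actual tokenizer.

```python
# Hypothetical tokenization of one Banking Transaction Flow (BTF) record.
# Bucket edges, token names, and the naive wording split are illustrative
# assumptions, not the scheme used in the paper.
from datetime import date

AMOUNT_EDGES = [1, 10, 50, 100, 500, 1000, 5000]  # magnitude buckets, ascending

def amount_bucket(value: float) -> str:
    """Map a numerical amount to a coarse magnitude token."""
    for i, edge in enumerate(AMOUNT_EDGES):
        if abs(value) < edge:
            return f"<AMT_{i}>"
    return f"<AMT_{len(AMOUNT_EDGES)}>"

def tokenize_transaction(d: date, amount: float, wording: str) -> list[str]:
    """Turn one (date, amount, wording) triple into a flat token sequence."""
    tokens = [
        f"<DOW_{d.weekday()}>",                    # day of week: weekly periodicity
        f"<DOM_{d.day}>",                          # day of month: salary/rent cycles
        "<CREDIT>" if amount >= 0 else "<DEBIT>",  # sign as its own token
        amount_bucket(amount),
    ]
    # Naive whitespace tokens for the wording modality.
    tokens += wording.lower().split()
    return tokens

seq = tokenize_transaction(date(2024, 10, 3), -42.5, "CARD PAYMENT SUPERMARKET")
```

A sequence of such transactions, concatenated in date order, can then be fed to either an RNN- or a Transformer-based encoder for self-supervised pretraining.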
Related papers
- Machine and Deep Learning for Credit Scoring: A compliant approach [0.0]
This paper is an attempt to challenge the current regulatory status quo and introduce new Basel II- and Basel III-compliant techniques.
We prove that the use of such algorithms drastically improves performance and default capture rate.
Furthermore, we leverage the power of Shapley values to show that these relatively simple models are not the black boxes the current regulatory framework assumes them to be.
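A Shapley value attributes a model's output to each input feature by averaging that feature's marginal contribution over all feature orderings. As a minimal, self-contained illustration (the scoring function below is a toy stand-in, not the paper's credit model), the exact computation for three features is:

```python
# Exact Shapley values for a toy 3-feature scoring function.
# The value function is an illustrative stand-in for a credit model.
from itertools import permutations

FEATURES = ["income", "debt", "history"]

def value(coalition: frozenset) -> float:
    """Toy model score when only the features in `coalition` are known."""
    score = 0.0
    if "income" in coalition:
        score += 3.0
    if "debt" in coalition:
        score -= 1.0
    # Interaction term: history only helps when income is also known.
    if "history" in coalition and "income" in coalition:
        score += 2.0
    return score

def shapley(feature: str) -> float:
    """Average marginal contribution of `feature` over all orderings."""
    orders = list(permutations(FEATURES))
    total = 0.0
    for order in orders:
        before = frozenset(order[: order.index(feature)])
        total += value(before | {feature}) - value(before)
    return total / len(orders)

phi = {f: shapley(f) for f in FEATURES}
```

By construction the attributions sum to the full-model score (the efficiency property), which is what makes Shapley values attractive for regulator-facing explanations.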
arXiv Detail & Related papers (2024-12-28T17:46:43Z) - TabSniper: Towards Accurate Table Detection & Structure Recognition for Bank Statements [1.9461727843485295]
Existing table structure recognition approaches produce sub-optimal results for long, complex tables.
This paper proposes TabSniper, a novel approach for efficient table detection, categorization and structure recognition from bank statements.
arXiv Detail & Related papers (2024-12-17T11:47:59Z) - STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading [55.02735046724146]
In financial trading, factor models are widely used to price assets and capture excess returns from mispricing.
We propose a Spatio-Temporal factOR Model based on dual vector quantized variational autoencoders, named STORM.
STORM extracts features of stocks from temporal and spatial perspectives, then fuses and aligns these features at the fine-grained and semantic levels, and represents the factors as multi-dimensional embeddings.
arXiv Detail & Related papers (2024-12-12T17:15:49Z) - Multi-task CNN Behavioral Embedding Model For Transaction Fraud Detection [6.153407718616422]
Deep learning methods have become integral to embedding behavior sequence data in fraud detection.
We introduce the multitask CNN behavioral Embedding Model for Transaction Fraud Detection.
Our contributions include 1) introducing a single-layer CNN design featuring multi-range kernels, which outperforms LSTM and Transformer models in terms of scalability and domain-focused inductive bias.
arXiv Detail & Related papers (2024-11-29T03:58:11Z) - Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences [0.0]
We present a generative pretraining method that can be used to obtain contextualised embeddings of financial transactions.
We additionally perform large-scale pretraining of an embedding model using a corpus of data from 180 issuing banks containing 5.1 billion transactions.
arXiv Detail & Related papers (2024-01-03T09:32:48Z) - Cross-BERT for Point Cloud Pretraining [61.762046503448936]
We propose a new cross-modal BERT-style self-supervised learning paradigm, called Cross-BERT.
To facilitate pretraining for irregular and sparse point clouds, we design two self-supervised tasks to boost cross-modal interaction.
Our work highlights the effectiveness of leveraging cross-modal 2D knowledge to strengthen 3D point cloud representation and the transferable capability of BERT across modalities.
arXiv Detail & Related papers (2023-12-08T08:18:12Z) - Multimodal Document Analytics for Banking Process Automation [4.541582055558865]
The paper contributes original empirical evidence on the effectiveness and efficiency of multimodal models for document processing in the banking business.
It offers practical guidance on how to unlock this potential in day-to-day operations.
arXiv Detail & Related papers (2023-07-21T18:29:04Z) - FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing [88.6654909354382]
We present a pure transformer-based framework, dubbed the Flexible Modal Vision Transformer (FM-ViT) for face anti-spoofing.
FM-ViT can flexibly target any single-modal (i.e., RGB) attack scenarios with the help of available multi-modal data.
Experiments demonstrate that the single model trained based on FM-ViT can not only flexibly evaluate different modal samples, but also outperforms existing single-modal frameworks by a large margin.
arXiv Detail & Related papers (2023-05-05T04:28:48Z) - An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution.
We conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture.
We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
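The merging paper's metrics measure how far apart two sets of weights are before fusing them. As a hedged sketch of that idea (the specific metric and the interpolation operator below are illustrative choices, not the paper's exact formulation): cosine distance between flattened weight vectors, paired with simple linear interpolation merging.

```python
# Cosine distance between flattened weight vectors as a rough predictor of
# how well interpolation-based model merging will work. The metric choice
# and the example weights are illustrative assumptions, not the paper's
# exact formulation.
import math

def cosine_distance(w_a: list[float], w_b: list[float]) -> float:
    """1 - cosine similarity; values near 0 mean closely aligned weights."""
    dot = sum(a * b for a, b in zip(w_a, w_b))
    norm_a = math.sqrt(sum(a * a for a in w_a))
    norm_b = math.sqrt(sum(b * b for b in w_b))
    return 1.0 - dot / (norm_a * norm_b)

def interpolate(w_a: list[float], w_b: list[float], alpha: float = 0.5) -> list[float]:
    """Linear weight interpolation: the simplest merging operator."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(w_a, w_b)]

# Hypothetical flattened weights from two modality-specific heads.
vision_head = [0.2, 0.8, -0.1]
language_head = [0.25, 0.75, -0.05]

dist = cosine_distance(vision_head, language_head)
merged = interpolate(vision_head, language_head)
```

The intuition is that when the distance is small the two models sit in nearby loss basins, so a weight average is likely to preserve both tasks' behavior; a large distance predicts a poor merging outcome.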
arXiv Detail & Related papers (2023-04-28T15:43:21Z) - Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.