Scalable and Weakly Supervised Bank Transaction Classification
- URL: http://arxiv.org/abs/2305.18430v2
- Date: Sat, 10 Jun 2023 04:39:42 GMT
- Title: Scalable and Weakly Supervised Bank Transaction Classification
- Authors: Liam Toran, Cory Van Der Walt, Alan Sammarone, Alex Keller (Flowcast.ai)
- Abstract summary: This paper aims to categorize bank transactions using weak supervision, natural language processing, and deep neural network training.
We present an effective and scalable end-to-end data pipeline, including data preprocessing, transaction text embedding, anchoring, label generation, and discriminative neural network training.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper aims to categorize bank transactions using weak supervision,
natural language processing, and deep neural network techniques. Our approach
minimizes the reliance on expensive and difficult-to-obtain manual annotations
by leveraging heuristics and domain knowledge to train accurate transaction
classifiers. We present an effective and scalable end-to-end data pipeline,
including data preprocessing, transaction text embedding, anchoring, label
generation, discriminative neural network training, and an overview of the
system architecture. We demonstrate the effectiveness of our method by showing
it outperforms existing market-leading solutions, achieves accurate
categorization, and can be quickly extended to novel and composite use cases.
This can in turn unlock many financial applications such as financial health
reporting and credit risk assessment.
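The label generation step described in the abstract can be illustrated with a minimal sketch. It assumes keyword heuristics ("labeling functions") combined by a simple majority vote; the abstract does not specify the paper's actual heuristics, category names, or aggregation method, so every function name, keyword, and category below is hypothetical. In a pipeline of the kind described, the resulting weak labels would then serve as training targets for a discriminative classifier over transaction text embeddings.

```python
from collections import Counter

ABSTAIN = None  # a heuristic returns this when it has no opinion

# Hypothetical keyword heuristics; not taken from the paper.
def lf_groceries(text):
    return "groceries" if any(k in text for k in ("supermarket", "grocery")) else ABSTAIN

def lf_transport(text):
    return "transport" if any(k in text for k in ("uber", "metro", "taxi")) else ABSTAIN

def lf_income(text):
    return "income" if any(k in text for k in ("payroll", "salary")) else ABSTAIN

LABELING_FUNCTIONS = [lf_groceries, lf_transport, lf_income]

def preprocess(description):
    # Lowercase and collapse runs of whitespace in the raw transaction text.
    return " ".join(description.lower().split())

def weak_label(description):
    # Collect the non-abstaining votes and return the most common one.
    # Majority vote is a stand-in for a learned label model; ties resolve
    # to the label that was voted first.
    text = preprocess(description)
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]
```

Transactions for which every heuristic abstains receive no weak label and would simply be left out of the training set in this sketch.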
Related papers
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- Identifying Banking Transaction Descriptions via Support Vector Machine Short-Text Classification Based on a Specialized Labelled Corpus [7.046417074932257]
We describe a novel system that combines Natural Language Processing techniques with Machine Learning algorithms to classify banking transaction descriptions.
Motivated by existing solutions in spam detection, we also propose a short text similarity detector to reduce training set size based on the Jaccard distance.
We present a use case with a personal finance application, CoinScrap, which is available at Google Play and App Store.
arXiv Detail & Related papers (2024-03-29T13:15:46Z)
- FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications [2.2661367844871854]
Large Language Models (LLMs) can be used in this context, but they are not finance-specific and tend to require significant computational resources.
We introduce a novel approach based on the Llama 2 7B foundational model, in order to benefit from its generative nature and comprehensive language manipulation.
This is achieved by fine-tuning the Llama2 7B model on a small portion of supervised financial sentiment analysis data.
arXiv Detail & Related papers (2024-03-18T22:11:00Z)
- Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences [0.0]
We present a generative pretraining method that can be used to obtain contextualised embeddings of financial transactions.
We additionally perform large-scale pretraining of an embedding model using a corpus of data from 180 issuing banks containing 5.1 billion transactions.
arXiv Detail & Related papers (2024-01-03T09:32:48Z)
- Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection [0.0]
Our model confronts token explosion and reconstructs behavioral sequences, providing a nuanced understanding of transactional behavior.
We integrate a differential convolutional approach to enhance anomaly detection, bolstering the security and efficacy of one of the largest online payment merchants in China.
arXiv Detail & Related papers (2023-12-22T03:15:17Z)
- Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
- Detecting Anomalous Cryptocurrency Transactions: an AML/CFT Application of Machine Learning-based Forensics [5.617291981476445]
The paper analyzes a real-world dataset of Bitcoin transactions represented as a directed graph network through various techniques.
It shows that the neural network types known as Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT) are a promising AML/CFT solution.
arXiv Detail & Related papers (2022-06-07T16:22:55Z)
- Relational Graph Neural Networks for Fraud Detection in a Super-App environment [53.561797148529664]
We propose a framework of relational graph convolutional networks methods for fraudulent behaviour prevention in the financial services of a Super-App.
We use an interpretability algorithm for graph neural networks to determine the most important relations to the classification task of the users.
Our results show that there is an added value when considering models that take advantage of the alternative data of the Super-App and the interactions found in their high connectivity.
arXiv Detail & Related papers (2021-07-29T00:02:06Z)
- Supporting Financial Inclusion with Graph Machine Learning and Super-App Alternative Data [63.942632088208505]
Super-Apps have changed the way we think about the interactions between users and commerce.
This paper investigates how different interactions between users within a Super-App provide a new source of information to predict borrower behavior.
arXiv Detail & Related papers (2021-02-19T15:13:06Z)
- Counterfactual Detection meets Transfer Learning [48.82717416666232]
We show that detecting Counterfactuals is a straightforward Binary Classification Task that can be implemented with minimal adaptation on already existing model Architectures.
We introduce a new end-to-end pipeline to process antecedents and consequents as an entity recognition task, thus adapting them into Token Classification.
arXiv Detail & Related papers (2020-05-27T02:02:57Z)
- Super-App Behavioral Patterns in Credit Risk Models: Financial, Statistical and Regulatory Implications [110.54266632357673]
We present the impact of alternative data that originates from an app-based marketplace, in contrast to traditional bureau data, upon credit scoring models.
Our results, validated across two countries, show that these new sources of data are particularly useful for predicting financial behavior in low-wealth and young individuals.
arXiv Detail & Related papers (2020-05-09T01:32:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.