Related papers: Scalable and Weakly Supervised Bank Transaction Classification

URL: http://arxiv.org/abs/2305.18430v2
Date: Sat, 10 Jun 2023 04:39:42 GMT
Title: Scalable and Weakly Supervised Bank Transaction Classification
Authors: Liam Toran, Cory Van Der Walt, Alan Sammarone, Alex Keller (Flowcast.ai)
Abstract summary: This paper aims to categorize bank transactions using weak supervision, natural language processing, and deep neural network training. We present an effective and scalable end-to-end data pipeline, including data preprocessing, transaction text embedding, anchoring, label generation, and discriminative neural network training.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper aims to categorize bank transactions using weak supervision, natural language processing, and deep neural network techniques. Our approach minimizes the reliance on expensive and difficult-to-obtain manual annotations by leveraging heuristics and domain knowledge to train accurate transaction classifiers. We present an effective and scalable end-to-end data pipeline, including data preprocessing, transaction text embedding, anchoring, label generation, discriminative neural network training, and an overview of the system architecture. We demonstrate the effectiveness of our method by showing it outperforms existing market-leading solutions, achieves accurate categorization, and can be quickly extended to novel and composite use cases. This can in turn unlock many financial applications such as financial health reporting and credit risk assessment.
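The label generation step mentioned in the abstract can be illustrated with a minimal sketch of weak supervision. The abstract does not specify the paper's heuristics or how their outputs are combined, so everything here is an assumption for illustration: the categories, the keyword lists, and the helper names (`make_labeling_function`, `weak_label`) are hypothetical, and a plain majority vote stands in for whatever label model the authors actually use.

```python
from collections import Counter

ABSTAIN = None  # a heuristic returns this when it has no opinion

# Hypothetical keyword heuristics; not taken from the paper.
KEYWORD_RULES = {
    "groceries": ("walmart", "kroger", "aldi"),
    "transport": ("uber", "lyft"),
    "income": ("payroll", "salary"),
}


def make_labeling_function(category, keywords):
    """Build a heuristic that votes for `category` when any keyword matches."""
    def labeling_function(text):
        lowered = text.lower()
        if any(keyword in lowered for keyword in keywords):
            return category
        return ABSTAIN
    return labeling_function


LABELING_FUNCTIONS = [
    make_labeling_function(category, keywords)
    for category, keywords in KEYWORD_RULES.items()
]


def weak_label(text):
    """Combine heuristic votes by majority; abstain if no heuristic fires."""
    votes = []
    for labeling_function in LABELING_FUNCTIONS:
        vote = labeling_function(text)
        if vote is not ABSTAIN:
            votes.append(vote)
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


transactions = ["POS PURCHASE WALMART #1234", "ACME CORP PAYROLL", "TRANSFER 0042"]
weak_labels = [weak_label(text) for text in transactions]
```

In a pipeline of the kind the abstract describes, noisy labels such as `weak_labels` would serve as training targets for a discriminative classifier over transaction text embeddings, and transactions on which every heuristic abstains would be left unlabeled.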

Related papers

Deep Learning Approaches for Anti-Money Laundering on Mobile Transactions: Review, Framework, and Directions [51.43521977132062]
Money laundering is a financial crime that obscures the origin of illicit funds. The proliferation of mobile payment platforms and smart IoT devices has significantly complicated anti-money laundering investigations. This paper conducts a comprehensive review of deep learning solutions and the challenges associated with their use in AML.
arXiv Detail & Related papers (2025-03-13T05:19:44Z)
Optimizing Blockchain Analysis: Tackling Temporality and Scalability with an Incremental Approach with Metropolis-Hastings Random Walks [2.855856661274715]
Existing methods primarily focus on snapshots of transaction networks. We propose an incremental approach for random walk-based node representation learning in transaction networks. Potential applications include transaction network monitoring, the efficient classification of blockchain addresses for fraud detection or the identification of specialized address types within the network.
arXiv Detail & Related papers (2025-01-21T20:34:38Z)
Explainable AI for Fraud Detection: An Attention-Based Ensemble of CNNs, GNNs, and A Confidence-Driven Gating Mechanism [5.486205584465161]
This study presents a new stacking-based approach for credit card fraud (CCF) detection by adding two extra layers to the usual classification process. In the attention layer, we combine soft outputs from a convolutional neural network (CNN) and a recurrent neural network (RNN) using the dependent ordered weighted averaging (DOWA) operator. In the confidence-based layer, we select whichever aggregate (DOWA or IOWA) shows lower uncertainty to feed into a meta-learner. Experiments on three datasets show that our method achieves high accuracy and robust generalization, making it effective for CCF detection.
arXiv Detail & Related papers (2024-10-01T09:56:23Z)
Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings. Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
Identifying Banking Transaction Descriptions via Support Vector Machine Short-Text Classification Based on a Specialized Labelled Corpus [7.046417074932257]
We describe a novel system that combines Natural Language Processing techniques with Machine Learning algorithms to classify banking transaction descriptions. Motivated by existing solutions in spam detection, we also propose a short text similarity detector to reduce training set size based on the Jaccard distance. We present a use case with a personal finance application, CoinScrap, which is available at Google Play and App Store.
arXiv Detail & Related papers (2024-03-29T13:15:46Z)
FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications [2.2661367844871854]
Large Language Models (LLMs) can be used for financial sentiment analysis, but they are not finance-specific and tend to require significant computational resources. We introduce a novel approach based on the Llama 2 7B foundational model, in order to benefit from its generative nature and comprehensive language manipulation. This is achieved by fine-tuning the Llama 2 7B model on a small portion of supervised financial sentiment analysis data.
arXiv Detail & Related papers (2024-03-18T22:11:00Z)
Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences [0.0]
We present a generative pretraining method that can be used to obtain contextualised embeddings of financial transactions. We additionally perform large-scale pretraining of an embedding model using a corpus of data from 180 issuing banks containing 5.1 billion transactions.
arXiv Detail & Related papers (2024-01-03T09:32:48Z)
Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection [0.0]
Our model confronts token explosion and reconstructs behavioral sequences, providing a nuanced understanding of transactional behavior. We integrate a differential convolutional approach to enhance anomaly detection, bolstering the security and efficacy of one of the largest online payment merchants in China.
arXiv Detail & Related papers (2023-12-22T03:15:17Z)
Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention. An increasing number of transfer-based methods have been developed to fool black-box DNN models. We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
Detecting Anomalous Cryptocurrency Transactions: an AML/CFT Application of Machine Learning-based Forensics [5.617291981476445]
The paper analyzes a real-world dataset of Bitcoin transactions represented as a directed graph network through various techniques. It shows that the neural network types known as Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT) are a promising AML/CFT solution.
arXiv Detail & Related papers (2022-06-07T16:22:55Z)
Relational Graph Neural Networks for Fraud Detection in a Super-App environment [53.561797148529664]
We propose a framework of relational graph convolutional network methods for fraudulent behaviour prevention in the financial services of a Super-App. We use an interpretability algorithm for graph neural networks to determine the most important relations to the classification task of the users. Our results show that there is an added value when considering models that take advantage of the alternative data of the Super-App and the interactions found in their high connectivity.
arXiv Detail & Related papers (2021-07-29T00:02:06Z)
Supporting Financial Inclusion with Graph Machine Learning and Super-App Alternative Data [63.942632088208505]
Super-Apps have changed the way we think about the interactions between users and commerce. This paper investigates how different interactions between users within a Super-App provide a new source of information to predict borrower behavior.
arXiv Detail & Related papers (2021-02-19T15:13:06Z)
Counterfactual Detection meets Transfer Learning [48.82717416666232]
We show that detecting Counterfactuals is a straightforward Binary Classification Task that can be implemented with minimal adaptation on already existing model Architectures. We introduce a new end-to-end pipeline to process antecedents and consequents as an entity recognition task, thus adapting them into Token Classification.
arXiv Detail & Related papers (2020-05-27T02:02:57Z)
Super-App Behavioral Patterns in Credit Risk Models: Financial, Statistical and Regulatory Implications [110.54266632357673]
We present the impact of alternative data that originates from an app-based marketplace, in contrast to traditional bureau data, upon credit scoring models. Our results, validated across two countries, show that these new sources of data are particularly useful for predicting financial behavior in low-wealth and young individuals.
arXiv Detail & Related papers (2020-05-09T01:32:03Z)
