Specialized text classification: an approach to classifying Open Banking transactions
- URL: http://arxiv.org/abs/2504.12319v1
- Date: Thu, 10 Apr 2025 17:14:43 GMT
- Title: Specialized text classification: an approach to classifying Open Banking transactions
- Authors: Duc Tuyen TA, Wajdi Ben Saad, Ji Young Oh
- Abstract summary: This paper introduces a language-based Open Banking transaction classification system with a focus on the French market and French-language text. By incorporating language-specific techniques and domain knowledge, the proposed system demonstrates enhanced performance and efficiency.
- Score: 0.13108652488669734
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the introduction of the PSD2 regulation in the EU, which established the Open Banking framework, a new window of opportunity has opened for banks and fintechs to explore and enrich bank transaction descriptions with the aim of building a better understanding of customer behavior, while using this understanding to prevent fraud, reduce risks and offer more competitive and tailored services. Although the use of natural language processing models and techniques has seen incredible progress in various applications and domains over the past few years, custom applications based on domain-specific text corpora remain unaddressed, especially in the banking sector. In this paper, we introduce a language-based Open Banking transaction classification system with a focus on the French market and French-language text. The system encompasses data collection, labeling, preprocessing, modeling, and evaluation stages. Unlike previous studies that focus on general classification approaches, this system is specifically tailored to address the challenges posed by training a language model with a specialized text corpus (banking data in the French context). By incorporating language-specific techniques and domain knowledge, the proposed system demonstrates enhanced performance and efficiency compared to generic approaches.
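The abstract lists preprocessing and modeling stages without detailing them. As a rough illustration only, the Python sketch below shows one common baseline for short French transaction labels: accent and digit normalization followed by a multinomial Naive Bayes classifier. The stopword list, category names, and toy examples are hypothetical; this is not the authors' system, which the abstract describes as language-model based.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict

# Illustrative French stopwords (hypothetical; a real system would curate these).
STOPWORDS = {"de", "du", "la", "le", "les", "des", "a", "au", "en", "par"}


def normalize(text: str) -> list:
    """Lowercase, strip accents, drop digits (dates, card and reference numbers),
    tokenize on letters, and remove stopwords and one-letter tokens."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"\d+", " ", text)
    return [t for t in re.findall(r"[a-z]+", text)
            if t not in STOPWORDS and len(t) > 1]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(normalize(text))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, text):
        tokens = normalize(text)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(n_docs / total)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy examples (not from the paper's dataset).
train_texts = [
    "PRLV SEPA EDF FACTURE 0423",
    "CB CARREFOUR MARKET 12/03",
    "VIR SALAIRE ACME SAS",
    "CB AUCHAN 15/03 PARIS",
    "PRLV SEPA ENGIE ÉLECTRICITÉ",
]
train_labels = ["energie", "courses", "salaire", "courses", "energie"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("CB CARREFOUR CITY 02/04"))  # courses
```

Stripping digits and accents collapses variants such as "ÉLECTRICITÉ" and "electricite" or different card dates into the same tokens, which matters for very short, noisy bank labels.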
Related papers
- Credit C-GPT: A Domain-Specialized Large Language Model for Conversational Understanding in Vietnamese Debt Collection [0.0]
This paper introduces Credit C-GPT, a domain-specialized large language model with seven billion parameters, fine-tuned for conversational understanding in Vietnamese debt collection scenarios. The proposed model integrates multiple conversational intelligence tasks, including dialogue understanding, sentiment recognition, intent detection, call stage classification, and structured slot-value extraction.
(arXiv, 2026-01-15T08:12:55Z)
- Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection [29.14690532256978]
This paper proposes a novel approach that employs Reinforcement Learning (RL) to post-train lightweight language models for fraud detection tasks. We utilize the Group Sequence Policy Optimization (GSPO) algorithm combined with a rule-based reward system to fine-tune language models of various sizes on a real-life transaction dataset. Our experimental results demonstrate the effectiveness of this approach, with post-trained language models achieving substantial F1-score improvements on held-out test data.
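The summary names Group Sequence Policy Optimization (GSPO) with a rule-based reward but gives no formulas. The Python sketch below reflects our reading of the GSPO objective as originally published: a length-normalized, sequence-level importance ratio s_i = (π_new(y_i|x) / π_old(y_i|x))^(1/|y_i|), rewards normalized within the group of responses sampled for one prompt, and a PPO-style clipped surrogate. The reward rule (a trailing "Verdict: fraud/legit" line), the function names, and eps=0.2 are hypothetical; the paper's actual reward is not described here, and the clipping range used in practice may be much smaller.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical output format: the model ends its answer with "Verdict: fraud|legit".
VERDICT = re.compile(r"verdict:\s*(fraud|legit)\s*$", re.IGNORECASE)


def rule_based_reward(output: str, label: str) -> float:
    """Hypothetical rule: +1 correct verdict, -1 wrong verdict, -0.5 unparseable."""
    match = VERDICT.search(output.strip())
    if match is None:
        return -0.5
    return 1.0 if match.group(1).lower() == label else -1.0


def group_advantages(rewards):
    """Normalize rewards within the group of responses sampled for one prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


def sequence_ratio(logp_new, logp_old):
    """Length-normalized sequence ratio (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|),
    computed from per-token log-probabilities."""
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def gspo_objective(logps_new, logps_old, rewards, eps=0.2):
    """Clipped surrogate averaged over the group (to be maximized)."""
    terms = []
    for new, old, adv in zip(logps_new, logps_old, group_advantages(rewards)):
        s = sequence_ratio(new, old)
        s_clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        terms.append(min(s * adv, s_clipped * adv))
    return sum(terms) / len(terms)


outputs = ["Unusual merchant and amount. Verdict: fraud", "Verdict: legit", "no idea"]
rewards = [rule_based_reward(o, "fraud") for o in outputs]  # [1.0, -1.0, -0.5]
```

In a real training loop the per-token log-probabilities would come from the current and the sampling policy and the objective would be maximized by gradient ascent; plain floats keep this sketch dependency-free. Clipping one ratio per sequence, rather than per token, is the design point that distinguishes GSPO from token-level schemes.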
(arXiv, 2026-01-09T06:56:27Z)
- Open Banking Foundational Model: Learning Language Representations from Few Financial Transactions [0.0]
We introduce a foundational model for financial transactions that integrates structured attributes and unstructured textual descriptions into a unified representation. We demonstrate that our approach outperforms classical feature engineering and discrete event sequence methods. Results highlight the potential of self-supervised models to advance financial applications ranging from fraud prevention and credit risk to customer insights.
(arXiv, 2025-11-15T10:52:39Z)
- Graph Retrieval-Augmented LLM for Conversational Recommendation Systems [52.35491420330534]
G-CRS (Graph Retrieval-Augmented Large Language Model for Conversational Recommender Systems) is a training-free framework that combines graph retrieval-augmented generation and in-context learning. G-CRS achieves superior recommendation performance compared to existing methods without requiring task-specific training.
(arXiv, 2025-03-09T03:56:22Z)
- Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models [58.936893810674896]
Face Anti-Spoofing (FAS) is essential for ensuring the security and reliability of facial recognition systems.
We introduce a multimodal large language model framework for FAS, termed Interpretable Face Anti-Spoofing (I-FAS).
We propose a Spoof-aware Captioning and Filtering (SCF) strategy to generate high-quality captions for FAS images.
(arXiv, 2025-01-03T09:25:04Z)
- Modular Conversational Agents for Surveys and Interviews [6.019313905775819]
This paper introduces a modular approach and its resulting parameterized process for designing AI agents. We demonstrate the adaptability, generalizability, and efficacy of our modular approach through three empirical studies. The results suggest that the AI agent increases completion rates and response quality.
(arXiv, 2024-12-22T15:00:16Z)
- DarijaBanking: A New Resource for Overcoming Language Barriers in Banking Intent Detection for Moroccan Arabic Speakers [5.274804664403783]
Navigating the complexities of language diversity is a central challenge in developing robust natural language processing systems.
This paper introduces DarijaBanking, a novel Darija dataset aimed at enhancing intent classification in the banking domain.
DarijaBanking comprises over 1,800 parallel high-quality queries in Darija, Modern Standard Arabic (MSA), English, and French, organized into 24 intent classes.
(arXiv, 2024-05-26T08:33:28Z)
- LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
(arXiv, 2024-05-07T04:00:30Z)
- Identifying Banking Transaction Descriptions via Support Vector Machine Short-Text Classification Based on a Specialized Labelled Corpus [7.046417074932257]
We describe a novel system that combines Natural Language Processing techniques with Machine Learning algorithms to classify banking transaction descriptions.
Motivated by existing solutions in spam detection, we also propose a short text similarity detector to reduce training set size based on the Jaccard distance.
We present a use case with a personal finance application, CoinScrap, which is available at Google Play and App Store.
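The summary mentions a short-text similarity detector based on the Jaccard distance for shrinking the training set. A minimal Python sketch of that idea follows, assuming whitespace tokens, a greedy keep-first pass, and a hypothetical threshold of 0.5; the paper's exact tokenization and threshold are not given here.

```python
def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the sets of lowercase whitespace tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def reduce_training_set(texts, threshold=0.5):
    """Greedily keep a text only if it is farther than `threshold` from every
    text already kept, so near-duplicate descriptions are dropped."""
    kept = []
    for text in texts:
        if all(jaccard_distance(text, k) > threshold for k in kept):
            kept.append(text)
    return kept


descriptions = [
    "CB CARREFOUR MARKET PARIS",
    "CB CARREFOUR MARKET LYON",  # distance 0.4 to the first entry: dropped
    "VIR SALAIRE ACME",
]
print(reduce_training_set(descriptions))
# ['CB CARREFOUR MARKET PARIS', 'VIR SALAIRE ACME']
```

The greedy pass is quadratic in the number of descriptions; for large corpora, an approximate method such as MinHash would be the usual substitute, which is a swapped-in technique rather than anything the summary states.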
(arXiv, 2024-03-29T13:15:46Z)
- Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
(arXiv, 2023-07-16T00:45:42Z)
- Scalable and Weakly Supervised Bank Transaction Classification [0.0]
This paper aims to categorize bank transactions using weak supervision, natural language processing, and deep neural network training.
We present an effective and scalable end-to-end data pipeline, including data preprocessing, transaction text embedding, anchoring, label generation, and discriminative neural network training.
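The anchoring and label-generation steps can be pictured with a small keyword-anchor labeler. This Python sketch is an assumption-laden illustration, not the paper's pipeline: the ANCHORS dictionary, the category names, and the vote-and-abstain rule are hypothetical. Transactions whose anchors conflict or are absent are left unlabeled rather than guessed.

```python
from collections import Counter

# Hypothetical anchor keywords per category (illustrative, not from the paper).
ANCHORS = {
    "groceries": {"carrefour", "auchan", "lidl"},
    "transport": {"sncf", "uber", "ratp"},
    "salary": {"salaire", "payroll"},
}


def weak_label(text):
    """Return the category whose anchors match the most tokens, or None to
    abstain when nothing matches or the top categories tie."""
    votes = Counter(
        label
        for token in text.lower().split()
        for label, keywords in ANCHORS.items()
        if token in keywords
    )
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting anchors: abstain
    return ranked[0][0]


def build_weak_dataset(texts):
    """Keep only transactions that received a confident weak label."""
    dataset = []
    for text in texts:
        label = weak_label(text)
        if label is not None:
            dataset.append((text, label))
    return dataset


print(build_weak_dataset(["CB CARREFOUR 12/03", "UBER CARREFOUR", "PRLV EDF", "VIR SALAIRE ACME"]))
# [('CB CARREFOUR 12/03', 'groceries'), ('VIR SALAIRE ACME', 'salary')]
```

In a pipeline like the one summarized, the resulting noisy pairs would then serve as training targets for a discriminative model, which can generalize beyond the literal anchor keywords.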
(arXiv, 2023-05-28T23:12:12Z)
- Detection of Abuse in Financial Transaction Descriptions Using Machine Learning [4.04516535783148]
This paper describes the problem of tech-assisted abuse in the context of banking services.
It outlines the developed model and its performance, and the operating framework more broadly.
(arXiv, 2023-03-10T06:10:53Z)
- Detecting ESG topics using domain-specific language models and data augmentation approaches [3.3332986505989446]
Natural language processing tasks in the financial domain remain challenging due to the paucity of appropriately labelled data.
Here, we investigate two approaches that may help to mitigate these issues.
Firstly, we experiment with further language model pre-training using large amounts of in-domain data from business and financial news.
We then apply augmentation approaches to increase the size of our dataset for model fine-tuning.
(arXiv, 2020-10-16T11:20:07Z)
- How Context Affects Language Models' Factual Predictions [134.29166998377187]
We integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way.
We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
(arXiv, 2020-05-10T09:28:12Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format.
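To make "text-to-text format" concrete: every task becomes an input string with a task prefix and a target string, and a classifier's output is whatever text the model generates. The Python sketch below uses T5-style prefixes ("sst2 sentence", "translate English to German"); the "classify transaction" prefix and the helper names are hypothetical. Treating any generated string outside the label set as an invalid prediction follows our understanding of how the T5 work scored such outputs.

```python
from typing import Optional, Set, Tuple


def to_text_to_text(prefix: str, text: str, target: str) -> Tuple[str, str]:
    """Cast one example as an (input string, target string) pair with a task prefix."""
    return f"{prefix}: {text}", target


def parse_label(generated: str, labels: Set[str]) -> Optional[str]:
    """Map generated text back to a class; anything outside `labels` is invalid."""
    out = generated.strip().lower()
    return out if out in labels else None


examples = [
    to_text_to_text("sst2 sentence", "it is a charming film", "positive"),
    to_text_to_text("translate English to German", "That is good.", "Das ist gut."),
    # Hypothetical prefix for a transaction-categorization task.
    to_text_to_text("classify transaction", "CB CARREFOUR 12/03", "groceries"),
]
for source, target in examples:
    print(f"{source!r} -> {target!r}")
```

Because classification, translation, and any new task share one string-in, string-out interface, a single model, loss, and decoding procedure can serve all of them.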
(arXiv, 2019-10-23T17:37:36Z)