Topological Data Analysis for Unsupervised Anomaly Detection and Customer Segmentation on Banking Data
- URL: http://arxiv.org/abs/2508.14136v1
- Date: Tue, 19 Aug 2025 12:58:00 GMT
- Title: Topological Data Analysis for Unsupervised Anomaly Detection and Customer Segmentation on Banking Data
- Authors: Leonardo Aldo Alejandro Barberi, Linda Maria De Cave
- Abstract summary: This paper introduces advanced techniques of Topological Data Analysis (TDA) for unsupervised anomaly detection and customer segmentation in banking data. We develop unsupervised procedures that uncover meaningful patterns in customers' banking data by exploiting topological information.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces advanced techniques of Topological Data Analysis (TDA) for unsupervised anomaly detection and customer segmentation in banking data. Using the Mapper algorithm and persistent homology, we develop unsupervised procedures that uncover meaningful patterns in customers' banking data by exploiting topological information. The framework we present in this paper yields actionable insights that combine the abstract mathematical subject of topology with real-life use cases that are useful in industry.
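As a rough illustration of the persistent-homology idea mentioned in the abstract, the sketch below computes 0-dimensional persistence (connected components) for a small point cloud using a union-find over edges sorted by length. It is not the authors' procedure; the function name `h0_persistence` and the toy coordinates are assumptions made for this example. Every point is treated as a component born at scale 0, and a component "dies" at the length of the edge that merges it into another one, so an isolated point shows up as one unusually large death value.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return death scales of 0-dimensional classes in a Vietoris-Rips filtration.

    Hypothetical helper for illustration: every point is born at scale 0, and a
    component dies when an edge first merges it into another component.
    The single class that never dies is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # merge the two components
            deaths.append(length)  # one component dies at this scale
    return deaths


# Toy data (assumed): four points on a unit square plus one far-away point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
deaths = h0_persistence(points)
```

In this toy case the three small death values come from the square collapsing into one component, and the one large value marks the scale at which the distant point finally joins it. A long-lived component of this kind is one simple signal that an anomaly-detection heuristic could use, though real pipelines typically rely on dedicated TDA libraries and higher-dimensional features as well.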
Related papers
- A Graph Machine Learning Approach for Detecting Topological Patterns in Transactional Graphs [0.9199488958939334]
The rise of digital ecosystems has exposed the financial sector to evolving abuse and criminal tactics. Traditional rule-based systems lack the adaptability needed to detect sophisticated or coordinated criminal behaviors. We propose an approach that integrates graph machine learning and network analysis to improve the detection of well-known topological patterns.
arXiv Detail & Related papers (2025-09-16T06:43:11Z)
- Advanced fraud detection using machine learning models: enhancing financial transaction security [0.3370543514515051]
This research presents an end-to-end, feature-rich machine learning framework for detecting credit card transaction anomalies and fraud using real-world data.
arXiv Detail & Related papers (2025-06-12T15:59:25Z)
- TopoFR: A Closer Look at Topology Alignment on Face Recognition [58.45515807380505]
We propose TopoFR, a novel FR model that leverages a topological structure alignment strategy called PTSA and a hard sample mining strategy named SDE. PTSA uses persistent homology to align the topological structures of the input and latent spaces, effectively preserving the structure information and improving the generalization performance of the FR model. Experimental results on popular face benchmarks demonstrate the superiority of our TopoFR over the state-of-the-art methods.
arXiv Detail & Related papers (2024-10-14T14:58:30Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z)
- Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System [48.62158108517576]
We introduce InsightPilot, an automated data exploration system designed to simplify the data exploration process.
InsightPilot automatically selects appropriate analysis intents, such as understanding, summarizing, and explaining.
In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts.
arXiv Detail & Related papers (2023-04-02T07:27:49Z)
- Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z)
- Deep Learning for Anomaly Detection in Log Data: A Survey [3.508620069426877]
Self-learning anomaly detection techniques capture patterns in log data and report unexpected log event occurrences.
Deep learning neural networks for this purpose have been presented.
There exist many different architectures for deep learning and it is non-trivial to encode raw and unstructured log data.
arXiv Detail & Related papers (2022-07-08T10:58:28Z)
- Topological Data Analysis for Anomaly Detection in Host-Based Logs [1.0878040851638]
We present an approach that builds a filtration of simplicial complexes directly from Windows logs, enabling analysis of their intrinsic structure using topological tools.
We end by discussing the potential for our methods to be used as part of an explainable framework for anomaly detection.
arXiv Detail & Related papers (2022-04-25T20:41:02Z)
- A Review of Topological Data Analysis for Cybersecurity [1.0878040851638]
Topological Data Analysis (TDA) studies the high level structure of data using techniques from algebraic topology.
We hope to highlight to researchers a promising new area with strong potential to improve cybersecurity data science.
arXiv Detail & Related papers (2022-02-16T13:03:52Z)
- LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision [63.08516384181491]
We present LogLAB, a novel modeling approach for automated labeling of log messages without requiring manual work by experts.
Our method relies on estimated failure time windows provided by monitoring systems to produce precise labeled datasets in retrospect.
Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1-score of more than 0.98 even at large failure time windows.
arXiv Detail & Related papers (2021-11-02T15:16:08Z)
- Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level.
This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.