Anomaly Detection in Double-entry Bookkeeping Data by Federated Learning System with Non-model Sharing Approach
- URL: http://arxiv.org/abs/2501.12723v1
- Date: Wed, 22 Jan 2025 08:53:12 GMT
- Title: Anomaly Detection in Double-entry Bookkeeping Data by Federated Learning System with Non-model Sharing Approach
- Authors: Sota Mashiko, Yuji Kawamata, Tomoru Nakayama, Tetsuya Sakurai, Yukihiko Okada
- Abstract summary: Anomaly detection is crucial in financial auditing, and effective detection often requires obtaining large volumes of data from multiple organizations.
In this study, we propose a novel framework employing Data Collaboration (DC) analysis to streamline model training into a single communication round.
Our findings represent a significant advance in artificial intelligence-driven auditing and underscore the potential of FL methods in high-security domains.
- Score: 3.827294988616478
- License:
- Abstract: Anomaly detection is crucial in financial auditing, and effective detection often requires obtaining large volumes of data from multiple organizations. However, confidentiality concerns hinder data sharing among audit firms. Although the federated learning (FL)-based approach, FedAvg, has been proposed to address this challenge, its use of multiple communication rounds increases its overhead, limiting its practicality. In this study, we propose a novel framework employing Data Collaboration (DC) analysis -- a non-model-sharing FL method -- to streamline model training into a single communication round. Our method first encodes journal entry data via dimensionality reduction to obtain secure intermediate representations, then transforms them into collaboration representations for building an autoencoder that detects anomalies. We evaluate our approach on a synthetic dataset and real journal entry data from multiple organizations. The results show that our method not only outperforms single-organization baselines but also exceeds FedAvg in non-i.i.d. experiments on real journal entry data that closely mirror real-world conditions. By preserving data confidentiality and reducing iterative communication, this study addresses a key auditing challenge -- ensuring data confidentiality while integrating knowledge from multiple audit firms. Our findings represent a significant advance in artificial intelligence-driven auditing and underscore the potential of FL methods in high-security domains.
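To make the single-round workflow concrete, the sketch below illustrates one plausible reading of the pipeline described above: each organization reduces its journal-entry features to a private intermediate representation, a shared anchor dataset is used to map those representations into a common collaboration space, and an autoencoder trained on the pooled collaboration representation flags entries with high reconstruction error. The choice of PCA, the least-squares anchor alignment, and the small MLP autoencoder are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch of a single-round, non-model-sharing Data Collaboration (DC)
# pipeline (assumptions: PCA as each organization's private dimensionality
# reduction, a shared random anchor dataset, least-squares alignment onto the
# first party's anchor representation, and a small MLP autoencoder scored by
# reconstruction error -- illustrative, not the authors' implementation).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
parties = [rng.normal(size=(200, 30)) for _ in range(3)]  # local journal-entry features
anchor = rng.normal(size=(50, 30))                        # shareable anchor data

# 1) Each party fits a private encoder and shares only intermediate representations.
encoders = [PCA(n_components=10).fit(X) for X in parties]
inter_local = [f.transform(X) for f, X in zip(encoders, parties)]
inter_anchor = [f.transform(anchor) for f in encoders]

# 2) The analyst aligns all parties into one collaboration space via the anchor
#    representations (least squares onto party 0's anchor representation).
target = inter_anchor[0]
maps = [np.linalg.lstsq(A, target, rcond=None)[0] for A in inter_anchor]
collab = np.vstack([Z @ G for Z, G in zip(inter_local, maps)])

# 3) Train an autoencoder on the pooled collaboration representation and flag
#    journal entries with large reconstruction error as anomaly candidates.
ae = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000, random_state=0).fit(collab, collab)
scores = ((ae.predict(collab) - collab) ** 2).mean(axis=1)
print("flagged entries:", int((scores > np.quantile(scores, 0.99)).sum()))
```

Because only the intermediate and collaboration representations leave each organization, no raw journal entries or model parameters are exchanged, which is what allows the training to finish in a single communication round.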
Related papers
- A Two-Stage Federated Learning Approach for Industrial Prognostics Using Large-Scale High-Dimensional Signals [1.2277343096128712]
Industrial prognostics aims to develop data-driven methods that leverage high-dimensional degradation signals from assets to predict their failure times.
In practice, individual organizations often lack sufficient data to independently train reliable prognostic models.
This article proposes a statistical learning-based federated model that enables multiple organizations to jointly train a prognostic model.
arXiv Detail & Related papers (2024-10-14T21:26:22Z) - Fin-Fed-OD: Federated Outlier Detection on Financial Tabular Data [11.027356898413139]
Anomaly detection in real-world scenarios poses challenges due to dynamic and often unknown anomaly distributions.
This paper addresses the question of enhancing outlier detection within individual organizations without compromising data confidentiality.
We propose a novel method leveraging representation learning and federated learning techniques to improve the detection of unknown anomalies.
arXiv Detail & Related papers (2024-04-23T11:22:04Z) - Secure and Verifiable Data Collaboration with Low-Cost Zero-Knowledge Proofs [30.260427020479536]
In this paper, we propose RiseFL, a novel and highly efficient solution for secure and verifiable data collaboration.
First, we devise a probabilistic integrity check method that significantly reduces the cost of ZKP generation and verification.
We also design a hybrid commitment scheme that achieves Byzantine robustness with improved performance.
arXiv Detail & Related papers (2023-11-26T14:19:46Z) - Exploring Federated Unlearning: Analysis, Comparison, and Insights [101.64910079905566]
Federated unlearning enables the selective removal of data from models trained in federated systems.
This paper surveys existing federated unlearning approaches, examining their algorithmic efficiency, impact on model accuracy, and effectiveness in preserving privacy.
We propose the OpenFederatedUnlearning framework, a unified benchmark for evaluating federated unlearning methods.
arXiv Detail & Related papers (2023-10-30T01:34:33Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require access to the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model on data from multiple organizations without exposing their raw data.
We propose a general framework that simultaneously addresses the two central challenges of cross-client distribution shift and noisy local data.
We provide a comprehensive theoretical analysis covering robustness, convergence, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z) - Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has attracted increasing attention owing to its widespread applications in video surveillance.
Unfortunately, mainstream deep learning methods still require large quantities of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z) - Auto-weighted Robust Federated Learning with Corrupted Data Sources [7.475348174281237]
Federated learning provides a communication-efficient and privacy-preserving training process.
Standard federated learning techniques that naively minimize an average loss function are vulnerable to data corruption.
We propose Auto-weighted Robust Federated Learning (arfl) to provide robustness against corrupted data sources.
arXiv Detail & Related papers (2021-01-14T21:54:55Z) - Privacy-preserving Traffic Flow Prediction: A Federated Learning Approach [61.64006416975458]
We propose a privacy-preserving machine learning technique named Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction.
FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism.
It is shown that FedGRU's prediction accuracy is 90.96% higher than that of the advanced deep learning models.
arXiv Detail & Related papers (2020-03-19T13:07:49Z) - Stratified cross-validation for unbiased and privacy-preserving federated learning [0.0]
We focus on the recurring problem of duplicated records which, if not handled properly, may cause over-optimistic estimates of a model's performance.
We introduce and discuss stratified cross-validation, a validation methodology that leverages stratification techniques to prevent data leakage in federated learning settings.
arXiv Detail & Related papers (2020-01-22T15:49:34Z)
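As a side note on the last entry, the snippet below sketches the leakage problem it targets: if duplicated records are split naively, copies of the same record can land in both training and validation folds and inflate the measured performance. Grouping duplicates so they always share a fold avoids this; detecting duplicates by exact row equality and using scikit-learn's GroupKFold are assumptions for illustration, not the paper's exact methodology.

```python
# Minimal sketch: keep duplicated records in the same cross-validation fold so
# copies never straddle the train/validation split (illustrative; exact-row
# equality as the duplicate criterion and GroupKFold are assumptions).
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = np.vstack([X, X[:20]])                 # simulate 20 duplicated records
y = rng.integers(0, 2, size=len(X))

# One group id per distinct record; exact duplicates share the same id.
_, groups = np.unique(X.round(6), axis=0, return_inverse=True)

for fold, (tr, va) in enumerate(GroupKFold(n_splits=5).split(X, y, groups)):
    shared = len(set(groups[tr]) & set(groups[va]))  # 0 => no duplicate leakage
    print(f"fold {fold}: train={len(tr)} val={len(va)} shared groups={shared}")
```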
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.