Multi-view Contrastive Self-Supervised Learning of Accounting Data
Representations for Downstream Audit Tasks
- URL: http://arxiv.org/abs/2109.11201v1
- Date: Thu, 23 Sep 2021 08:16:31 GMT
- Authors: Marco Schreyer, Timur Sattarov, Damian Borth
- Abstract summary: International audit standards require the direct assessment of a financial statement's underlying accounting transactions, referred to as journal entries.
Deep-learning-inspired audit techniques have emerged for auditing vast quantities of journal entry data.
We propose a contrastive self-supervised learning framework designed to learn audit-task-invariant accounting data representations.
- Score: 1.9659095632676094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: International audit standards require the direct assessment of a financial
statement's underlying accounting transactions, referred to as journal entries.
Recently, driven by the advances in artificial intelligence, deep learning
inspired audit techniques have emerged in the field of auditing vast quantities
of journal entry data. Nowadays, the majority of such methods rely on a set of
specialized models, each trained for a particular audit task. At the same time,
when conducting a financial statement audit, audit teams are confronted with
(i) challenging time-budget constraints, (ii) extensive documentation
obligations, and (iii) strict model interpretability requirements. As a result,
auditors prefer to harness only a single, preferably "multi-purpose" model
throughout an audit engagement. We propose a contrastive self-supervised
learning framework designed to learn audit task invariant accounting data
representations to meet this requirement. The framework encompasses deliberate
interacting data augmentation policies that utilize the attribute
characteristics of journal entry data. We evaluate the framework on two
real-world datasets of city payments and transfer the learned representations
to three downstream audit tasks: anomaly detection, audit sampling, and audit
documentation. Our experimental results provide empirical evidence that the
proposed framework offers the ability to increase the efficiency of audits by
learning rich and interpretable "multi-task" representations.
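The core idea of the framework, contrasting two augmented "views" of each journal entry so that views of the same entry are pulled together and views of different entries pushed apart, can be sketched as follows. This is a minimal illustration assuming a SimCLR-style NT-Xent objective and a simple noise-based augmentation; the paper's actual policies exploit the attribute characteristics of journal entry data, and all function names and parameters below are hypothetical.

```python
import numpy as np

def augment(x, rng, frac=0.3, noise=0.1):
    # Hypothetical attribute-level augmentation: perturb a random subset
    # of each entry's encoded attributes to form one "view" of the entry.
    mask = rng.random(x.shape) < frac
    return np.where(mask, x + rng.normal(0.0, noise, x.shape), x)

def nt_xent(z1, z2, tau=0.5):
    # SimCLR-style normalized-temperature cross-entropy: the two views of
    # the same entry are positives; all other views in the batch are negatives.
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)        # a view is never its own negative
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_denom = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))

rng = np.random.default_rng(0)
entries = rng.random((8, 16))             # 8 encoded journal entries
loss = nt_xent(augment(entries, rng), augment(entries, rng))
print(loss > 0.0)                         # the loss is strictly positive
```

In practice the views would be produced by the interacting augmentation policies and passed through an encoder network before the loss is computed; the loss gradient then shapes the shared representation used by the downstream audit tasks.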
Related papers
- AuditWen: An Open-Source Large Language Model for Audit
This study introduces AuditWen, an open-source audit LLM built by fine-tuning Qwen on a 28k-instruction dataset constructed from 15 audit tasks across 3 layers of the audit domain.
In the evaluation stage, the authors propose a benchmark of 3k instructions covering a set of critical audit tasks derived from the application scenarios.
The experimental results demonstrate superior performance of AuditWen in both question understanding and answer generation, making it an immediately valuable tool for audit.
arXiv Detail & Related papers (2024-10-09T02:28:55Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection, as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- USB: A Unified Summarization Benchmark Across Tasks and Domains
We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports 8 interrelated tasks.
We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models.
arXiv Detail & Related papers (2023-05-23T17:39:54Z)
- Fact-Checking Complex Claims with Program-Guided Reasoning
Program-Guided Fact-Checking (ProgramFC) is a novel fact-checking model that decomposes complex claims into simpler sub-tasks.
We first leverage the in-context learning ability of large language models to generate reasoning programs.
We execute the program by delegating each sub-task to the corresponding sub-task handler.
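The decompose-and-delegate pattern described above can be sketched as a small dispatcher: a generated "reasoning program" is a list of steps, and each step is routed to the handler it names. The handler names, the toy facts, and the program format below are hypothetical illustrations, not ProgramFC's actual API (whose handlers are neural models rather than lambdas).

```python
# Hypothetical sub-task handlers keyed by the operation name a step uses.
HANDLERS = {
    "Verify": lambda claim, facts: claim in facts,   # check one sub-claim
    "And": lambda results, facts: all(results),      # combine prior results
}

def run_program(program, facts):
    # Execute a reasoning program step by step, delegating each
    # sub-task to the handler named in the step.
    results = []
    for op, arg in program:
        if op == "And":
            results.append(HANDLERS[op](results[:], facts))
        else:
            results.append(HANDLERS[op](arg, facts))
    return results[-1]

facts = {"Paris is in France", "The Seine flows through Paris"}
program = [
    ("Verify", "Paris is in France"),
    ("Verify", "The Seine flows through Paris"),
    ("And", None),
]
print(run_program(program, facts))  # True: both sub-claims verified
```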
arXiv Detail & Related papers (2023-05-22T06:11:15Z)
- Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization
Socratic pretraining is a question-driven, unsupervised pretraining objective designed to improve controllability in summarization tasks.
Our results show that Socratic pretraining cuts task-specific labeled data requirements in half.
arXiv Detail & Related papers (2022-12-20T17:27:10Z)
- Flexible categorization for auditing using formal concept analysis and Dempster-Shafer theory
We study different ways to categorize according to different extents of interest in different financial accounts.
The framework developed in this paper provides a formal ground to obtain and study explainable categorizations.
arXiv Detail & Related papers (2022-10-31T13:49:16Z)
- Federated Continual Learning to Detect Accounting Anomalies in Financial Auditing
We propose a Federated Continual Learning framework enabling auditors to continuously learn audit models from decentralized clients.
We evaluate the framework's ability to detect accounting anomalies in common scenarios of organizational activity.
arXiv Detail & Related papers (2022-10-26T21:33:08Z)
- Federated and Privacy-Preserving Learning of Accounting Data in Financial Statement Audits
We propose a Federated Learning framework to train DL models on audit-relevant accounting data of multiple clients.
We evaluate our approach to detect accounting anomalies in three real-world datasets of city payments.
arXiv Detail & Related papers (2022-08-26T15:09:18Z)
- Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization
We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
- Continual Learning for Unsupervised Anomaly Detection in Continuous Auditing of Financial Accounting Data
International audit standards require the direct assessment of a financial statement's underlying accounting journal entries.
Deep-learning inspired audit techniques emerged to examine vast quantities of journal entry data.
This work proposes a continual anomaly detection framework that overcomes both challenges and is designed to learn from a stream of journal entry data experiences.
arXiv Detail & Related papers (2021-12-25T09:21:14Z)
- Learning Sampling in Financial Statement Audits using Vector Quantised Autoencoder Neural Networks
We propose the application of Vector Quantised-Variational Autoencoder (VQ-VAE) neural networks.
We demonstrate, based on two real-world city payment datasets, that such artificial neural networks are capable of learning a quantised representation of accounting data.
arXiv Detail & Related papers (2020-08-06T09:02:02Z)
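The vector-quantisation step at the heart of a VQ-VAE, mapping each encoded entry to its nearest learned codebook prototype so that audit samples can be drawn per prototype cell, can be sketched as follows. The codebook size, dimensions, and function names are illustrative assumptions, not the cited paper's actual configuration.

```python
import numpy as np

def quantise(z, codebook):
    # Vector-quantisation step of a VQ-VAE: replace each encoder output
    # with its nearest codebook vector under Euclidean distance. Audit
    # sampling can then draw representative entries from each code's cell.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)             # nearest prototype per entry
    return codebook[idx], idx

rng = np.random.default_rng(1)
z = rng.random((100, 8))       # 100 encoded journal entries (hypothetical)
codebook = rng.random((4, 8))  # 4 learned prototype vectors (hypothetical)
zq, idx = quantise(z, codebook)
print(sorted(set(idx.tolist())))           # which prototype cells are occupied
```

During training, the codebook itself is learned jointly with the encoder and decoder; the sketch above shows only the inference-time assignment that makes the quantised representation usable for sampling.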
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.