The geometry of financial institutions -- Wasserstein clustering of
financial data
- URL: http://arxiv.org/abs/2305.03565v1
- Date: Fri, 5 May 2023 14:16:29 GMT
- Title: The geometry of financial institutions -- Wasserstein clustering of
financial data
- Authors: Lorenz Riess, Mathias Beiglböck, Johannes Temme, Andreas Wolf, Julio Backhoff
- Abstract summary: We develop methods for condensing granular and big data into a representative and intelligible map.
Financial regulation is a field that exemplifies this need, as regulators require diverse and often highly granular data from financial institutions to monitor and assess their activities.
We propose a variant of Lloyd's algorithm that applies to probability distributions and uses generalized Wasserstein barycenters to construct a metric space which represents given data in condensed form.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing availability of granular and big data on various objects of
interest has made it necessary to develop methods for condensing this
information into a representative and intelligible map. Financial regulation is
a field that exemplifies this need, as regulators require diverse and often
highly granular data from financial institutions to monitor and assess their
activities. However, processing and analyzing such data can be a daunting task,
especially given the challenges of dealing with missing values and identifying
clusters based on specific features.
To address these challenges, we propose a variant of Lloyd's algorithm that
applies to probability distributions and uses generalized Wasserstein
barycenters to construct a metric space which represents given data on various
objects in condensed form. By applying our method to the financial regulation
context, we demonstrate its usefulness in dealing with the specific challenges
faced by regulators in this domain. We believe that our approach can also be
applied more generally to other fields where large and complex data sets need
to be represented in concise form.
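The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes one-dimensional distributions given as equally sized samples, where the 2-Wasserstein distance reduces to a distance between sorted samples and the barycenter is the average of the quantile functions. It does not attempt the generalized barycenters or the handling of missing values described in the paper. The function names and the farthest-point initialization are hypothetical choices made for this illustration.

```python
import numpy as np


def w2_distance(q1, q2):
    """2-Wasserstein distance between two 1D empirical distributions,
    each given as a sorted sample of the same size."""
    return np.sqrt(np.mean((q1 - q2) ** 2))


def farthest_point_init(quantiles, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first distribution, then repeatedly add the one
    farthest from all centers chosen so far."""
    centers = [quantiles[0]]
    for _ in range(1, k):
        d = np.min(
            [np.sqrt(((quantiles - c) ** 2).mean(axis=1)) for c in centers],
            axis=0,
        )
        centers.append(quantiles[int(np.argmax(d))])
    return np.array(centers)


def wasserstein_lloyd(samples, k, n_iter=20):
    """Lloyd-style clustering of 1D distributions.

    samples: array of shape (n, m), one row of m observations per object.
    Returns (labels, centers), where each center is a barycenter
    represented by its m sorted support points.
    """
    # Sorting each row gives the empirical quantile function.
    quantiles = np.sort(np.asarray(samples, dtype=float), axis=1)
    n = quantiles.shape[0]
    centers = farthest_point_init(quantiles, k)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment step: nearest barycenter in W2 distance.
        diffs = quantiles[:, None, :] - centers[None, :, :]
        dists = np.sqrt((diffs ** 2).mean(axis=2))
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: in 1D the W2 barycenter of a cluster is the
        # pointwise mean of its members' quantile functions.
        for j in range(k):
            members = quantiles[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Calling `wasserstein_lloyd(samples, k=2)` on an array with one row of observations per institution returns a cluster label for each row together with the barycenter of each cluster. Higher-dimensional data has no such closed form and would need an optimal transport solver.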
Related papers
- GEMS: Generative Expert Metric System through Iterative Prompt Priming [18.0413505095456]
Non-experts can find it unintuitive to create effective measures or transform theories into context-specific metrics.
This technical report addresses this challenge by examining software communities within large software corporations.
We propose a prompt-engineering framework inspired by neural activities, demonstrating that generative models can extract and summarize theories.
(2024-10-01T17:14:54Z)
- Multi-agent Planning using Visual Language Models [2.2369578015657954]
Large Language Models (LLMs) and Visual Language Models (VLMs) are attracting increasing interest due to their improving performance and applications across various domains and tasks.
LLMs and VLMs can produce erroneous results, especially when a deep understanding of the problem domain is required.
We propose a multi-agent architecture for embodied task planning that operates without the need for specific data structures as input.
(2024-08-10T08:10:17Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
(2024-06-05T20:19:09Z)
- Universal representations for financial transactional data: embracing local, global, and external contexts [95.7760348824795]
We present a representation learning framework that addresses diverse business challenges.
We also suggest novel generative models that account for data specifics, and a way to integrate external information into a client's representation.
(2024-04-02T15:39:14Z)
- REFinD: Relation Extraction Financial Dataset [7.207699035400335]
We propose REFinD, the first large-scale annotated dataset of relations, with $\sim$29K instances and 22 relations amongst 8 types of entity pairs, generated entirely over financial documents.
We observed that various state-of-the-art deep learning models struggle with numeric inference, relational and directional ambiguity.
(2023-05-22T22:40:11Z)
- Flexible categorization for auditing using formal concept analysis and Dempster-Shafer theory [55.878249096379804]
We study different ways to categorize according to different extents of interest in different financial accounts.
The framework developed in this paper provides a formal ground to obtain and study explainable categorizations.
(2022-10-31T13:49:16Z)
- How Much More Data Do I Need? Estimating Requirements for Downstream Tasks [99.44608160188905]
Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance?
Overestimating or underestimating data requirements incurs substantial costs that could be avoided with an adequate budget.
Using our guidelines, practitioners can accurately estimate data requirements of machine learning systems to gain savings in both development time and data acquisition costs.
(2022-07-04T21:16:05Z)
- FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents.
We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts.
The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
(2021-09-01T00:08:14Z)
- Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports [66.39150945184683]
We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches.
Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
(2020-10-27T19:48:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.