Mechanistic interpretability of large language models with applications to the financial services industry
- URL: http://arxiv.org/abs/2407.11215v2
- Date: Wed, 16 Oct 2024 02:40:53 GMT
- Title: Mechanistic interpretability of large language models with applications to the financial services industry
- Authors: Ashkan Golgoon, Khashayar Filom, Arjun Ravi Kannan
- Abstract summary: We are pioneering the use of mechanistic interpretability to shed some light on the inner workings of large language models for use in financial services applications.
In particular, we investigate GPT-2 Small's attention pattern when prompted to identify potential violations of Fair Lending laws.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models such as GPTs (Generative Pre-trained Transformers) exhibit remarkable capabilities across a broad spectrum of applications. Nevertheless, due to their intrinsic complexity, these models present substantial challenges in interpreting their internal decision-making processes. This lack of transparency poses critical challenges when it comes to their adoption by financial institutions, where concerns and accountability regarding bias, fairness, and reliability are of paramount importance. Mechanistic interpretability aims at reverse engineering complex AI models such as transformers. In this paper, we are pioneering the use of mechanistic interpretability to shed some light on the inner workings of large language models for use in financial services applications. We offer several examples of how algorithmic tasks can be designed for compliance monitoring purposes. In particular, we investigate GPT-2 Small's attention pattern when prompted to identify potential violations of Fair Lending laws. Using direct logit attribution, we study the contributions of each layer and its corresponding attention heads to the logit difference in the residual stream. Finally, we design clean and corrupted prompts and use activation patching as a causal intervention method to further localize our task-completion components. We observe that the (positive) heads $10.2$ (head $2$, layer $10$), $10.7$, and $11.3$, as well as the (negative) heads $9.6$ and $10.6$ play a significant role in the task completion.
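The abstract names two standard mechanistic-interpretability techniques, direct logit attribution and activation patching. The sketch below illustrates both on GPT-2 Small using the open-source TransformerLens library. It is not the paper's code: the prompt and the two candidate answer tokens are hypothetical stand-ins (the Fair Lending prompts are not reproduced in this listing), and head 10.7 is used as the example patching target only because it is one of the positive heads the paper reports.

```python
# Minimal sketch of direct logit attribution on GPT-2 Small using the
# open-source TransformerLens library. The prompt and candidate answer
# tokens below are hypothetical stand-ins, not the paper's prompts.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small

prompt = "The loan application was denied because of the applicant's"
answer_tokens = torch.tensor(
    [model.to_single_token(" race"), model.to_single_token(" income")],
    device=model.cfg.device,
)

tokens = model.to_tokens(prompt)
logits, cache = model.run_with_cache(tokens)

# Residual-stream direction whose dot product with the (layer-normed)
# final residual state equals the logit difference between the answers.
answer_dirs = model.tokens_to_residual_directions(answer_tokens)
logit_diff_dir = answer_dirs[0] - answer_dirs[1]

# Decompose the final residual stream at the last position into per-head
# contributions and project each onto the logit-difference direction.
head_stack, labels = cache.stack_head_results(
    layer=-1, pos_slice=-1, return_labels=True
)
head_stack = cache.apply_ln_to_stack(head_stack, layer=-1, pos_slice=-1)
head_logit_diffs = head_stack[:, 0, :] @ logit_diff_dir  # batch index 0

# Report the heads with the largest absolute contributions.
for i in torch.topk(head_logit_diffs.abs(), k=5).indices.tolist():
    print(f"{labels[i]}: {head_logit_diffs[i]:+.3f}")
```

Activation patching then asks the causal question: if the model runs on a corrupted prompt but one head's output is restored from the clean run, how much of the clean logit difference is recovered? Continuing from the snippet above, with a corrupted prompt that swaps a single same-length token:

```python
# Sketch of activation patching over one attention head's output ("z").
# Clean and corrupted prompts must tokenize to the same length so the
# patched activations align position by position.
from transformer_lens import utils

clean_prompt = "The loan application was denied because of the applicant's"
corrupted_prompt = "The job application was denied because of the applicant's"

clean_tokens = model.to_tokens(clean_prompt)
corrupted_tokens = model.to_tokens(corrupted_prompt)
assert clean_tokens.shape == corrupted_tokens.shape

clean_logits, clean_cache = model.run_with_cache(clean_tokens)

def logit_diff(out_logits: torch.Tensor) -> float:
    # Logit difference between the two candidate answers at the final position.
    final = out_logits[0, -1]
    return (final[answer_tokens[0]] - final[answer_tokens[1]]).item()

layer, head = 10, 7  # head 10.7, one of the positive heads the paper reports

def patch_head_z(z, hook):
    # z: [batch, pos, head_index, d_head]. Overwrite only this head's
    # output with its activation from the clean run.
    z[:, :, head, :] = clean_cache[hook.name][:, :, head, :]
    return z

patched_logits = model.run_with_hooks(
    corrupted_tokens,
    fwd_hooks=[(utils.get_act_name("z", layer), patch_head_z)],
)
print("clean     :", logit_diff(clean_logits))
print("corrupted :", logit_diff(model(corrupted_tokens)))
print("patched   :", logit_diff(patched_logits))
```

Sweeping this patch over all layer/head pairs, rather than the single head shown here, is what localizes the components responsible for task completion; this is how heads such as 10.2, 10.7, 11.3, 9.6, and 10.6 can be singled out.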
Related papers
- Transformers Use Causal World Models in Maze-Solving Tasks [49.67445252528868]
We investigate the inner workings of transformer models trained on tasks across various domains.
We find that transformers can reason over a greater number of active features than they see during training.
We observe that varying positional encodings can alter how world models (WMs) are encoded in a model's residual stream.
arXiv Detail & Related papers (2024-12-16T15:21:04Z)
- STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading [55.02735046724146]
In financial trading, factor models are widely used to price assets and capture excess returns from mispricing.
We propose a Spatio-Temporal factOR Model based on dual vector quantized variational autoencoders, named STORM.
STORM extracts features of stocks from temporal and spatial perspectives, then fuses and aligns these features at the fine-grained and semantic level, and represents the factors as multi-dimensional embeddings.
arXiv Detail & Related papers (2024-12-12T17:15:49Z)
- Counting Ability of Large Language Models and Impact of Tokenization [17.53620419920189]
We study the impact of tokenization on the counting abilities of large language models (LLMs), uncovering substantial performance variations driven by differences in input tokenization.
arXiv Detail & Related papers (2024-10-25T17:56:24Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- Equipping Language Models with Tool Use Capability for Tabular Data Analysis in Finance [10.859392781606623]
Large language models (LLMs) have exhibited an array of reasoning capabilities but face challenges like error propagation and hallucination.
We explore the potential of language model augmentation with external tools to mitigate these limitations.
We apply supervised fine-tuning to a LLaMA-2 13B Chat model so that it acts as both a 'task router' and a 'task solver'.
arXiv Detail & Related papers (2024-01-27T07:08:37Z)
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI).
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z)
- LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering [5.478764356647437]
LaundroGraph is a novel self-supervised graph representation learning approach.
It provides insights to assist the anti-money laundering reviewing process.
To the best of our knowledge, this is the first fully self-supervised system within the context of AML detection.
arXiv Detail & Related papers (2022-10-25T21:58:02Z)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance [114.1541203743303]
We propose PLATON, which captures the uncertainty of importance scores by upper confidence bound (UCB) of importance estimation.
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification.
arXiv Detail & Related papers (2022-06-25T05:38:39Z)
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models [648.3665819567409]
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale.
BIG-bench consists of 204 tasks, contributed by 450 authors across 132 institutions.
We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv Detail & Related papers (2022-06-09T17:05:34Z)
- Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification [33.026285180536036]
This paper proposes a novel methodology for producing plausible counterfactual explanations.
It also explores the regularization benefits of adversarial training on language models in the domain of FinTech.
arXiv Detail & Related papers (2020-10-23T16:29:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.