Mechanistic interpretability of large language models with applications to the financial services industry
- URL: http://arxiv.org/abs/2407.11215v2
- Date: Wed, 16 Oct 2024 02:40:53 GMT
- Title: Mechanistic interpretability of large language models with applications to the financial services industry
- Authors: Ashkan Golgoon, Khashayar Filom, Arjun Ravi Kannan
- Abstract summary: We pioneer the use of mechanistic interpretability to shed light on the inner workings of large language models for use in financial services applications.
In particular, we investigate GPT-2 Small's attention patterns when prompted to identify potential violations of Fair Lending laws.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models such as GPTs (Generative Pre-trained Transformers) exhibit remarkable capabilities across a broad spectrum of applications. Nevertheless, due to their intrinsic complexity, these models present substantial challenges in interpreting their internal decision-making processes. This lack of transparency poses critical challenges for their adoption by financial institutions, where concerns about bias, fairness, reliability, and accountability are of paramount importance. Mechanistic interpretability aims to reverse-engineer complex AI models such as transformers. In this paper, we pioneer the use of mechanistic interpretability to shed light on the inner workings of large language models for use in financial services applications. We offer several examples of how algorithmic tasks can be designed for compliance monitoring purposes. In particular, we investigate GPT-2 Small's attention patterns when prompted to identify potential violations of Fair Lending laws. Using direct logit attribution, we study the contributions of each layer and its corresponding attention heads to the logit difference in the residual stream. Finally, we design clean and corrupted prompts and use activation patching as a causal intervention method to further localize the components responsible for task completion. We observe that the (positive) heads $10.2$ (head $2$, layer $10$), $10.7$, and $11.3$, as well as the (negative) heads $9.6$ and $10.6$, play a significant role in task completion.
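To make the methodology concrete, here is a minimal sketch of the two techniques the abstract names, direct logit attribution and activation patching, applied to GPT-2 Small with the open-source TransformerLens library. The prompts, the Yes/No answer framing, and the single-token clean/corrupted swap are hypothetical stand-ins; the paper's actual Fair Lending prompts are not reproduced here.

```python
# A minimal sketch, assuming the TransformerLens library. Prompts and
# answer tokens below are hypothetical, not the paper's actual prompts.
from functools import partial

import torch
from transformer_lens import HookedTransformer
import transformer_lens.utils as utils

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small

# Clean/corrupted prompts differing by a single token, so positions align.
clean_prompt = (
    "Q: The lender denied the loan because of the applicant's race. "
    "Is this a violation? A:"
)
corrupt_prompt = (
    "Q: The lender denied the loan because of the applicant's debt. "
    "Is this a violation? A:"
)
answers = [" Yes", " No"]  # hypothetical pair defining the logit difference

clean_tokens = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)
assert clean_tokens.shape == corrupt_tokens.shape  # required for patching
answer_ids = torch.tensor([model.to_single_token(a) for a in answers])

clean_logits, clean_cache = model.run_with_cache(clean_tokens)

def logit_diff(logits: torch.Tensor) -> float:
    # Logit of the "violation" answer minus the benign answer, final position.
    final = logits[0, -1]
    return (final[answer_ids[0]] - final[answer_ids[1]]).item()

# Direct logit attribution: project each head's output at the final position
# (after the final LayerNorm) onto the logit-difference direction.
head_out, labels = clean_cache.stack_head_results(
    layer=-1, pos_slice=-1, return_labels=True
)
head_out = clean_cache.apply_ln_to_stack(head_out, layer=-1, pos_slice=-1)
direction = model.W_U[:, answer_ids[0]] - model.W_U[:, answer_ids[1]]
per_head = head_out[:, 0] @ direction  # one scalar per (layer, head)
for label, contrib in sorted(zip(labels, per_head.tolist()),
                             key=lambda lc: -abs(lc[1]))[:5]:
    print(f"{label}: {contrib:+.3f}")

# Activation patching: rerun the corrupted prompt while overwriting one head's
# output with its clean-run activation; heads whose patch restores the clean
# logit difference are causally implicated in the task.
def patch_head(z, hook, head):
    z[:, :, head, :] = clean_cache[hook.name][:, :, head, :]
    return z

for layer, head in [(10, 2), (10, 7), (11, 3), (9, 6), (10, 6)]:  # heads from the paper
    patched = model.run_with_hooks(
        corrupt_tokens,
        fwd_hooks=[(utils.get_act_name("z", layer),
                    partial(patch_head, head=head))],
    )
    print(f"L{layer}H{head}: patched logit diff = {logit_diff(patched):+.3f}")
```

Under a setup like this, the paper's finding would correspond to heads 10.2, 10.7, and 11.3 contributing positively to the logit difference and heads 9.6 and 10.6 negatively; the exact values depend entirely on the prompt design.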
Related papers
- Counting Ability of Large Language Models and Impact of Tokenization
We study the impact of tokenization on the counting abilities of large language models (LLMs), uncovering substantial performance variations driven by differences in input tokenization.
arXiv Detail & Related papers (2024-10-25T17:56:24Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that these embeddings contribute valuable information to anomaly detection, as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- Multimodal Large Language Models to Support Real-World Fact-Checking
Multimodal large language models (MLLMs) carry the potential to support humans in processing vast amounts of information.
While MLLMs are already being used as a fact-checking tool, their abilities and limitations in this regard are understudied.
We propose a framework for systematically assessing the capacity of current multimodal models to facilitate real-world fact-checking.
arXiv Detail & Related papers (2024-03-06T11:32:41Z)
- Equipping Language Models with Tool Use Capability for Tabular Data Analysis in Finance
Large language models (LLMs) have exhibited an array of reasoning capabilities but face challenges like error propagation and hallucination.
We explore the potential of language model augmentation with external tools to mitigate these limitations.
We apply supervised fine-tuning to a LLaMA-2 13B Chat model so that it acts as both a 'task router' and a 'task solver'.
arXiv Detail & Related papers (2024-01-27T07:08:37Z)
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI).
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z)
- LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering
LaundroGraph is a novel self-supervised graph representation learning approach.
It provides insights to assist the anti-money laundering review process.
To the best of our knowledge, this is the first fully self-supervised system within the context of AML detection.
arXiv Detail & Related papers (2022-10-25T21:58:02Z)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
We propose PLATON, which captures the uncertainty of importance scores via an upper confidence bound (UCB) on the importance estimates (see the sketch after this list).
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering, and image classification.
arXiv Detail & Related papers (2022-06-25T05:38:39Z)
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale.
BIG-bench consists of 204 tasks, contributed by 450 authors across 132 institutions.
We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv Detail & Related papers (2022-06-09T17:05:34Z)
- VisBERT: Hidden-State Visualizations for Transformers
We present VisBERT, a tool for visualizing the contextual token representations within BERT for the task of (multi-hop) Question Answering.
VisBERT enables users to get insights about the model's internal state and to explore its inference steps or potential shortcomings.
arXiv Detail & Related papers (2020-11-09T15:37:43Z)
- Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification
This paper proposes a novel methodology for producing plausible counterfactual explanations.
It also explores the regularization benefits of adversarial training on language models in the domain of FinTech.
arXiv Detail & Related papers (2020-10-23T16:29:26Z)
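The PLATON entry above refers to an upper-confidence-bound treatment of weight importance. As a rough illustration of that idea (a simplified reading, not the authors' released implementation; the smoothing factors beta1 and beta2 are illustrative), one can maintain an exponential moving average of each weight's sensitivity together with a smoothed estimate of how much that sensitivity fluctuates, and rank weights by a combination of the two:

```python
import torch

def platon_score(weight, grad, imp_ema, unc_ema, beta1=0.85, beta2=0.95):
    """One update step of a UCB-style pruning score (simplified sketch).

    imp_ema / unc_ema are running estimates of sensitivity and its
    fluctuation, tensors shaped like `weight`, initialized to zeros.
    """
    sensitivity = (weight * grad).abs()  # first-order importance estimate
    imp_ema = beta1 * imp_ema + (1 - beta1) * sensitivity
    unc_ema = beta2 * unc_ema + (1 - beta2) * (sensitivity - imp_ema).abs()
    score = imp_ema * unc_ema  # UCB-style blend of importance and uncertainty
    return score, imp_ema, unc_ema

# Usage sketch: zero out the 30% of weights with the lowest scores.
w = torch.randn(256, 256, requires_grad=True)
(w ** 2).sum().backward()
score, imp, unc = platon_score(
    w.detach(), w.grad, torch.zeros_like(w), torch.zeros_like(w)
)
threshold = score.flatten().kthvalue(int(0.3 * score.numel())).values
mask = (score > threshold).float()
```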
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.