Related papers: Large Language Models for Cryptocurrency Transaction Analysis: A Bitcoin Case Study

URL: http://arxiv.org/abs/2501.18158v3
Date: Thu, 04 Sep 2025 07:46:37 GMT
Title: Large Language Models for Cryptocurrency Transaction Analysis: A Bitcoin Case Study
Authors: Yuchen Lei, Yuexin Xiang, Qin Wang, Rafael Dowsley, Tsz Hon Yuen, Kim-Kwang Raymond Choo, Jiangshan Yu
Abstract summary: Large language models (LLMs) have the potential to address these gaps, but their capabilities in this area remain largely unexplored. In this paper, we test this hypothesis by applying LLMs to real-world cryptocurrency transaction graphs. This includes a new, human-readable graph representation format, LLM4TG, and a connectivity-enhanced transaction graph sampling algorithm, CETraS.
Score: 31.329503543552864
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cryptocurrencies are widely used, yet current methods for analyzing transactions often rely on opaque, black-box models. While these models may achieve high performance, their outputs are usually difficult to interpret and adapt, making it challenging to capture nuanced behavioral patterns. Large language models (LLMs) have the potential to address these gaps, but their capabilities in this area remain largely unexplored, particularly in cybercrime detection. In this paper, we test this hypothesis by applying LLMs to real-world cryptocurrency transaction graphs, with a focus on Bitcoin, one of the most studied and widely adopted blockchain networks. We introduce a three-tiered framework to assess LLM capabilities: foundational metrics, characteristic overview, and contextual interpretation. This includes a new, human-readable graph representation format, LLM4TG, and a connectivity-enhanced transaction graph sampling algorithm, CETraS. Together, they significantly reduce token requirements, transforming the analysis of multiple moderately large-scale transaction graphs with LLMs from nearly impossible to feasible under strict token limits. Experimental results demonstrate that LLMs have outstanding performance on foundational metrics and characteristic overview, where the accuracy of recognizing most basic information at the node level exceeds 98.50% and the proportion of obtaining meaningful characteristics reaches 95.00%. Regarding contextual interpretation, LLMs also demonstrate strong performance in classification tasks, even with very limited labeled data, where top-3 accuracy reaches 72.43% with explanations. While the explanations are not always fully accurate, they highlight the strong potential of LLMs in this domain. At the same time, several limitations persist, which we discuss along with directions for future research.

Related papers

Understanding Structured Financial Data with LLMs: A Case Study on Fraud Detection [17.04809129025246]
FinFRE-RAG is a two-stage approach that applies importance-guided feature reduction to serialize a compact subset of numeric/categorical attributes into natural language. LLMs can produce human-readable explanations and facilitate feature analysis, potentially reducing the manual workload of fraud analysts.
arXiv Detail & Related papers (2025-12-15T07:09:11Z)
CryptoQA: A Large-scale Question-answering Dataset for AI-assisted Cryptography [13.643089244089873]
We present CryptoQA, the first large-scale question-answering dataset specifically designed for cryptography. We benchmark 15 state-of-the-art LLMs on CryptoQA, evaluating their factual accuracy, mathematical reasoning, consistency, referencing, and robustness to adversarial samples. Our results reveal significant performance deficits of LLMs, particularly on tasks that require formal reasoning and precise mathematical knowledge.
arXiv Detail & Related papers (2025-12-02T10:35:36Z)
CryptoBench: A Dynamic Benchmark for Expert-Level Evaluation of LLM Agents in Cryptocurrency [60.83660377169452]
This paper introduces CryptoBench, the first expert-curated, dynamic benchmark designed to rigorously evaluate the real-world capabilities of Large Language Model (LLM) agents. Unlike general-purpose agent benchmarks for search and prediction, professional crypto analysis presents specific challenges.
arXiv Detail & Related papers (2025-11-29T09:52:34Z)
LMAE4Eth: Generalizable and Robust Ethereum Fraud Detection by Exploring Transaction Semantics and Masked Graph Embedding [10.923718297125754]
LMAE4Eth is a multi-view learning framework that fuses transaction semantics, masked graph embedding, and expert knowledge. We first propose a transaction-token contrastive language model (TxCLM) that transforms context-independent numerical transaction records into cohesive linguistic representations. We then propose a masked account graph autoencoder (MAGAE) using generative self-supervised learning, which achieves superior node-level account detection.
arXiv Detail & Related papers (2025-09-04T06:56:32Z)
Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding [6.0158981171030685]
This paper presents a comprehensive evaluation of the capabilities of Large Language Models (LLMs) in metaphor interpretation across multiple datasets, tasks, and prompt configurations. We address these limitations by conducting extensive experiments using diverse publicly available datasets with inference and metaphor annotations. The results indicate that LLMs' performance is more influenced by features like lexical overlap and sentence length than by metaphorical content.
arXiv Detail & Related papers (2025-07-21T08:09:11Z)
Machine Learning-Based Detection and Analysis of Suspicious Activities in Bitcoin Wallet Transactions in the USA [1.588234879488451]
The study aims to create a model with a feature for identifying trends and outliers that can expose illicit activity. The dataset is composed of in-depth Bitcoin wallet transactional information. The application of machine learning algorithms in tracking cryptocurrencies is a tool for creating transparent and secure U.S. markets.
arXiv Detail & Related papers (2025-04-04T00:07:32Z)
Exploring Graph Tasks with Pure LLMs: A Comprehensive Benchmark and Investigation [26.19182768810174]
Graph-structured data has become increasingly prevalent across various domains, raising the demand for effective models to handle graph tasks. Traditional graph learning models like Graph Neural Networks (GNNs) have made significant strides, but their capabilities in handling graph data remain limited in certain contexts. In recent years, large language models (LLMs) have emerged as promising candidates for graph tasks, yet most studies focus primarily on performance benchmarks.
arXiv Detail & Related papers (2025-02-26T03:03:46Z)
Training Large Recommendation Models via Graph-Language Token Alignment [53.3142545812349]
We propose a novel framework to train Large Recommendation models via Graph-Language Token Alignment. By aligning item and user nodes from the interaction graph with pretrained LLM tokens, GLTA effectively leverages the reasoning abilities of LLMs. Furthermore, we introduce Graph-Language Logits Matching (GLLM) to optimize token alignment for end-to-end item prediction.
arXiv Detail & Related papers (2025-02-26T02:19:10Z)
Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents [69.58565132975504]
Large language models (LLMs) have demonstrated remarkable capabilities in natural language tasks. We present the Agent Trading Arena, a virtual zero-sum stock market in which LLM-based agents engage in competitive multi-agent trading.
arXiv Detail & Related papers (2025-02-25T08:41:01Z)
Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detection of text generated by large language models (LLMs). We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positions to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length. PAWN shows competitive and even better performance in-distribution than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step. We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z)
Traffic Light or Light Traffic? Investigating Phrasal Semantics in Large Language Models [41.233879429714925]
This study critically examines the capacity of API-based large language models to comprehend phrase semantics. We assess the performance of LLMs in executing phrase semantic reasoning tasks guided by natural language instructions. We conduct detailed error analyses to interpret the limitations faced by LLMs in comprehending phrase semantics.
arXiv Detail & Related papers (2024-10-03T08:44:17Z)
Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels [75.77877889764073]
Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels. This study explores whether solely utilizing unlabeled data can elicit strong model capabilities. We propose a new paradigm termed zero-to-strong generalization.
arXiv Detail & Related papers (2024-09-19T02:59:44Z)
Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path [53.71787069694794]
We focus on the graph reasoning ability of Large Language Models (LLMs). We revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem. Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance for all these fundamental tasks.
arXiv Detail & Related papers (2024-08-18T16:26:39Z)
Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings. Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales. A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
Large Language Models Can Learn Temporal Reasoning [11.599570446840547]
We propose TG-LLM, a novel framework towards language-based temporal reasoning. Instead of reasoning over the original context, we adopt a latent representation, temporal graph (TG). A synthetic dataset (TGQA) is fully controllable and requires minimal supervision.
arXiv Detail & Related papers (2024-01-12T19:00:26Z)
IERL: Interpretable Ensemble Representation Learning -- Combining CrowdSourced Knowledge and Distributed Semantic Representations [11.008412414253662]
Large Language Models (LLMs) encode meanings of words in the form of distributed semantics. Recent studies have shown that LLMs tend to generate unintended, inconsistent, or wrong texts as outputs. We propose a novel ensemble learning method, Interpretable Ensemble Representation Learning (IERL), that systematically combines LLM and crowdsourced knowledge representations.
arXiv Detail & Related papers (2023-06-24T05:02:34Z)
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
Cryptocurrencies Activity as a Complex Network: Analysis of Transactions Graphs [7.58432869763351]
We analyze the flow of these digital transactions in a certain period of time by studying, through complex network theory, the patterns of interactions in four prominent and different Distributed Ledger Technologies (DLTs). We show that studying the network characteristics and peculiarities is of paramount importance, in order to understand how users interact in the DLT.
arXiv Detail & Related papers (2021-09-14T08:32:36Z)
Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines. In this paper, we propose a different explanation: pre-trained models succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics. Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
