BABD: A Bitcoin Address Behavior Dataset for Address Behavior Pattern
Analysis
- URL: http://arxiv.org/abs/2204.05746v1
- Date: Sun, 10 Apr 2022 06:46:51 GMT
- Authors: Yuexin Xiang, Wei Ren, Hang Gao, Ding Bao, Yuchen Lei, Tiantian Li,
Qingqing Yang, Wenmao Liu, Tianqing Zhu, and Kim-Kwang Raymond Choo
- Abstract summary: We build a dataset comprising Bitcoin transactions between 12 July 2019 and 26 May 2021.
This dataset contains 13 types of Bitcoin addresses, 5 categories of indicators with 148 features, and 544,462 labeled data.
We use our proposed dataset on common machine learning models, namely: k-nearest neighbors algorithm, decision tree, random forest, multilayer perceptron, and XGBoost.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cryptocurrencies are no longer just the preferred option for cybercriminal
activities on darknets, due to the increasing adoption in mainstream
applications. This is partly due to the transparency associated with the
underpinning ledgers, where any individual can access the record of a
transaction on the public ledger. In this paper, we build a dataset
comprising Bitcoin transactions between 12 July 2019 and 26 May 2021. This
dataset (hereafter referred to as BABD-13) contains 13 types of Bitcoin
addresses, 5 categories of indicators with 148 features, and 544,462 labeled
data. We then use our proposed dataset on common machine learning models,
namely: k-nearest neighbors algorithm, decision tree, random forest, multilayer
perceptron, and XGBoost. The results show that the accuracy rates of these
machine learning models on our proposed dataset are between 93.24% and 96.71%.
We also analyze the proposed features and their relationships from the
experiments, and propose a k-hop subgraph generation algorithm to extract a
k-hop subgraph from the entire Bitcoin transaction graph constructed by the
directed heterogeneous multigraph starting from a specific Bitcoin address node
(e.g., a known transaction associated with a criminal investigation).
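The k-hop subgraph idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the transaction graph is given as a plain list of `(source, target, edge_id)` triples, counts hops while ignoring edge direction, and keeps every original edge whose two endpoints were both reached. The function name `k_hop_subgraph` and the node labels in the example are hypothetical.

```python
from collections import defaultdict, deque


def k_hop_subgraph(edges, start, k):
    """Return (nodes, edges) within k hops of `start`.

    `edges` is a list of (source, target, edge_id) triples. Parallel edges
    are allowed, so the input is a directed multigraph. Hops are counted
    ignoring edge direction (an assumption of this sketch, not taken from
    the paper).
    """
    # Undirected adjacency used only for measuring hop distance.
    neighbors = defaultdict(set)
    for src, dst, _ in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Breadth-first search records the hop distance of each reached node.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # Keep each original directed edge (parallel ones included) whose
    # endpoints were both reached.
    kept = [e for e in edges if e[0] in distance and e[1] in distance]
    return set(distance), kept


# Hypothetical toy graph: address nodes and transaction nodes.
example_edges = [
    ("addr_A", "tx_1", "e1"),
    ("tx_1", "addr_B", "e2"),
    ("tx_1", "addr_B", "e3"),  # parallel edge
    ("addr_B", "tx_2", "e4"),
    ("tx_2", "addr_C", "e5"),
]
nodes, sub_edges = k_hop_subgraph(example_edges, "addr_A", 2)
```

With `k = 2` the sketch reaches `addr_A`, `tx_1`, and `addr_B`, and keeps both parallel edges between `tx_1` and `addr_B`. A direction-aware variant would build the adjacency from outgoing edges only.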
Related papers
- Bitcoin Research with a Transaction Graph Dataset [19.66391887460672]
This paper introduces a large scale dataset in the form of a transactions graph representing transactions between Bitcoin users.
The graph includes 252 million nodes and 785 million edges, covering a time span of nearly 13 years and 670 million transactions.
Various graph neural network models are trained to predict node labels, establishing a baseline for future research.
arXiv Detail & Related papers (2024-11-15T16:28:03Z)
- ORBITAAL: A Temporal Graph Dataset of Bitcoin Entity-Entity Transactions [0.0]
ORBITAAL is the first comprehensive dataset based on temporal graph formalism.
The dataset covers all Bitcoin transactions from January 2009 to January 2021.
This dataset also provides details on entities such as their global BTC balance and associated public addresses.
arXiv Detail & Related papers (2024-08-26T09:48:45Z)
- Transaction Fraud Detection via an Adaptive Graph Neural Network [64.9428588496749]
We propose an Adaptive Sampling and Aggregation-based Graph Neural Network (ASA-GNN) that learns discriminative representations to improve the performance of transaction fraud detection.
A neighbor sampling strategy is performed to filter noisy nodes and supplement information for fraudulent nodes.
Experiments on three real financial datasets demonstrate that the proposed method ASA-GNN outperforms state-of-the-art ones.
arXiv Detail & Related papers (2023-07-11T07:48:39Z)
- Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics [8.97719386315469]
This paper presents a holistic applied data science approach to fraud detection in the Bitcoin network.
First, we contribute the Elliptic++ dataset, which extends the Elliptic transaction dataset to include over 822k Bitcoin wallet addresses (nodes).
Second, we perform fraud detection tasks on all four graphs by using diverse machine learning algorithms.
arXiv Detail & Related papers (2023-05-25T18:36:54Z)
- Chainlet Orbits: Topological Address Embedding for the Bitcoin Blockchain [15.099255988459602]
The rise of cryptocurrencies like Bitcoin, which enable transactions with a degree of pseudonymity, has led to a surge in various illicit activities.
We introduce an effective solution called Chainlet Orbits to embed Bitcoin addresses by leveraging their topological characteristics in transactions.
Our approach enables the use of interpretable and explainable machine learning models in as little as 15 minutes for most days on the Bitcoin transaction network.
arXiv Detail & Related papers (2023-05-18T21:16:59Z)
- Blockchain Large Language Models [65.7726590159576]
This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions.
The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains from scratch a large language model to act as a real-time Intrusion Detection System.
arXiv Detail & Related papers (2023-04-25T11:56:18Z)
- Demystifying Bitcoin Address Behavior via Graph Neural Networks [20.002509270755443]
BAClassifier is a tool that can automatically classify bitcoin addresses based on their behaviors.
We construct and release a large-scale annotated dataset that consists of over 2 million real-world bitcoin addresses.
arXiv Detail & Related papers (2022-11-26T14:55:50Z)
- Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction [123.20238648121445]
We propose a new self-supervised learning framework, Graph Information Aided Node feature exTraction (GIANT).
GIANT makes use of the eXtreme Multi-label Classification (XMC) formalism, which is crucial for fine-tuning the language model based on graph information.
We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets.
arXiv Detail & Related papers (2021-10-29T19:55:12Z)
- Comprehensive Graph-conditional Similarity Preserving Network for Unsupervised Cross-modal Hashing [97.44152794234405]
Unsupervised cross-modal hashing (UCMH) has become a hot topic recently.
In this paper, we devise a deep graph-neighbor coherence preserving network (DGCPN).
DGCPN regulates comprehensive similarity preserving losses by exploiting three types of data similarities.
arXiv Detail & Related papers (2020-12-25T07:40:59Z)
- Inverse Graph Identification: Can We Identify Node Labels Given Graph Labels? [89.13567439679709]
Graph Identification (GI) has long been researched in graph learning and is essential in certain applications.
This paper defines a novel problem dubbed Inverse Graph Identification (IGI).
We propose a simple yet effective method that makes the node-level message passing process using Graph Attention Network (GAT) under the protocol of GI.
arXiv Detail & Related papers (2020-07-12T12:06:17Z)