Detecting Rug Pulls in Decentralized Exchanges: Machine Learning Evidence from the TON Blockchain
- URL: http://arxiv.org/abs/2509.01168v1
- Date: Mon, 01 Sep 2025 06:39:50 GMT
- Title: Detecting Rug Pulls in Decentralized Exchanges: Machine Learning Evidence from the TON Blockchain
- Authors: Dmitry Yaremus, Jianghai Li, Alisa Kalacheva, Igor Vodolazov, Yury Yanovich
- Abstract summary: This paper presents a machine learning framework for the early detection of rug pull scams on decentralized exchanges (DEXs) within The Open Network (TON) blockchain. We conduct a comprehensive study on the two largest TON DEXs, Ston.Fi and DeDust, fusing data from both platforms to train our models. We demonstrate that Gradient Boosting models can effectively identify rug pulls within the first five minutes of trading, with the TVL-based method achieving superior AUC (up to 0.891) while the idle-based method excels at recall.
- Score: 0.1522374059398944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a machine learning framework for the early detection of rug pull scams on decentralized exchanges (DEXs) within The Open Network (TON) blockchain. TON's unique architecture, characterized by asynchronous execution and a massive web2 user base from Telegram, presents a novel and critical environment for fraud analysis. We conduct a comprehensive study on the two largest TON DEXs, Ston.Fi and DeDust, fusing data from both platforms to train our models. A key contribution is the implementation and comparative analysis of two distinct rug pull definitions--TVL-based (a catastrophic liquidity withdrawal) and idle-based (a sudden cessation of all trading activity)--within a single, unified study. We demonstrate that Gradient Boosting models can effectively identify rug pulls within the first five minutes of trading, with the TVL-based method achieving superior AUC (up to 0.891) while the idle-based method excels at recall. Our analysis reveals that while feature sets are consistent across exchanges, their underlying distributions differ significantly, challenging straightforward data fusion and highlighting the need for robust, platform-aware models. This work provides a crucial early-warning mechanism for investors and enhances the security infrastructure of the rapidly growing TON DeFi ecosystem.
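The two rug pull definitions named in the abstract can be illustrated with a minimal sketch. This is not the authors' labeling code: the abstract gives no thresholds, so `TVL_DROP_FRACTION`, `IDLE_WINDOW_SECONDS`, and both function names are hypothetical choices made only for illustration.

```python
from typing import List

# Hypothetical thresholds: the abstract does not state the paper's actual values.
TVL_DROP_FRACTION = 0.9          # assumed: "catastrophic" means TVL falls 90% from its peak
IDLE_WINDOW_SECONDS = 24 * 3600  # assumed: 24 hours without any trade counts as "idle"


def is_tvl_rug_pull(tvl_series: List[float],
                    drop_fraction: float = TVL_DROP_FRACTION) -> bool:
    """TVL-based label: True if TVL ever falls by at least
    drop_fraction relative to its running peak."""
    peak = 0.0
    for tvl in tvl_series:
        peak = max(peak, tvl)
        if peak > 0 and tvl <= peak * (1.0 - drop_fraction):
            return True
    return False


def is_idle_rug_pull(trade_times: List[float],
                     observation_end: float,
                     idle_window: float = IDLE_WINDOW_SECONDS) -> bool:
    """Idle-based label: True if the last trade happened at least
    idle_window seconds before the end of the observation period."""
    if not trade_times:
        return False  # a pool that never traded has no activity to cease
    return observation_end - max(trade_times) >= idle_window


if __name__ == "__main__":
    # Liquidity grows to 500, then almost all of it is withdrawn.
    print(is_tvl_rug_pull([100.0, 500.0, 10.0]))
    # Trades in the first two minutes, then silence for a full day.
    print(is_idle_rug_pull([0.0, 60.0, 120.0], observation_end=120.0 + 24 * 3600))
```

Under these assumptions a pool can satisfy one definition without the other, for example when trading stops while liquidity stays in place, which is one plausible reason the two label sets could lead to different AUC and recall trade-offs.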
Related papers
- Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning [106.68304931854038]
Reinforcement learning with verifiable rewards (RLVR) has been widely used for enhancing the reasoning abilities of large language models (LLMs). We conduct a systematic empirical analysis of the entropy-performance exchange mechanism of RLVR across different levels of granularity. Our analysis reveals that, in the rising stage, entropy reduction in negative samples facilitates the learning of effective reasoning patterns. In the plateau stage, learning efficiency strongly correlates with high-entropy tokens present in low-perplexity samples and those located at the end of sequences.
arXiv Detail & Related papers (2025-08-04T10:08:10Z) - Dynamic Feature Fusion: Combining Global Graph Structures and Local Semantics for Blockchain Fraud Detection [0.7510165488300369]
We propose a dynamic feature fusion model that combines graph-based representation learning and semantic feature extraction for fraud detection. We develop a comprehensive data processing pipeline, including graph construction, temporal feature enhancement, and text preprocessing. Experimental results on large-scale real-world blockchain datasets demonstrate that our method outperforms existing benchmarks across accuracy, F1 score, and recall metrics.
arXiv Detail & Related papers (2025-01-03T09:04:43Z) - AI-Powered Energy Algorithmic Trading: Integrating Hidden Markov Models with Neural Networks [0.0]
This study introduces a new approach that combines Hidden Markov Models (HMM) and neural networks, integrated with Black-Litterman portfolio optimization.
During the COVID period (2019-2022), this dual-model approach achieved an 83% return with a Sharpe ratio of 0.77.
arXiv Detail & Related papers (2024-07-29T10:26:52Z) - A Dataset of Uniswap daily transaction indices by network [1.8291790356553643]
Decentralized Finance (DeFi) is reshaping traditional finance by enabling direct transactions without intermediaries.
Layer 2 (L2) solutions are emerging to enhance the scalability and efficiency of the DeFi ecosystem, surpassing Layer 1 (L1) systems.
This study bridges that gap by analyzing over 50 million transactions from Uniswap, a major decentralized exchange, across both L1 and L2 networks.
arXiv Detail & Related papers (2023-12-05T10:53:46Z) - Secure Decentralized Learning with Blockchain [13.795131629462798]
Federated Learning (FL) is a well-known paradigm of distributed machine learning on mobile and IoT devices.
To avoid the single point of failure problem in FL, decentralized learning (DFL) has been proposed to use peer-to-peer communication for model aggregation.
arXiv Detail & Related papers (2023-10-10T23:45:17Z) - Defending Against Poisoning Attacks in Federated Learning with Blockchain [12.840821573271999]
We propose a secure and reliable federated learning system based on blockchain and distributed ledger technology.
Our system incorporates a peer-to-peer voting mechanism and a reward-and-slash mechanism, which are powered by on-chain smart contracts, to detect and deter malicious behaviors.
arXiv Detail & Related papers (2023-07-02T11:23:33Z) - Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - VFed-SSD: Towards Practical Vertical Federated Advertising [53.08038962443853]
We propose a semi-supervised split distillation framework VFed-SSD to alleviate the two limitations.
Specifically, we develop a self-supervised task MatchedPair Detection (MPD) to exploit the vertically partitioned unlabeled data.
Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deploying cost and significant performance lift.
arXiv Detail & Related papers (2022-05-31T17:45:30Z) - Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
arXiv Detail & Related papers (2021-12-15T09:50:25Z) - RoFL: Attestable Robustness for Secure Federated Learning [59.63865074749391]
Federated Learning allows a large number of clients to train a joint model without the need to share their private data.
To ensure the confidentiality of the client updates, Federated Learning systems employ secure aggregation.
We present RoFL, a secure Federated Learning system that improves robustness against malicious clients.
arXiv Detail & Related papers (2021-07-07T15:42:49Z) - Blockchain Assisted Decentralized Federated Learning (BLADE-FL): Performance Analysis and Resource Allocation [119.19061102064497]
We propose a decentralized FL framework by integrating blockchain into FL, namely, blockchain assisted decentralized federated learning (BLADE-FL).
In a round of the proposed BLADE-FL, each client broadcasts its trained model to other clients, competes to generate a block based on the received models, and then aggregates the models from the generated block before its local training of the next round.
We explore the impact of lazy clients on the learning performance of BLADE-FL, and characterize the relationship among the optimal K, the learning parameters, and the proportion of lazy clients.
arXiv Detail & Related papers (2021-01-18T07:19:08Z) - DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis [22.035287788330663]
We propose DoubleEnsemble, an ensemble framework leveraging learning trajectory based sample reweighting and shuffling based feature selection.
Our model is applicable to a wide range of base models, capable of extracting complex patterns, while mitigating the overfitting and instability issues for financial market prediction.
arXiv Detail & Related papers (2020-10-03T02:57:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.