Leveraging Large Language Models to Bridge On-chain and Off-chain Transparency in Stablecoins
- URL: http://arxiv.org/abs/2512.02418v1
- Date: Tue, 02 Dec 2025 05:00:17 GMT
- Title: Leveraging Large Language Models to Bridge On-chain and Off-chain Transparency in Stablecoins
- Authors: Yuexin Xiang, Yuchen Lei, SM Mahir Shazeed Rish, Yuanzhe Zhang, Qin Wang, Tsz Hon Yuen, Jiangshan Yu
- Abstract summary: We introduce a large language model (LLM)-based automated framework that bridges verifiable on-chain traces and off-chain disclosures locked in unstructured text. Our findings show that LLM-assisted analysis enhances cross-modal transparency and supports automated, data-driven auditing in decentralized finance.
- Score: 10.666951732730665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stablecoins such as USDT and USDC aspire to peg stability by coupling issuance controls with reserve attestations. In practice, however, the transparency is split across two unconnected worlds: verifiable on-chain traces and off-chain disclosures locked in unstructured text. We introduce a large language model (LLM)-based automated framework that bridges these two dimensions by aligning on-chain issuance data with off-chain disclosure statements. First, we propose an integrative framework using LLMs to capture and analyze on- and off-chain data through document parsing and semantic alignment, extracting key financial indicators from issuer attestations and mapping them to corresponding on-chain metrics. Second, we integrate multi-chain issuance records and disclosure documents within a model context protocol (MCP) framework that standardizes LLM access to both quantitative market data and qualitative disclosure narratives. This framework enables unified retrieval and contextual alignment across heterogeneous stablecoin information sources and facilitates consistent analysis. Third, we demonstrate the capability of LLMs to operate across heterogeneous data modalities in blockchain analytics, quantifying discrepancies between reported and observed circulation and examining their implications for cross-chain transparency and price dynamics. Our findings reveal systematic gaps between disclosed and verifiable data, showing that LLM-assisted analysis enhances cross-modal transparency and supports automated, data-driven auditing in decentralized finance (DeFi).
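The abstract's third step, quantifying discrepancies between reported and observed circulation, can be illustrated with a minimal sketch. This is not the authors' implementation: the `SupplyRecord` type, the `supply_gaps` function, the per-chain breakdown, and all figures below are hypothetical and chosen only for illustration. It assumes the disclosed figure has already been extracted from an attestation and the on-chain figure has already been read from chain data.

```python
from dataclasses import dataclass


@dataclass
class SupplyRecord:
    """One chain's supply as seen from both sides (hypothetical structure)."""
    chain: str
    on_chain_supply: float   # circulation observed from chain data
    disclosed_supply: float  # circulation stated in the issuer's disclosure


def supply_gaps(records):
    """Return per-chain (absolute, relative) gaps and the aggregate absolute gap.

    A positive gap means the disclosure reports more than is observed on-chain.
    """
    per_chain = {}
    for r in records:
        gap = r.disclosed_supply - r.on_chain_supply
        # Relative gap is undefined when nothing is observed on-chain.
        rel = gap / r.on_chain_supply if r.on_chain_supply else float("nan")
        per_chain[r.chain] = (gap, rel)
    total_gap = (sum(r.disclosed_supply for r in records)
                 - sum(r.on_chain_supply for r in records))
    return per_chain, total_gap


# Illustrative numbers only; they are not taken from the paper.
records = [
    SupplyRecord("ethereum", on_chain_supply=100.0, disclosed_supply=102.0),
    SupplyRecord("tron", on_chain_supply=50.0, disclosed_supply=49.0),
]
per_chain, total_gap = supply_gaps(records)
```

Reporting both per-chain and aggregate gaps matters because offsetting per-chain differences can cancel in the total, which would hide cross-chain discrepancies of the kind the abstract mentions.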
Related papers
- Document Data Matching for Blockchain-Supported Real Estate [2.9873162504735133]
This work presents a system that integrates optical character recognition (OCR), natural language processing (NLP), and verifiable credentials (VCs) to automate document extraction, verification, and management. The approach standardizes heterogeneous document formats into VCs and applies automated data matching to detect inconsistencies, while the blockchain provides a decentralized trust layer that reinforces transparency and integrity. The proposed framework demonstrates the potential to streamline real estate transactions, strengthen stakeholder trust, and enable scalable, secure digital processes.
arXiv Detail & Related papers (2025-12-30T20:30:48Z)
- RiskTagger: An LLM-based Agent for Automatic Annotation of Web3 Crypto Money Laundering Behaviors [65.80108147440863]
RiskTagger is a large-language-model-based agent for the automatic annotation of crypto laundering behaviors in Web3. RiskTagger is designed to replace or complement human annotators by addressing three key challenges: extracting clues from complex unstructured reports, reasoning over multichain transaction paths, and producing auditor-friendly explanations.
arXiv Detail & Related papers (2025-10-12T08:54:28Z)
- Transaction Profiling and Address Role Inference in Tokenized U.S. Treasuries [5.00898007095729]
Tokenized U.S. Treasuries have emerged as a prominent subclass of real-world assets (RWAs). This paper conducts a quantitative, function-level dissection of U.S. Treasury-backed RWA tokens across multi-chain networks.
arXiv Detail & Related papers (2025-07-20T03:54:06Z)
- Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach [55.861432910722186]
UniToCom is a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. We propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information. We employ a causal Transformer-based multimodal large language model (MLLM) at the receiver to unify the processing of both discrete and continuous tokens.
arXiv Detail & Related papers (2025-07-02T14:03:01Z)
- Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions [60.43398881149664]
We introduce LOS-Net, a lightweight attention-based architecture trained on an efficient encoding of the LLM Output Signature. It achieves superior performance across diverse benchmarks and LLMs, while maintaining extremely low detection latency.
arXiv Detail & Related papers (2025-03-18T09:04:37Z)
- Dynamic Feature Fusion: Combining Global Graph Structures and Local Semantics for Blockchain Fraud Detection [0.7510165488300369]
We propose a dynamic feature fusion model that combines graph-based representation learning and semantic feature extraction for fraud detection. We develop a comprehensive data processing pipeline, including graph construction, temporal feature enhancement, and text preprocessing. Experimental results on large-scale real-world blockchain datasets demonstrate that our method outperforms existing benchmarks across accuracy, F1 score, and recall metrics.
arXiv Detail & Related papers (2025-01-03T09:04:43Z)
- Blockchain Data Analysis in the Era of Large-Language Models [21.81035847078574]
Existing blockchain data analysis tools face challenges, including data scarcity, the lack of generalizability, and the lack of reasoning capability. We believe large language models (LLMs) can mitigate these challenges. This paper systematically explores potential techniques and design patterns in LLM-integrated blockchain data analysis.
arXiv Detail & Related papers (2024-12-09T07:32:35Z)
- FinML-Chain: A Blockchain-Integrated Dataset for Enhanced Financial Machine Learning [2.0695662173473206]
We present a framework for integrating high-frequency on-chain data with low-frequency off-chain data.
This framework generates modular datasets for analyzing economic mechanisms such as the Transaction Fee Mechanism.
We demonstrate the framework's ability to produce datasets that advance financial research and improve understanding of blockchain-driven systems.
arXiv Detail & Related papers (2024-11-25T10:55:11Z)
- Transforming Triple-Entry Accounting with Machine Learning: A Path to Enhanced Transparency Through Analytics [0.0]
Triple Entry (TE) accounting could help improve transparency in complex financial and supply chain transactions such as blockchain.
Machine learning (ML) presents a promising avenue to augment the transparency advantages of TE accounting.
By automating some of the data collection and analysis needed for TE bookkeeping, ML techniques have the potential to make this more transparent.
arXiv Detail & Related papers (2024-11-19T08:58:44Z)
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning [77.7070536959126]
In-context learning (ICL) emerges as a promising capability of large language models (LLMs).
In this paper, we investigate the working mechanism of ICL through an information flow lens.
We introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL.
arXiv Detail & Related papers (2023-05-23T15:26:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.