FinML-Chain: A Blockchain-Integrated Dataset for Enhanced Financial Machine Learning
- URL: http://arxiv.org/abs/2411.16277v1
- Date: Mon, 25 Nov 2024 10:55:11 GMT
- Title: FinML-Chain: A Blockchain-Integrated Dataset for Enhanced Financial Machine Learning
- Authors: Jingfeng Chen, Wanlin Deng, Dangxing Chen, Luyao Zhang
- Abstract summary: We present a framework for integrating high-frequency on-chain data with low-frequency off-chain data.
This framework generates modular datasets for analyzing economic mechanisms such as the Transaction Fee Mechanism.
We demonstrate the framework's ability to produce datasets that advance financial research and improve understanding of blockchain-driven systems.
- Score: 2.0695662173473206
- Abstract: Machine learning is critical for innovation and efficiency in financial markets, offering predictive models and data-driven decision-making. However, challenges such as missing data, lack of transparency, untimely updates, insecurity, and incompatible data sources limit its effectiveness. Blockchain technology, with its transparency, immutability, and real-time updates, addresses these challenges. We present a framework for integrating high-frequency on-chain data with low-frequency off-chain data, providing a benchmark for addressing novel research questions in economic mechanism design. This framework generates modular, extensible datasets for analyzing economic mechanisms such as the Transaction Fee Mechanism, enabling multi-modal insights and fairness-driven evaluations. Using four machine learning techniques, including linear regression, deep neural networks, XGBoost, and LSTM models, we demonstrate the framework's ability to produce datasets that advance financial research and improve understanding of blockchain-driven systems. Our contributions include: (1) proposing a research scenario for the Transaction Fee Mechanism and demonstrating how the framework addresses previously unexplored questions in economic mechanism design; (2) providing a benchmark for financial machine learning by open-sourcing a sample dataset generated by the framework and the code for the pipeline, enabling continuous dataset expansion; and (3) promoting reproducibility, transparency, and collaboration by fully open-sourcing the framework and its outputs. This initiative supports researchers in extending our work and developing innovative financial machine-learning models, fostering advancements at the intersection of machine learning, blockchain, and economics.
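The abstract does not spell out how the high-frequency on-chain series and the low-frequency off-chain series are aligned. One common way to do this kind of integration is a backward "as-of" join, where each high-frequency row picks up the most recent low-frequency observation at or before its timestamp. The sketch below illustrates that idea only; it is not the authors' pipeline, and the function name `asof_join`, the field layout, and all sample values are hypothetical.

```python
from bisect import bisect_right


def asof_join(high_freq, low_freq):
    """Backward as-of join (illustrative assumption, not the paper's method).

    high_freq: list of (timestamp, value) tuples, e.g. per-block on-chain data.
    low_freq:  list of (timestamp, value) tuples sorted by timestamp,
               e.g. hourly off-chain data.
    Returns a list of (timestamp, high_value, low_value), where low_value is
    the latest low-frequency value at or before the timestamp, or None if
    no such observation exists yet.
    """
    low_times = [t for t, _ in low_freq]
    joined = []
    for t, value in high_freq:
        # Index of the last low-frequency timestamp <= t (or -1 if none).
        i = bisect_right(low_times, t) - 1
        low_value = low_freq[i][1] if i >= 0 else None
        joined.append((t, value, low_value))
    return joined


# Hypothetical sample data: (seconds, base fee) and (seconds, off-chain indicator).
blocks = [(12, 30.0), (24, 31.5), (36, 29.0), (48, 33.2)]
offchain = [(0, 0.10), (30, 0.25)]

rows = asof_join(blocks, offchain)
for row in rows:
    print(row)
```

Joining backward rather than to the nearest observation avoids leaking future off-chain values into earlier on-chain rows, which matters when the merged table is later used to train predictive models.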
Related papers
- A Novel Framework for Analyzing Structural Transformation in Data-Constrained Economies Using Bayesian Modeling and Machine Learning [0.0]
The shift from agrarian economies to more diversified industrial and service-based systems is a key driver of economic development.
In low- and middle-income countries (LMICs), data scarcity and unreliability hinder accurate assessments of this process.
This paper presents a novel statistical framework designed to address these challenges by integrating Bayesian hierarchical modeling, machine learning-based data imputation, and factor analysis.
arXiv Detail & Related papers (2024-09-25T08:39:41Z)
- Blockchain-Enabled Accountability in Data Supply Chain: A Data Bill of Materials Approach [16.31469678670097]
We introduce the "Data Bill of Materials" (DataBOM) to capture the dependency relationship between different datasets and stakeholders by storing specific metadata.
We demonstrate a platform architecture for providing blockchain-based DataBOM services, present the interaction protocol for stakeholders, and discuss the minimal requirements for DataBOM metadata.
arXiv Detail & Related papers (2024-08-16T05:34:50Z)
- Verification of Machine Unlearning is Fragile [48.71651033308842]
We introduce two novel adversarial unlearning processes capable of circumventing both types of verification strategies.
This study highlights the vulnerabilities and limitations in machine unlearning verification, paving the way for further research into the safety of machine unlearning.
arXiv Detail & Related papers (2024-08-01T21:37:10Z)
- Enhancing supply chain security with automated machine learning [2.994117664413568]
This study tackles the complexities of global supply chains, which are increasingly vulnerable to disruptions caused by port congestion, material shortages, and inflation.
Our focus is on enhancing supply chain security through fraud detection, maintenance prediction, and material backorder forecasting.
By automating these processes, our framework improves the efficiency and effectiveness of supply chain security measures.
arXiv Detail & Related papers (2024-06-19T02:45:32Z)
- Machine Learning for Blockchain Data Analysis: Progress and Opportunities [9.07520594836878]
Blockchain datasets encompass multiple layers of interactions across real-world entities, e.g., human users, autonomous programs, and smart contracts.
These unique characteristics present both opportunities and challenges for machine learning on blockchain data.
This paper serves as a comprehensive resource for researchers, practitioners, and policymakers, offering a roadmap for navigating this dynamic and transformative field.
arXiv Detail & Related papers (2024-04-28T17:18:08Z)
- AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines.
We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)
- INTERN: A New Learning Paradigm Towards General Vision [117.3343347061931]
We develop a new learning paradigm named INTERN.
By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability.
In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data.
arXiv Detail & Related papers (2021-11-16T18:42:50Z)
- Data-Centric Engineering: integrating simulation, machine learning and statistics. Challenges and Opportunities [1.3535770763481905]
Recent advances in machine learning, coupled with low-cost computation, have led to widespread multi-disciplinary research activity.
Mechanistic models, based on physical equations, and purely data-driven statistical approaches represent two ends of the modelling spectrum.
New hybrid, data-centric engineering approaches, leveraging the best of both worlds and integrating both simulations and data, are emerging as a powerful tool.
arXiv Detail & Related papers (2021-11-07T22:31:23Z)
- Multi Agent System for Machine Learning Under Uncertainty in Cyber Physical Manufacturing System [78.60415450507706]
Recent advancements in predictive machine learning have led to its application in various use cases in manufacturing.
Most research has focused on maximising predictive accuracy without addressing the uncertainty associated with it.
In this paper, we determine the sources of uncertainty in machine learning and establish the success criteria of a machine learning system to function well under uncertainty.
arXiv Detail & Related papers (2021-07-28T10:28:05Z)
- Multilinear Compressive Learning with Prior Knowledge [106.12874293597754]
The Multilinear Compressive Learning (MCL) framework combines Multilinear Compressive Sensing and Machine Learning into an end-to-end system.
The key idea behind MCL is the assumption that there exists a tensor subspace which can capture the essential features of the signal for the downstream learning task.
In this paper, we propose a novel solution to address both of the aforementioned requirements, i.e., How to find those tensor subspaces in which the signals of interest are highly separable?
arXiv Detail & Related papers (2020-02-17T19:06:05Z)