A Scientific Data Integrity system based on Blockchain
- URL: http://arxiv.org/abs/2601.13425v1
- Date: Mon, 19 Jan 2026 22:09:52 GMT
- Title: A Scientific Data Integrity system based on Blockchain
- Authors: Gian Sebastian Mier Bello, Alexander Martinez Mendez, Carlos J. Barrios H., Robinson Rivas, Luis A. Núñez
- Abstract summary: We present a novel approach that uses Blockchain to help research groups validate data integrity on distributed data repositories. Our proposal ensures 1) secure access to data management, 2) easy validation of data integrity, and 3) an easy way to add new records to the dataset under the same robust integrity policy.
- Score: 36.94429692322632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most High Performance Computing (HPC) projects today collect large amounts of data from different sources, depending on each project's objectives. Some of these datasets are so large that copying them is sometimes an unrealistic goal. On the other hand, science requires that data used for different purposes remain unaltered, so that different groups of researchers can reproduce results, discuss theories, and validate each other's work. In this paper, we present a novel approach that uses Blockchain to help research groups validate data integrity on such distributed repositories. Originally developed for cryptocurrencies, Blockchain has since demonstrated a versatile range of uses. Our proposal ensures 1) secure access to data management, 2) easy validation of data integrity, and 3) an easy way to add new records to the dataset under the same robust integrity policy. A prototype was developed and tested using a subset of a public dataset from a real scientific collaboration, the Latin American Giant Observatory (LAGO) Project.
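The abstract does not say which blockchain platform or data model the LAGO prototype uses, so the following is only a minimal sketch of the general idea behind blockchain-based integrity checks, assuming a simple in-memory hash chain. Each new record stores the SHA-256 digest of a data file plus the hash of the previous block, and validation re-hashes a copy of the file and compares digests. Every name here (`IntegrityLedger`, `add_record`, `verify_record`, `sha256_stream`, the record identifiers and sample bytes) is hypothetical. The sketch omits access control, networking, and consensus. Tampering with any block except the newest breaks the hash link to its successor; protecting the newest block is what replication and consensus provide in a real blockchain.

```python
import hashlib
import io
import json
import time
from dataclasses import dataclass
from typing import BinaryIO, List

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


@dataclass
class Block:
    index: int
    record_id: str   # hypothetical identifier for a dataset file
    data_hash: str   # SHA-256 of the file contents (the data itself is not stored)
    prev_hash: str   # hash of the previous block; this is what links the chain
    timestamp: float

    def block_hash(self) -> str:
        # Serialize the block fields in a fixed order, then hash them.
        payload = json.dumps(
            [self.index, self.record_id, self.data_hash, self.prev_hash, self.timestamp]
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class IntegrityLedger:
    """Hypothetical append-only hash chain kept in memory (no network, no consensus)."""

    def __init__(self) -> None:
        self.chain: List[Block] = []

    def add_record(self, record_id: str, stream: BinaryIO) -> Block:
        # Every new record follows the same policy: hash the data, link to the previous block.
        prev = self.chain[-1].block_hash() if self.chain else GENESIS_PREV
        block = Block(len(self.chain), record_id, sha256_stream(stream), prev, time.time())
        self.chain.append(block)
        return block

    def chain_is_valid(self) -> bool:
        # Each block must carry its position and the hash of the block before it.
        for i, block in enumerate(self.chain):
            expected = self.chain[i - 1].block_hash() if i > 0 else GENESIS_PREV
            if block.index != i or block.prev_hash != expected:
                return False
        return True

    def verify_record(self, record_id: str, stream: BinaryIO) -> bool:
        # Compare a copy of the data with the most recent digest registered for it.
        if not self.chain_is_valid():
            return False
        for block in reversed(self.chain):
            if block.record_id == record_id:
                return block.data_hash == sha256_stream(stream)
        return False  # the record was never registered


if __name__ == "__main__":
    ledger = IntegrityLedger()
    ledger.add_record("file-A", io.BytesIO(b"example bytes A"))
    ledger.add_record("file-B", io.BytesIO(b"example bytes B"))
    print(ledger.verify_record("file-A", io.BytesIO(b"example bytes A")))   # True
    print(ledger.verify_record("file-A", io.BytesIO(b"example bytes A!")))  # False
```

Storing only digests rather than the data keeps such a ledger small, which suits the abstract's point that copying very large datasets is often unrealistic: each group keeps its own copy of the data and checks it against the shared digests.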
Related papers
- Decentralized COVID-19 Health System Leveraging Blockchain [0.8225825738565354]
This paper takes COVID-19 as the application scenario and designs a blockchain-based COVID-19 health system. Because the public and transparent nature of blockchain conflicts with the privacy requirements of some health data, the system design divides the data into public data and private data. The implementation, built on the Hyperledger Fabric architecture, realizes part of the designed functionality, including data upload and retrieval of both the latest and the historical data.
(arXiv, 2025-06-03T09:19:47Z)
- XChainDataGen: A Cross-Chain Dataset Generation Framework [6.139772633069047]
This paper proposes XChainDataGen, a tool to extract cross-chain data from blockchains and generate datasets of cross-chain transactions (cctxs). Using XChainDataGen, we extracted over 35 GB of data from five cross-chain protocols deployed on 11 blockchains during the last seven months of 2024. We identify 11,285,753 cctxs that moved over 28 billion USD in cross-chain token transfers.
(arXiv, 2025-03-17T18:39:43Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
(arXiv, 2023-10-24T14:01:53Z)
- Trustless Privacy-Preserving Data Aggregation on Ethereum with Hypercube Network Topology [0.0]
We have proposed a scalable privacy-preserving data aggregation protocol for summation on the blockchain.
The protocol consists of four stages: contract deployment, user registration, private submission, and proof verification.
(arXiv, 2023-08-29T12:51:26Z)
- Synthcity: facilitating innovative use cases of synthetic data in different data modalities [86.52703093858631]
Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy and augmentation.
Synthcity provides practitioners with a single access point to cutting-edge research and tools in synthetic data.
(arXiv, 2023-01-18T14:49:54Z)
- Quantinar: a blockchain p2p ecosystem for honest scientific research [0.0]
We propose the use of a Peer-to-Peer (P2P) ecosystem based on a blockchain network, Quantinar (quantinar.com).
(arXiv, 2022-11-13T11:28:04Z)
- Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking for the ownership verification to protect them.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
(arXiv, 2022-08-04T05:32:20Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained increasing attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
(arXiv, 2021-09-12T15:51:41Z)
- Analysis of Models for Decentralized and Collaborative AI on Blockchain [0.0]
We evaluate the use of several models and configurations in order to propose best practices when using the Self-Assessment incentive mechanism.
We compare several factors for each dataset when models are hosted in smart contracts on a public blockchain.
(arXiv, 2020-09-14T21:38:55Z)
- Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing [55.012801269326594]
In Byzantine robust distributed learning, a central server wants to train a machine learning model over data distributed across multiple workers.
A fraction of these workers may deviate from the prescribed algorithm and send arbitrary messages.
We propose a simple bucketing scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost.
(arXiv, 2020-06-16T17:58:53Z)