From Indexing to Coding: A New Paradigm for Data Availability Sampling
- URL: http://arxiv.org/abs/2509.21586v1
- Date: Thu, 25 Sep 2025 20:59:35 GMT
- Title: From Indexing to Coding: A New Paradigm for Data Availability Sampling
- Authors: Moritz Grundei, Aayush Rajasekaran, Kishori Konwar, Muriel Medard,
- Abstract summary: We introduce a new approach to DAS that modularizes the coding and commitment process by committing to the uncoded data while performing sampling through on-the-fly coding.<n>The resulting samples are significantly more expressive, enabling light nodes to obtain, in concrete implementations, up to multiple orders of magnitude stronger assurances of data availability.
- Score: 0.37331950863394864
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The data availability problem is a central challenge in blockchain systems and lies at the core of the accessibility and scalability issues faced by platforms such as Ethereum. Modern solutions employ several approaches, with data availability sampling (DAS) being the most self-sufficient and minimalistic in its security assumptions. Existing DAS methods typically form cryptographic commitments on codewords of fixed-rate erasure codes, which restrict light nodes to sampling from a predetermined set of coded symbols. In this paper, we introduce a new approach to DAS that modularizes the coding and commitment process by committing to the uncoded data while performing sampling through on-the-fly coding. The resulting samples are significantly more expressive, enabling light nodes to obtain, in concrete implementations, up to multiple orders of magnitude stronger assurances of data availability than from sampling pre-committed symbols from a fixed-rate redundancy code as done in established DAS schemes using Reed Solomon or low density parity check codes. We present a concrete protocol that realizes this paradigm using random linear network coding (RLNC).
Related papers
- Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding [58.92526489742584]
We propose provably lossless.<n> verification method that significantly boosts the expected number of accepted tokens.<n>We show that HSD yields consistent improvements in acceptance rates across diverse model families and benchmarks.
arXiv Detail & Related papers (2026-01-09T11:10:29Z) - From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models [19.97248408121574]
Diffusion Language Models (DLMs) offer comparable accuracy with faster inference speed via parallel decoding.<n>High-confidence tokens carry negligible information and strictly relying on them limits the effective progress made in each decoding round.<n>We propose Explore-Then-Exploit (ETE), a training-free decoding strategy that maximizes information throughput and decoding efficiency.
arXiv Detail & Related papers (2025-11-26T06:38:37Z) - Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions [60.43398881149664]
We introduce LOS-Net, a lightweight attention-based architecture trained on an efficient encoding of the LLM Output Signature.<n>It achieves superior performance across diverse benchmarks and LLMs, while maintaining extremely low detection latency.
arXiv Detail & Related papers (2025-03-18T09:04:37Z) - KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding [49.56049319037421]
KodCode is a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data.<n>It comprises question-solution-test triplets that are systematically validated via a self-verification procedure.<n>This pipeline yields a large-scale, robust and diverse coding dataset.
arXiv Detail & Related papers (2025-03-04T19:17:36Z) - Decoding for Punctured Convolutional and Turbo Codes: A Deep Learning Solution for Protocols Compliance [22.85778198575678]
This paper presents a unified Long Short-Term Memory (LSTM)-based decoding architecture.<n>The proposed method unifies punctured convolutional and Turbo codes.<n>A puncture embedding mechanism integrates puncturing patterns directly into the network, enabling seamless adaptation to varying code rates.
arXiv Detail & Related papers (2025-02-21T14:00:14Z) - Threshold Selection for Iterative Decoding of $(v,w)$-regular Binary Codes [84.0257274213152]
Iterative bit flipping decoders are an efficient choice for sparse $(v,w)$-regular codes.<n>We propose concrete criteria for threshold determination, backed by a closed form model.
arXiv Detail & Related papers (2025-01-23T17:38:22Z) - Error Correction Code Transformer: From Non-Unified to Unified [40.96008445182154]
Traditional decoders were typically designed as fixed hardware circuits tailored to specific decoding algorithms.<n>This paper proposes a unified, code-agnostic Transformer-based decoding architecture capable of handling multiple linear block codes.
arXiv Detail & Related papers (2024-10-04T12:30:42Z) - HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that many LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes.
arXiv Detail & Related papers (2024-09-10T12:01:43Z) - Coding-Based Hybrid Post-Quantum Cryptosystem for Non-Uniform Information [53.85237314348328]
We introduce for non-uniform messages a novel hybrid universal network coding cryptosystem (NU-HUNCC)
We show that NU-HUNCC is information-theoretic individually secured against an eavesdropper with access to any subset of the links.
arXiv Detail & Related papers (2024-02-13T12:12:39Z) - DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel [8.092763074723594]
A novel method is introduced for designing high-code-rate single-IDS-correcting codewords through deep Levenshtein distance embedding.<n>A deep learning model is utilized to project the sequences into embedding vectors that preserve the Levenshtein distances between the original sequences.<n>This embedding space serves as a proxy for the complex Levenshtein domain within which algorithms for codeword search and segment correcting is developed.
arXiv Detail & Related papers (2023-12-20T02:22:54Z) - Discrete Key-Value Bottleneck [95.61236311369821]
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.
arXiv Detail & Related papers (2022-07-22T17:52:30Z) - A Learning-Based Approach to Address Complexity-Reliability Tradeoff in
OS Decoders [32.35297363281744]
We show that using artificial neural networks to predict the required order of an ordered statistics based decoder helps in reducing the average complexity and hence the latency of the decoder.
arXiv Detail & Related papers (2021-03-05T18:22:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.