Related papers: DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel

DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel

URL: http://arxiv.org/abs/2312.12717v2
Date: Fri, 26 Sep 2025 14:01:26 GMT
Title: DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel
Authors: Alan J. X. Guo, Sihan Sun, Xiang Wei, Mengyi Wei, Xin Chen,
Abstract summary: A novel method is introduced for designing high-code-rate single-IDS-correcting codewords through deep Levenshtein distance embedding.<n>A deep learning model is utilized to project the sequences into embedding vectors that preserve the Levenshtein distances between the original sequences.<n>This embedding space serves as a proxy for the complex Levenshtein domain within which algorithms for codeword search and segment correcting is developed.
Score: 8.092763074723594
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the emergence of new storage and communication methods, the insertion, deletion, and substitution (IDS) channel has attracted considerable attention. However, many topics on the IDS channel and the associated Levenshtein distance remain open, making the invention of a novel IDS-correcting code a hard task. Furthermore, current studies on single-IDS-correcting code misalign with the requirements of applications which necessitates the correcting of multiple errors. Compromise solutions have involved shortening codewords to reduce the chance of multiple errors. However, the code rates of existing codes are poor at short lengths, diminishing the overall storage density. In this study, a novel method is introduced for designing high-code-rate single-IDS-correcting codewords through deep Levenshtein distance embedding. A deep learning model is utilized to project the sequences into embedding vectors that preserve the Levenshtein distances between the original sequences. This embedding space serves as a proxy for the complex Levenshtein domain, within which algorithms for codeword search and segment correcting is developed. While the concept underpinning this approach is straightforward, it bypasses the mathematical challenges typically encountered in code design. The proposed method results in a code rate that outperforms existing combinatorial solutions, particularly for designing short-length codewords.

Related papers

Tradeoffs on the volume of fault-tolerant circuits [8.594140167290096]
Error correcting codes can overcome faulty circuit components to enable robust computation.<n>Choosing an appropriate code is non-trivial as it must balance several requirements.<n>Code family cannot simultaneously have constant rate, growing distance and short-depth gadgets to perform encoded CNOT gates.
arXiv Detail & Related papers (2025-10-03T14:39:31Z)
Rectified Sparse Attention [61.7702154360081]
Efficient long-sequence generation is a critical challenge for Large Language Models.<n>We propose Rectified Sparse Attention (ReSA), a simple yet effective method that combines block-sparse attention with periodic dense rectification.<n> Experiments across math reasoning, language modeling, and retrieval tasks demonstrate that ReSA achieves near-lossless generation quality.
arXiv Detail & Related papers (2025-06-04T16:01:48Z)
Fast correlated decoding of transversal logical algorithms [67.01652927671279]
Quantum error correction (QEC) is required for large-scale computation, but incurs a significant resource overhead.<n>Recent advances have shown that by jointly decoding logical qubits in algorithms composed of logical gates, the number of syndrome extraction rounds can be reduced.<n>Here, we reform the problem of decoding circuits by directly decoding relevant logical operator products as they propagate through the circuit.
arXiv Detail & Related papers (2025-05-19T18:00:00Z)
Efficient Transformer-based Decoder for Varshamov-Tenengolts Codes [1.53119329713143]
Varshamov-Tenengolts (VT) codes, primarily designed for single-error correction, have emerged as a central research focus.<n>While existing decoding methods achieve high accuracy in correcting a single error, they often fail to correct multiple IDS errors.<n>In this work, we observe that VT codes retain some capability for addressing multiple errors by introducing a transformer-based VT decoder.
arXiv Detail & Related papers (2025-02-28T13:59:14Z)
Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets [8.977790462534152]
We propose DePA, a novel line-level detection and cleansing method tailored to the structural properties of code. DePA significantly outperforms existing methods, achieving 0.14-0.19 improvement in detection F1-score and a 44-65% increase in poisoned segment localization precision.
arXiv Detail & Related papers (2025-02-27T16:30:00Z)
Efficient Approximate Degenerate Ordered Statistics Decoding for Quantum Codes via Reliable Subset Reduction [5.625796693054094]
We introduce the concept of approximate degenerate decoding and integrate it with ordered statistics decoding (OSD) We present an ADOSD algorithm that significantly improves OSD efficiency in the code capacity noise model.
arXiv Detail & Related papers (2024-12-30T17:45:08Z)
SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing [83.35231185111464]
Deep learning has shifted the retrieval paradigm from lexical-based matching to encode source code and queries into vector representations. Previous research proposes deep hashing-based methods, which generate hash codes for queries and code snippets and use Hamming distance for rapid recall of code candidates. We propose a novel approach, which converts long hash codes calculated by existing deep hashing approaches into several short hash code segments through an iterative training strategy.
arXiv Detail & Related papers (2024-12-16T12:51:35Z)
Limitations of the decoding-to-LPN reduction via code smoothing [59.90381090395222]
Researchers have attempted to demonstrate the algorithmic hardness of the Learning Parity with Noise problem.<n>We characterize the efficiency of the reduction in terms of the parameters of the decoding and primitive problems.
arXiv Detail & Related papers (2024-08-07T12:54:43Z)
Gumbel-Softmax Discretization Constraint, Differentiable IDS Channel, and an IDS-Correcting Code for DNA Storage [1.4272256806865107]
We present an autoencoder-based method, THEA-code, aimed at efficiently generating IDS-correcting codes for complex IDS channels. A Gumbel-Softmax discretization constraint is proposed to discretize the features of the autoencoder. A simulated differentiable IDS channel is developed as a differentiable alternative for IDS operations.
arXiv Detail & Related papers (2024-07-10T06:52:56Z)
Factor Graph Optimization of Error-Correcting Codes for Belief Propagation Decoding [62.25533750469467]
Low-Density Parity-Check (LDPC) codes possess several advantages over other families of codes. The proposed approach is shown to outperform the decoding performance of existing popular codes by orders of magnitude.
arXiv Detail & Related papers (2024-06-09T12:08:56Z)
Learning Linear Block Error Correction Codes [62.25533750469467]
We propose for the first time a unified encoder-decoder training of binary linear block codes. We also propose a novel Transformer model in which the self-attention masking is performed in a differentiable fashion for the efficient backpropagation of the code gradient.
arXiv Detail & Related papers (2024-05-07T06:47:12Z)
Error Correction in Dynamical Codes [0.0]
We ask what is the general framework for a quantum error correcting code that is defined by a sequence of measurements.<n>We develop an algorithm that tracks information about the error syndromes through the protocol and put that together to determine the distance of a dynamical code.
arXiv Detail & Related papers (2024-03-07T02:47:21Z)
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components. CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks. FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization. Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
Sparse-Inductive Generative Adversarial Hashing for Nearest Neighbor Search [8.020530603813416]
We propose a novel unsupervised hashing method, termed Sparsity-Induced Generative Adversarial Hashing (SiGAH) SiGAH encodes large-scale high-scale high-dimensional features into binary codes, which solves the two problems through a generative adversarial training framework. Experimental results on four benchmarks, i.e. Tiny100K, GIST1M, Deep1M, and MNIST, have shown that the proposed SiGAH has superior performance over state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-12T08:07:23Z)
Deep Joint Source-Channel Coding with Iterative Source Error Correction [11.41076729592696]
We propose an iterative source error correction (ISEC) decoding scheme for deep-learning-based joint source-channel code (Deep J SCC) Given a noisyword received through the channel, we use a Deep J SCC encoder and decoder pair to update the code iteratively. The proposed scheme produces more reliable source reconstruction results compared to the baseline when the channel noise characteristics do not match the ones used during training.
arXiv Detail & Related papers (2023-02-17T22:50:58Z)
LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space [42.56147568941768]
We introduce Normalized Style Space and $mathcalSN$ Cosine Distance to measure disalignment of inversion methods. Our proposed SNCD is differentiable, it can be optimized in both encoder-based and optimization-based embedding methods to conduct a uniform solution.
arXiv Detail & Related papers (2022-09-26T14:55:21Z)
Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques. Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders. We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv Detail & Related papers (2022-09-16T11:00:50Z)
Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning [49.3231734733112]
We show a modular and holistic approach that combines Deep Neural Networks (DNN) trained on simulated data, Product (TP) based Error-Correcting Codes (ECC) and a safety margin into a single coherent pipeline. Our work improves upon the current leading solutions by up to x3200 increase in speed, 40% improvement in accuracy, and offers a code rate of 1.6 bits per base in a high noise regime.
arXiv Detail & Related papers (2021-08-31T18:21:20Z)
KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning [76.5589486928387]
Landmark codes underpin reliable physical layer communication, e.g., Reed-Muller, BCH, Convolution, Turbo, LDPC and Polar codes. In this paper, we construct KO codes, a computationaly efficient family of deep-learning driven (encoder, decoder) pairs. KO codes beat state-of-the-art Reed-Muller and Polar codes, under the low-complexity successive cancellation decoding.
arXiv Detail & Related papers (2021-08-29T21:08:30Z)
ADMM-based Decoder for Binary Linear Codes Aided by Deep Learning [40.25456611849273]
This work presents a deep neural network aided decoding algorithm for binary linear codes. Based on the concept of deep unfolding, we design a decoding network by unfolding the alternating direction method of multipliers (ADMM)-penalized decoder. Numerical results show that the resulting DL-aided decoders outperform the original ADMM-penalized decoder.
arXiv Detail & Related papers (2020-02-14T03:32:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.