DoDo-Code: a Deep Levenshtein Distance Embedding-based Code for IDS
Channel and DNA Storage
- URL: http://arxiv.org/abs/2312.12717v1
- Date: Wed, 20 Dec 2023 02:22:54 GMT
- Title: DoDo-Code: a Deep Levenshtein Distance Embedding-based Code for IDS
Channel and DNA Storage
- Authors: Alan J.X. Guo, Sihan Sun, Xiang Wei, Mengyi Wei, Xin Chen
- Abstract summary: DoDo-Code is an IDS-correcting code that incorporates deep embedding of Levenshtein distance, deep embedding-based codeword search, and deep embedding-based segment correcting.
DoDo-Code is the first IDS-correcting code designed using plausible deep learning methodologies.
- Score: 5.974644083956271
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, DNA storage has emerged as a promising data storage solution,
offering significant advantages in storage density, maintenance cost
efficiency, and parallel replication capability. Mathematically, the DNA
storage pipeline can be viewed as an insertion, deletion, and substitution
(IDS) channel. Because of the mathematical terra incognita of the Levenshtein
distance, designing an IDS-correcting code is still a challenge. In this paper,
we propose an innovative approach that utilizes deep Levenshtein distance
embedding to bypass these mathematical challenges. By representing the
Levenshtein distance between two sequences as a conventional distance between
their corresponding embedding vectors, the inherent structural property of
Levenshtein distance is revealed in the friendly embedding space. Leveraging
this embedding space, we introduce the DoDo-Code, an IDS-correcting code that
incorporates deep embedding of Levenshtein distance, deep embedding-based
codeword search, and deep embedding-based segment correcting. To address the
requirements of DNA storage, we also present a preliminary algorithm for long
sequence decoding. As far as we know, the DoDo-Code is the first IDS-correcting
code designed using plausible deep learning methodologies, potentially paving
the way for a new direction in error-correcting code research. It is also the
first IDS code that exhibits characteristics of being 'optimal' in terms of
redundancy, significantly outperforming the mainstream IDS-correcting codes of
the Varshamov-Tenengolts code family in code rate.
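The abstract's core idea is to replace the combinatorially awkward Levenshtein distance with a conventional distance between embedding vectors. As a minimal sketch, the classic dynamic-programming edit distance is shown below next to the embedding interface; `toy_embed` is a hypothetical placeholder encoder (character counts), not the deep network DoDo-Code actually trains.

```python
import numpy as np

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance: the minimum number of
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Embedding interface: a learned encoder f maps each sequence to a vector
# such that ||f(x) - f(y)|| approximates levenshtein(x, y).  This toy
# encoder (per-base counts) only illustrates the interface; it is NOT the
# paper's deep embedding and is a poor approximation in general.
def toy_embed(seq: str) -> np.ndarray:
    return np.array([seq.count(c) for c in "ACGT"], dtype=float)

d_true = levenshtein("ACGTACGT", "ACGAACGT")   # one substitution
d_emb = np.linalg.norm(toy_embed("ACGTACGT") - toy_embed("ACGAACGT"))
```

The payoff of such an embedding is that nearest-codeword search, otherwise an expensive sequence-alignment problem, becomes ordinary vector search in the embedding space.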
Related papers
- Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets [8.977790462534152]
We propose DePA, a novel line-level detection and cleansing method tailored to the structural properties of code.
DePA significantly outperforms existing methods, achieving a 0.14-0.19 improvement in detection F1-score and a 44-65% increase in poisoned-segment localization precision.
arXiv Detail & Related papers (2025-02-27T16:30:00Z)
- Efficient Approximate Degenerate Ordered Statistics Decoding for Quantum Codes via Reliable Subset Reduction [5.625796693054094]
We introduce the concept of approximate degenerate decoding and integrate it with ordered statistics decoding (OSD).
We present an ADOSD algorithm that significantly improves OSD efficiency in the code capacity noise model.
arXiv Detail & Related papers (2024-12-30T17:45:08Z)
- SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing [83.35231185111464]
Deep learning has shifted the retrieval paradigm from lexical matching to encoding source code and queries into vector representations.
Previous research proposes deep hashing-based methods, which generate hash codes for queries and code snippets and use Hamming distance for rapid recall of code candidates.
We propose a novel approach, which converts long hash codes calculated by existing deep hashing approaches into several short hash code segments through an iterative training strategy.
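The rapid-recall step described above relies on Hamming distance between hash codes, which reduces to XOR plus a popcount. A generic sketch follows; `recall_candidates` and its `radius` parameter are illustrative names, not SECRET's actual API.

```python
def hamming(h1: int, h2: int) -> int:
    """Hamming distance between two hash codes stored as ints:
    the number of differing bits, via XOR + popcount."""
    return bin(h1 ^ h2).count("1")

def recall_candidates(query_code, code_db, radius=2):
    """Keep indices of database codes within a small Hamming radius of
    the query -- the cheap recall stage before exact re-ranking."""
    return [idx for idx, code in enumerate(code_db)
            if hamming(query_code, code) <= radius]
```

Splitting one long code into short segments, as SECRET does, lets each segment be matched independently, shrinking the lookup space per segment.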
arXiv Detail & Related papers (2024-12-16T12:51:35Z)
- Gumbel-Softmax Discretization Constraint, Differentiable IDS Channel, and an IDS-Correcting Code for DNA Storage [1.4272256806865107]
We present an autoencoder-based method, THEA-code, aimed at efficiently generating IDS-correcting codes for complex IDS channels.
A Gumbel-Softmax discretization constraint is proposed to discretize the features of the autoencoder.
A simulated differentiable IDS channel is developed as a differentiable alternative for IDS operations.
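The Gumbel-Softmax trick mentioned above is a standard relaxation for sampling discrete symbols while keeping gradients: add Gumbel noise to the logits, then apply a temperature-scaled softmax. The NumPy sketch below illustrates only the generic mechanism, not THEA-code's specific constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Gumbel-Softmax relaxation: perturb logits with Gumbel(0, 1) noise,
    then apply a softmax at temperature tau.  Small tau pushes the output
    toward a one-hot vector while remaining differentiable in the logits."""
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max())          # numerically stable softmax
    return y / y.sum()

# Relaxed sample over the four DNA bases A, C, G, T.
probs = gumbel_softmax(np.log(np.array([0.7, 0.1, 0.1, 0.1])), tau=0.5)
```

Lowering `tau` during training anneals the relaxed samples toward hard one-hot base choices.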
arXiv Detail & Related papers (2024-07-10T06:52:56Z)
- Factor Graph Optimization of Error-Correcting Codes for Belief Propagation Decoding [62.25533750469467]
Low-Density Parity-Check (LDPC) codes possess several advantages over other families of codes.
The proposed approach is shown to outperform existing popular codes in decoding performance by orders of magnitude.
arXiv Detail & Related papers (2024-06-09T12:08:56Z)
- Learning Linear Block Error Correction Codes [62.25533750469467]
We propose for the first time a unified encoder-decoder training of binary linear block codes.
We also propose a novel Transformer model in which the self-attention masking is performed in a differentiable fashion for the efficient backpropagation of the code gradient.
arXiv Detail & Related papers (2024-05-07T06:47:12Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on executed code, masking the unexecuted segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Sparse-Inductive Generative Adversarial Hashing for Nearest Neighbor Search [8.020530603813416]
We propose a novel unsupervised hashing method, termed Sparsity-Induced Generative Adversarial Hashing (SiGAH).
SiGAH encodes large-scale, high-dimensional features into binary codes, solving the two problems through a generative adversarial training framework.
Experimental results on four benchmarks, i.e. Tiny100K, GIST1M, Deep1M, and MNIST, have shown that the proposed SiGAH has superior performance over state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-12T08:07:23Z)
- LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space [42.56147568941768]
We introduce Normalized Style Space and the $\mathcal{SN}$ Cosine Distance to measure the disalignment of inversion methods.
Our proposed SNCD is differentiable; it can be optimized in both encoder-based and optimization-based embedding methods to yield a uniform solution.
arXiv Detail & Related papers (2022-09-26T14:55:21Z)
- Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques.
Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders.
We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv Detail & Related papers (2022-09-16T11:00:50Z)
- Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning [49.3231734733112]
We show a modular and holistic approach that combines Deep Neural Networks (DNN) trained on simulated data, Tensor-Product (TP) based Error-Correcting Codes (ECC), and a safety margin into a single coherent pipeline.
Our work improves upon the current leading solutions with up to a x3200 increase in speed and a 40% improvement in accuracy, and offers a code rate of 1.6 bits per base in a high-noise regime.
arXiv Detail & Related papers (2021-08-31T18:21:20Z)
- KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning [76.5589486928387]
Landmark codes underpin reliable physical-layer communication, e.g., Reed-Muller, BCH, Convolutional, Turbo, LDPC, and Polar codes.
In this paper, we construct KO codes, a computationally efficient family of deep-learning-driven (encoder, decoder) pairs.
KO codes beat state-of-the-art Reed-Muller and Polar codes, under the low-complexity successive cancellation decoding.
arXiv Detail & Related papers (2021-08-29T21:08:30Z)
- ADMM-based Decoder for Binary Linear Codes Aided by Deep Learning [40.25456611849273]
This work presents a deep neural network aided decoding algorithm for binary linear codes.
Based on the concept of deep unfolding, we design a decoding network by unfolding the alternating direction method of multipliers (ADMM)-penalized decoder.
Numerical results show that the resulting DL-aided decoders outperform the original ADMM-penalized decoder.
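Deep unfolding, as used above, turns each iteration of a classical optimizer into one network layer with its own learnable parameters. A minimal sketch of the idea follows, with plain gradient steps on a least-squares objective standing in for the ADMM-penalized decoder, and fixed placeholder step sizes where a trained decoder would learn per-layer values.

```python
import numpy as np

def unfolded_descent(A: np.ndarray, b: np.ndarray, step_sizes) -> np.ndarray:
    """Deep unfolding in miniature: run a fixed number of gradient steps
    on ||Ax - b||^2, one 'layer' per iteration, where each layer owns its
    own step size.  In a learned decoder these per-layer parameters are
    trained end to end; here they are fixed placeholders."""
    x = np.zeros(A.shape[1])
    for t in step_sizes:                  # each entry = one network layer
        x = x - t * A.T @ (A @ x - b)     # gradient of 0.5||Ax - b||^2
    return x

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 3.0])
x = unfolded_descent(A, b, [0.2] * 50)    # converges toward [1, 3]
```

The appeal of unfolding is that the network inherits the iterative structure (and interpretability) of the classical algorithm while learning parameters the hand-tuned version fixes by heuristics.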
arXiv Detail & Related papers (2020-02-14T03:32:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.