Optimizing Text Search: A Novel Pattern Matching Algorithm Based on Ukkonen's Approach
- URL: http://arxiv.org/abs/2512.16927v1
- Date: Sat, 29 Nov 2025 16:05:13 GMT
- Title: Optimizing Text Search: A Novel Pattern Matching Algorithm Based on Ukkonen's Approach
- Authors: Xinyu Guan, Shaohua Zhang,
- Abstract summary: This study focuses on optimizing Suffix Trees through methods like Splitting and Ukkonen's Algorithm.<n>A novel optimization combining Ukkonen's Algorithm with a new search technique is introduced, showing linear time and space efficiencies.<n> Empirical tests confirm the theoretical advantages, highlighting the optimized Suffix Tree's effectiveness in tasks like pattern recognition in genomic sequences.
- Score: 7.975242816297842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of computer science, the efficiency of text-search algorithms is crucial for processing vast amounts of data in areas such as natural language processing and bioinformatics. Traditional methods like Naive Search, KMP, and Boyer-Moore, while foundational, often fall short in handling the complexities and scale of modern datasets, such as the Reuters corpus and human genomic sequences. This study rigorously investigates text-search algorithms, focusing on optimizing Suffix Trees through methods like Splitting and Ukkonen's Algorithm, analyzed on datasets including the Reuters corpus and human genomes. A novel optimization combining Ukkonen's Algorithm with a new search technique is introduced, showing linear time and space efficiencies, outperforming traditional methods like Naive Search, KMP, and Boyer-Moore. Empirical tests confirm the theoretical advantages, highlighting the optimized Suffix Tree's effectiveness in tasks like pattern recognition in genomic sequences, achieving 100% accuracy. This research not only advances academic knowledge in text-search algorithms but also demonstrates significant practical utility in fields like natural language processing and bioinformatics, due to its superior resource efficiency and reliability.
Related papers
- Evolutionary Algorithms Approach For Search Based On Semantic Document Similarity [0.0]
We develop clustering, recommendation, and question-and-answering systems using various text representation techniques.<n>We show that Universal Sentence vectors (USE) is used to capture the semantic similarity of text.<n>And the transfer learning technique is used to apply Genetic Algorithm (GA) and Differential Evolution (DE) algorithms to search and retrieve relevant top N documents.
arXiv Detail & Related papers (2025-02-20T18:56:52Z) - LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models [63.188607839223046]
This survey focuses on the benefits of scaling compute during inference.
We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv Detail & Related papers (2024-06-24T17:45:59Z) - Relation-aware Ensemble Learning for Knowledge Graph Embedding [68.94900786314666]
We propose to learn an ensemble by leveraging existing methods in a relation-aware manner.
exploring these semantics using relation-aware ensemble leads to a much larger search space than general ensemble methods.
We propose a divide-search-combine algorithm RelEns-DSC that searches the relation-wise ensemble weights independently.
arXiv Detail & Related papers (2023-10-13T07:40:12Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Learning with Differentiable Algorithms [6.47243430672461]
This thesis explores combining classic algorithms and machine learning systems like neural networks.
The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm.
In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable sorting gates, and differentiable logic gate networks.
arXiv Detail & Related papers (2022-09-01T17:30:00Z) - Training Neural Networks using SAT solvers [1.0152838128195465]
We propose an algorithm to explore the global optimisation method, using SAT solvers, for training a neural net.
In the experiments, we demonstrate the effectiveness of our algorithm against the ADAM optimiser in certain tasks like parity learning.
arXiv Detail & Related papers (2022-06-10T01:31:12Z) - CrossBeam: Learning to Search in Bottom-Up Program Synthesis [51.37514793318815]
We propose training a neural model to learn a hands-on search policy for bottom-up synthesis.
Our approach, called CrossBeam, uses the neural model to choose how to combine previously-explored programs into new programs.
We observe that CrossBeam learns to search efficiently, exploring much smaller portions of the program space compared to the state-of-the-art.
arXiv Detail & Related papers (2022-03-20T04:41:05Z) - Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods.
We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms.
We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
arXiv Detail & Related papers (2021-01-12T18:31:22Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Enhancing accuracy of deep learning algorithms by training with
low-discrepancy sequences [15.2292571922932]
We propose a deep supervised learning algorithm based on low-discrepancy sequences as the training set.
We demonstrate that the proposed algorithm significantly outperforms standard deep learning algorithms for problems in moderately high dimensions.
arXiv Detail & Related papers (2020-05-26T08:14:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.