string2string: A Modern Python Library for String-to-String Algorithms
- URL: http://arxiv.org/abs/2304.14395v1
- Date: Thu, 27 Apr 2023 17:57:19 GMT
- Title: string2string: A Modern Python Library for String-to-String Algorithms
- Authors: Mirac Suzgun, Stuart M. Shieber, Dan Jurafsky
- Abstract summary: string2string is an open-source library that offers a comprehensive suite of efficient algorithms for string-to-string problems.
It includes traditional algorithmic solutions as well as recent neural approaches to tackle various problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis.
It is implemented in Python, easily installable via pip, and accessible through a simple API.
- Score: 24.167017445129105
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce string2string, an open-source library that offers a
comprehensive suite of efficient algorithms for a broad range of
string-to-string problems. It includes traditional algorithmic solutions as
well as recent neural approaches to tackle various problems in string
alignment, distance measurement, lexical and semantic search, and similarity
analysis, along with several helpful visualization tools and metrics to
facilitate the interpretation and analysis of these methods. Notable algorithms
featured in the library include the Smith-Waterman algorithm for pairwise local
alignment, the Hirschberg algorithm for global alignment, the Wagner-Fischer
algorithm for edit distance, BARTScore and BERTScore for similarity analysis,
the Knuth-Morris-Pratt algorithm for lexical search, and Faiss for semantic
search. In addition, it wraps existing efficient and widely used
implementations of certain frameworks and metrics, such as sacreBLEU and ROUGE,
where appropriate. Overall, the library aims to provide broader coverage and
greater flexibility than existing string libraries. It can be used for many
downstream applications in natural-language processing, bioinformatics, and
computational social sciences. It is implemented in Python, easily installable
via pip, and accessible through a simple API. Source code, documentation, and
tutorials are all available on our GitHub page:
https://github.com/stanfordnlp/string2string.
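Several of the algorithms named above are classical dynamic programs or search routines. As a rough illustration of what three of them compute, here is a minimal pure-Python sketch of the Wagner-Fischer edit distance, the Smith-Waterman local-alignment score, and Knuth-Morris-Pratt exact search. It is not the string2string API: the function names (`edit_distance`, `smith_waterman_score`, `kmp_search`) are hypothetical, the scoring defaults (match +2, mismatch -1, gap -1) are assumptions chosen for illustration, and the alignment function returns only the best score, without the traceback a full implementation would provide.

```python
"""Pure-Python sketches of three classical string algorithms.

Illustrative only: these function names are hypothetical and are NOT the
string2string API.
"""
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic program (unit costs)."""
    # prev holds the previous DP row: distances from a[:i-1] to every prefix of b.
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub_cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,             # delete a[i-1]
                          curr[j - 1] + 1,         # insert b[j-1]
                          prev[j - 1] + sub_cost)  # match or substitute
        prev = curr
    return prev[-1]


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score (Smith-Waterman, linear gap penalty).

    The default scores are illustrative assumptions, not library defaults.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere (local alignment).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def kmp_search(text: str, pattern: str) -> List[int]:
    """Start indices of all (possibly overlapping) occurrences of pattern."""
    if not pattern:
        return list(range(len(text) + 1))
    # failure[q]: length of the longest proper prefix of pattern[:q+1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[q] != pattern[k]:
            k = failure[k - 1]
        if pattern[q] == pattern[k]:
            k += 1
        failure[q] = k
    # Scan the text once, never moving backwards in it.
    matches: List[int] = []
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]
    return matches


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))          # 3
    print(smith_waterman_score("xxABCyy", "zzABCww"))  # 6 (the shared "ABC")
    print(kmp_search("abababa", "aba"))                # [0, 2, 4]
```

Both dynamic programs keep only two rows of the table, so they use memory proportional to `len(b)`. Recovering the alignment itself, rather than just its score or distance, needs either the full table or a linear-space divide-and-conquer scheme such as Hirschberg's, which the abstract also lists.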
Related papers
- LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch.
We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
(2023-10-30T17:55:02Z)
- Beryllium: Neural Search for Algorithm Implementations [14.11934122454653]
We design a new language named p-language to specify the algorithms and a static analyzer for the p-language to automatically extract information from the algorithm descriptions.
We embedded the output of p-language (p-code) and source code in a common vector space using self-supervised machine learning methods to match algorithms with code without any manual annotation.
Beryllium significantly outperformed the state-of-the-art code search tools in both C and Java.
(2023-05-25T03:49:36Z)
- torchgfn: A PyTorch GFlowNet library [56.071033896777784]
torchgfn is a PyTorch library that aims to address the need for a shared library of generative flow network (GFlowNet) implementations.
It provides users with a simple API for environments and useful abstractions for samplers and losses.
(2023-05-24T00:20:59Z)
- textless-lib: a Library for Textless Spoken Language Processing [50.070693765984075]
We introduce textless-lib, a PyTorch-based library aimed at facilitating research in textless spoken language processing.
We describe the building blocks that the library provides and demonstrate its usability.
(2022-02-15T12:39:42Z)
- Small-Text: Active Learning for Text Classification in Python [23.87081733039124]
small-text is an easy-to-use active learning library for Python.
It offers pool-based active learning for single- and multi-label text classification.
(2021-07-21T19:23:56Z)
- Leveraging Language to Learn Program Abstractions and Search Heuristics [66.28391181268645]
We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis.
When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization.
(2021-06-18T15:08:47Z)
- Evaluating Various Tokenizers for Arabic Text Classification [4.110108749051656]
We introduce three new tokenization algorithms for Arabic and compare them to three other baselines using unsupervised evaluations.
Our experiments show that the performance of such tokenization algorithms depends on the size of the dataset, type of the task, and the amount of morphology that exists in the dataset.
(2021-06-14T16:05:58Z)
- PyGlove: Symbolic Programming for Automated Machine Learning [88.15565138144042]
We introduce a new way of programming AutoML based on symbolic programming.
Under this paradigm, ML programs are mutable and can thus be easily manipulated by another program.
We show that PyGlove users can easily convert a static program into a search space, quickly iterate on the search spaces and search algorithms, and craft complex search flows.
(2021-01-21T19:05:44Z)
- Scout Algorithm For Fast Substring Matching [0.0]
Exact matching is a common task in many software applications.
We present a new algorithm, Scout, that is straightforward, quick and appropriate for all applications.
(2020-11-08T16:09:20Z)
- Torch-Struct: Deep Structured Prediction Library [138.5262350501951]
We introduce Torch-Struct, a library for structured prediction.
Torch-Struct includes a broad collection of probabilistic structures accessed through a simple and flexible distribution-based API.
(2020-02-03T16:43:02Z)