Recent advances in the Self-Referencing Embedding Strings (SELFIES) library
- URL: http://arxiv.org/abs/2302.03620v1
- Date: Tue, 7 Feb 2023 17:24:08 GMT
- Title: Recent advances in the Self-Referencing Embedding Strings (SELFIES) library
- Authors: Alston Lo, Robert Pollice, AkshatKumar Nigam, Andrew D. White, Mario Krenn and Alán Aspuru-Guzik
- Abstract summary: String-based molecular representations play a crucial role in cheminformatics applications.
Traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models.
To address these errors, SELF-referencIng Embedded Strings (SELFIES), a representation that is inherently 100% robust, was proposed alongside an accompanying open-source implementation.
- Score: 1.9573380763700712
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: String-based molecular representations play a crucial role in cheminformatics
applications, and with the growing success of deep learning in chemistry, have
been readily adopted into machine learning pipelines. However, traditional
string-based representations such as SMILES are often prone to syntactic and
semantic errors when produced by generative models. To address these problems,
a novel representation, SELF-referencIng Embedded Strings (SELFIES), that is
inherently 100% robust was proposed, alongside an accompanying open-source
implementation. Since then, we have generalized SELFIES to support a wider
range of molecules and semantic constraints and streamlined its underlying
grammar. We have implemented this updated representation in subsequent versions
of the selfies library, where we have also made major advances with respect to design,
efficiency, and supported features. Hence, we present the current status of
the selfies library (version 2.1.1) in this manuscript.
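The robustness claim above can be illustrated with a toy derivation-style grammar in which every token sequence decodes to a valid output. This is a minimal sketch of the idea only, not the actual API of the selfies package; the two-atom alphabet and valence table are invented for illustration:

```python
# Toy illustration of SELFIES-style robustness: any token sequence decodes
# to a plausible atom chain, because bond requests are clamped to the
# valences of both endpoints and unknown tokens are skipped.
# NOTE: invented alphabet and rules -- not the real `selfies` grammar.

VALENCE = {"C": 4, "O": 2}  # maximum bonds each toy atom may form

def decode(tokens):
    """Decode a token list into (atoms, bond_orders) without ever failing."""
    atoms, bonds = [], []
    pending_bond = 1
    for tok in tokens:
        if tok == "[=]":              # request a double bond to the next atom
            pending_bond = 2
        elif tok in VALENCE:          # an atom token
            if atoms:
                # Clamp the requested bond order so neither endpoint can
                # exceed its valence; bad requests degrade, never error.
                limit = min(VALENCE[atoms[-1]], VALENCE[tok])
                bonds.append(min(pending_bond, limit))
            atoms.append(tok)
            pending_bond = 1
        # Anything else is an unknown token and is silently ignored.
    return atoms, bonds
```

Because `decode` has no failure path, even a randomly mutated token string still yields a valid chain, which is the property that makes such a grammar attractive for generative models.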
Related papers
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely "hidden transfer", which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- Efficient Guided Generation for Large Language Models [0.21485350418225244]
We show how the problem of neural text generation can be constructively reformulated in terms of transitions between the states of a finite-state machine.
This framework leads to an efficient approach to guiding text generation with regular expressions and context-free grammars.
arXiv Detail & Related papers (2023-07-19T01:14:49Z)
- Improving Zero-Shot Generalization for CLIP with Synthesized Prompts [135.4317555866831]
Most existing methods require labeled data for all classes, which may not hold in real-world applications.
We propose a plug-and-play generative approach called SyntHesIzed Prompts (SHIP) to improve existing fine-tuning methods.
arXiv Detail & Related papers (2023-07-14T15:15:45Z)
- Automatic Context Pattern Generation for Entity Set Expansion [40.535332689515656]
We develop a module that automatically generates high-quality context patterns for entities.
We also propose the GAPA framework that leverages the aforementioned GenerAted PAtterns to expand target entities.
arXiv Detail & Related papers (2022-07-17T06:50:35Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all n-grams in a passage as its possible identifiers.
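The n-grams-as-identifiers idea can be sketched with a toy inverted index: every word n-gram of a passage maps back to that passage, so a generated n-gram serves directly as a retrieval key with no hierarchy imposed on the search space. (The real system backs this with an FM-index over the whole corpus; the passages below are made up.)

```python
from collections import defaultdict

def build_ngram_index(passages, n=3):
    """Map every word n-gram to the set of passage ids that contain it."""
    index = defaultdict(set)
    for pid, text in enumerate(passages):
        words = text.split()
        for i in range(len(words) - n + 1):
            index[tuple(words[i:i + n])].add(pid)
    return index

passages = ["the cat sat on the mat", "the dog sat on the log"]
index = build_ngram_index(passages)
# Any generated n-gram acts directly as a passage identifier:
hits = index[("sat", "on", "the")]   # occurs in both toy passages
```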
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- On Adversarial Robustness of Synthetic Code Generation [1.2559148369195197]
This paper showcases the existence of significant dataset bias through different classes of adversarial examples.
We propose several dataset augmentation techniques to reduce bias and showcase their efficacy.
arXiv Detail & Related papers (2021-06-22T09:37:48Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
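The progressive insertion scheme can be sketched as repeated parallel passes over the gaps between existing tokens; the rule table below is a hand-written stand-in for POINTER's learned insertion model, used only to make the coarse-to-fine behavior visible:

```python
# Sketch of insertion-based progressive generation: each pass proposes at
# most one new token for every gap, in parallel, until nothing more fits.
# The rule-table "model" is a toy stand-in for a learned insertion network.

def insert_step(tokens, proposer):
    """One coarse-to-fine pass over all gaps (including after the last token)."""
    out = []
    for left, right in zip(tokens, tokens[1:] + [None]):
        out.append(left)
        new = proposer(left, right)
        if new is not None:
            out.append(new)
    return out

# Toy insertion rules keyed by (left neighbor, right neighbor).
RULES = {("cat", "mat"): "sat", ("sat", "mat"): "on", ("on", "mat"): "the"}
proposer = lambda left, right: RULES.get((left, right))

tokens = ["cat", "mat"]            # hard keyword constraints
for _ in range(3):                 # three refinement passes suffice here
    tokens = insert_step(tokens, proposer)
```

Each pass fills the sequence from coarse keywords toward a fluent sentence, which is the interpretable hierarchy the summary describes.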
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
When paired with a strong auto-regressive decoder, however, VAEs tend to ignore their latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.