Contrastive String Representation Learning using Synthetic Data
- URL: http://arxiv.org/abs/2110.04217v1
- Date: Fri, 8 Oct 2021 16:06:54 GMT
- Title: Contrastive String Representation Learning using Synthetic Data
- Authors: Urchade Zaratiana
- Abstract summary: The goal of String representation Learning (SRL) is to learn dense and low-dimensional vectors for encoding character sequences.
We propose a new method for to train a SRL model by only using synthetic data.
We demonstrate the effectiveness of our approach by evaluating the learned representation on the task of string similarity matching.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: String representation Learning (SRL) is an important task in the field of
Natural Language Processing, but it remains under-explored. The goal of SRL is
to learn dense and low-dimensional vectors (or embeddings) for encoding
character sequences. The learned representation from this task can be used in
many downstream application tasks such as string similarity matching or lexical
normalization. In this paper, we propose a new method for to train a SRL model
by only using synthetic data. Our approach makes use of Contrastive Learning in
order to maximize similarity between related strings while minimizing it for
unrelated strings. We demonstrate the effectiveness of our approach by
evaluating the learned representation on the task of string similarity
matching. Codes, data and pretrained models will be made publicly available.
Related papers
- SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation [55.2480439325792]
We study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor.
We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance.
arXiv Detail & Related papers (2024-05-16T12:22:41Z) - Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z) - REINFOREST: Reinforcing Semantic Code Similarity for Cross-Lingual Code Search Models [11.78036105494679]
This paper introduces a novel code-to-code search technique that enhances the performance of Large Language Models (LLMs)
We present the first-ever code search method that encodes dynamic information during training without the need to execute either the corpus under search or the search query at inference time.
arXiv Detail & Related papers (2023-05-05T20:46:56Z) - KRLS: Improving End-to-End Response Generation in Task Oriented Dialog
with Reinforced Keywords Learning [25.421649004269373]
In task-oriented dialogs (TOD), reinforcement learning algorithms train a model to directly optimize response for task-related metrics.
We investigate an approach to create a more efficient RL-based algorithm to improve TOD performance in an offline setting.
Experiments on the MultiWoZ dataset show our new training algorithm, Keywords Reinforcement Learning with Next-word Sampling (KRLS), achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-11-30T06:27:46Z) - SLICER: Learning universal audio representations using low-resource
self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z) - Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z) - Enhancing Semantic Code Search with Multimodal Contrastive Learning and
Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z) - Predicting What You Already Know Helps: Provable Self-Supervised
Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain em reconstruction-based pretext tasks that guarantee to learn a good representation.
We prove the linear layer yields small approximation error even for complex ground truth function class.
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.