Symbolic Autoencoding for Self-Supervised Sequence Learning
- URL: http://arxiv.org/abs/2402.10575v1
- Date: Fri, 16 Feb 2024 11:04:31 GMT
- Title: Symbolic Autoencoding for Self-Supervised Sequence Learning
- Authors: Mohammad Hossein Amani, Nicolas Mario Baldwin, Amin Mansouri, Martin
Josifoski, Maxime Peyrard, Robert West
- Abstract summary: $\Sigma$AE is a self-supervised framework that harnesses the power of abundant non-parallel data alongside limited parallel data.
Our results demonstrate that $\Sigma$AE significantly enhances performance on transduction tasks, even with minimal parallel data.
- Score: 24.71036683224435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional language models, adept at next-token prediction in text
sequences, often struggle with transduction tasks between distinct symbolic
systems, particularly when parallel data is scarce. Addressing this issue, we
introduce \textit{symbolic autoencoding} ($\Sigma$AE), a self-supervised
framework that harnesses the power of abundant non-parallel data alongside
limited parallel data. $\Sigma$AE connects two generative models via a discrete
bottleneck layer and is optimized end-to-end by minimizing reconstruction loss
(simultaneously with supervised loss for the parallel data), such that the
sequence generated by the discrete bottleneck can be read out as the transduced
input sequence. We also develop gradient-based methods allowing for efficient
self-supervised sequence learning despite the discreteness of the bottleneck.
Our results demonstrate that $\Sigma$AE significantly enhances performance on
transduction tasks, even with minimal parallel data, offering a promising
solution for weakly supervised learning scenarios.
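The following is a minimal, hedged sketch of the objective the abstract describes: two sequence models connected through a discrete bottleneck, trained with a reconstruction loss on unpaired data plus a supervised loss on the few parallel pairs. All names, shapes, and the use of straight-through Gumbel-softmax are illustrative assumptions for this sketch (the paper develops its own gradient-based methods for the discrete bottleneck); both symbol systems are also assumed to share a fixed sequence length here, which the paper does not require.

```python
# Minimal sketch of the Sigma-AE idea, assuming a PyTorch setup.
# Simplifications (not from the paper): both symbol systems share sequence
# length T, and the two "generative models" are small GRU taggers rather
# than full autoregressive seq2seq decoders.
import torch
import torch.nn as nn
import torch.nn.functional as F

T, VOCAB_A, VOCAB_B, HID = 12, 50, 30, 64

class Seq2Logits(nn.Module):
    """Maps a sequence over one vocabulary to per-position logits over another."""
    def __init__(self, vocab_in, vocab_out):
        super().__init__()
        self.emb = nn.Embedding(vocab_in, HID)
        self.rnn = nn.GRU(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, vocab_out)

    def forward(self, x):                       # x: (B, T) token ids
        h, _ = self.rnn(self.emb(x))
        return self.out(h)                      # (B, T, vocab_out)

class SoftInputSeq2Logits(nn.Module):
    """Same, but consumes (soft) one-hot vectors so gradients can flow through."""
    def __init__(self, vocab_in, vocab_out):
        super().__init__()
        self.emb = nn.Linear(vocab_in, HID, bias=False)  # one-hot @ weight == embedding lookup
        self.rnn = nn.GRU(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, vocab_out)

    def forward(self, z_onehot):                # z_onehot: (B, T, vocab_in)
        h, _ = self.rnn(self.emb(z_onehot))
        return self.out(h)

enc = Seq2Logits(VOCAB_A, VOCAB_B)              # system A -> system B (discrete bottleneck)
dec = SoftInputSeq2Logits(VOCAB_B, VOCAB_A)     # system B -> system A (readout)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

def sigma_ae_loss(x_unpaired, x_paired=None, y_paired=None, tau=1.0, sup_weight=1.0):
    # Self-supervised path: A -> discrete B symbols -> reconstruct A.
    logits_b = enc(x_unpaired)
    # Straight-through Gumbel-softmax keeps the bottleneck discrete in the
    # forward pass while allowing end-to-end gradient training; this is one
    # standard estimator, not necessarily the one used in the paper.
    z = F.gumbel_softmax(logits_b, tau=tau, hard=True)
    recon_logits = dec(z)
    loss = F.cross_entropy(recon_logits.transpose(1, 2), x_unpaired)

    # Supervised path: the scarce parallel (x, y) pairs anchor the bottleneck
    # symbols to the real target symbol system.
    if x_paired is not None:
        loss = loss + sup_weight * F.cross_entropy(enc(x_paired).transpose(1, 2), y_paired)
    return loss

x_u = torch.randint(0, VOCAB_A, (32, T))        # abundant unpaired system-A data
x_p = torch.randint(0, VOCAB_A, (4, T))         # scarce parallel pairs
y_p = torch.randint(0, VOCAB_B, (4, T))
opt.zero_grad()
loss = sigma_ae_loss(x_u, x_p, y_p)
loss.backward()
opt.step()
```

With this setup, reading out the argmax of `enc(x)` after training yields the transduced sequence in system B, which is what the discrete bottleneck is meant to expose.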
Related papers
- Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences [60.489682735061415]
We propose CHELA, which replaces state space models with short-long convolutions and implements linear attention in a divide-and-conquer manner.
Our experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-06-12T12:12:38Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely \textit{hidden transfer}, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- Non-autoregressive Sequence-to-Sequence Vision-Language Models [63.77614880533488]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder.
The model achieves performance on-par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have known weaknesses: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation [109.46348908829697]
We propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence.
We conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation.
arXiv Detail & Related papers (2021-06-29T03:59:21Z)
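The EISL entry above describes a loss that scores each target n-gram against every n-gram position in the generated sequence instead of enforcing strict positional alignment. A simplified sketch of that idea is shown below; the function name, fixed n-gram size, and logsumexp aggregation over positions are choices made for this illustration and not the paper's exact formulation.

```python
# Simplified illustration of an edit-invariant n-gram matching loss in the
# spirit of EISL; NOT the paper's exact formulation.
import torch

def ngram_match_loss(log_probs, target, n=2):
    """
    log_probs: (B, T_out, V) per-position log-probabilities from the model.
    target:    (B, T_tgt)    reference token ids.
    Rewards target n-grams appearing *anywhere* in the generated sequence,
    not just at their reference positions.
    """
    B, T_out, V = log_probs.shape
    _, T_tgt = target.shape
    num_out = T_out - n + 1          # candidate n-gram start positions (output)
    num_tgt = T_tgt - n + 1          # target n-gram start positions
    scores = log_probs.new_zeros(B, num_tgt, num_out)
    for k in range(n):
        # log-prob of the k-th token of each target n-gram at the k-th slot
        # of each candidate output n-gram
        lp_k = log_probs[:, k:k + num_out, :]                 # (B, num_out, V)
        tok_k = target[:, k:k + num_tgt]                      # (B, num_tgt)
        gathered = lp_k.gather(2, tok_k.unsqueeze(1).expand(B, num_out, num_tgt))
        scores = scores + gathered.transpose(1, 2)            # (B, num_tgt, num_out)
    # Soft "best matching position" per target n-gram, averaged over n-grams.
    per_ngram = torch.logsumexp(scores, dim=2)                # (B, num_tgt)
    return -per_ngram.mean()

# toy usage
lp = torch.log_softmax(torch.randn(4, 10, 100), dim=-1)
tgt = torch.randint(0, 100, (4, 8))
print(ngram_match_loss(lp, tgt, n=2))
```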
- Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene [10.822477939237459]
We propose contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learning.
CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
arXiv Detail & Related papers (2021-06-04T08:17:48Z)
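As a rough illustration of the bi-granularity idea in the CMLM entry, one can combine an InfoNCE-style contrastive loss at the token level with another at the sequence level, using two stochastic views of the same inputs and in-batch negatives. Everything below (mean pooling, temperature, loss weighting) is an assumption for this sketch, not the paper's implementation.

```python
# Hedged sketch of a bi-granularity contrastive objective: InfoNCE at both
# the token level and the sequence level, with in-batch negatives.
# Illustrative only; not the CMLM paper's implementation.
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """a, b: (N, D) aligned positive pairs; other rows serve as negatives."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                  # (N, N) similarity matrix
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)

def bi_granularity_loss(view1, view2, seq_weight=1.0):
    """
    view1, view2: (B, T, D) token representations of the same sentences under
    two stochastic views (e.g. different masks or dropout).
    """
    B, T, D = view1.shape
    token_loss = info_nce(view1.reshape(B * T, D), view2.reshape(B * T, D))
    seq_loss = info_nce(view1.mean(dim=1), view2.mean(dim=1))   # mean-pooled sequences
    return token_loss + seq_weight * seq_loss

v1, v2 = torch.randn(8, 16, 128), torch.randn(8, 16, 128)
print(bi_granularity_loss(v1, v2))
```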