Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder
- URL: http://arxiv.org/abs/2102.09206v1
- Date: Thu, 18 Feb 2021 08:08:17 GMT
- Title: Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder
- Authors: Shuqi Lu, Chenyan Xiong, Di He, Guolin Ke, Waleed Malik, Zhicheng Dou,
Paul Bennett, Tieyan Liu, Arnold Overwijk
- Abstract summary: Many real-world applications use Siamese networks to efficiently match text sequences at scale.
This paper pre-trains language models dedicated to sequence matching in Siamese architectures.
- Score: 75.84152924972462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world applications use Siamese networks to efficiently match text
sequences at scale, which require high-quality sequence encodings. This paper
pre-trains language models dedicated to sequence matching in Siamese
architectures. We first hypothesize that a representation is better for
sequence matching if the entire sequence can be reconstructed from it, which,
however, is unlikely to be achieved in standard autoencoders: A strong decoder
can rely on its own capacity and natural language patterns to reconstruct the
sequence and bypass the need for better sequence encodings. Therefore we propose a new
self-learning method that pretrains the encoder with a weak decoder, which
reconstructs the original sequence from the encoder's [CLS] representations but
is restricted in both capacity and attention span. In our experiments on web
search and recommendation, the pre-trained SEED-Encoder, "SiamEsE oriented
encoder by reconstructing from weak decoder", shows significantly better
generalization ability when fine-tuned in Siamese networks, improving overall
accuracy and few-shot performance. Our code and models will be released.
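The abstract describes the core pre-training setup: a full-capacity encoder paired with a deliberately weak decoder that must reconstruct the input from the encoder's [CLS] vector alone, with the decoder limited in both depth and attention span. The sketch below illustrates that idea in PyTorch; the layer counts, span length, and the choice to feed [CLS] plus shifted token embeddings into a span-masked decoder are illustrative assumptions, not the released SEED-Encoder implementation.

```python
import torch
import torch.nn as nn

class WeakDecoderAutoencoder(nn.Module):
    """Full-capacity encoder + weak (shallow, span-limited) reconstruction decoder."""

    def __init__(self, vocab_size=30522, d_model=768, n_heads=12,
                 enc_layers=12, dec_layers=2, dec_span=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=enc_layers)
        # Weak decoder: only a couple of layers, plus a short attention span (see mask below).
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=dec_layers)
        self.dec_span = dec_span
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        h = self.encoder(self.embed(input_ids))             # (B, L, d_model)
        cls = h[:, :1, :]                                    # keep only the [CLS] vector
        # The decoder sees [CLS] followed by the shifted token embeddings; a banded
        # causal mask keeps its attention span short so it cannot lean on long-range
        # language patterns instead of the information stored in the [CLS] encoding.
        dec_in = torch.cat([cls, self.embed(input_ids[:, :-1])], dim=1)
        L = dec_in.size(1)
        mask = torch.full((L, L), float("-inf"), device=input_ids.device)
        for i in range(L):
            mask[i, max(0, i - self.dec_span):i + 1] = 0.0   # local span only
            mask[i, 0] = 0.0                                 # always allow attending to [CLS]
        logits = self.lm_head(self.decoder(dec_in, mask=mask))
        return logits                                        # reconstruction logits over the vocab
```

In a pre-training loop of this kind, one would minimize the cross-entropy between these logits and the original token ids, then discard the weak decoder and fine-tune only the encoder inside the Siamese network.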
Related papers
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a
Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims to generate a target sequence from a given input source sequence.
Traditionally, most seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z) - Machine Learning-Aided Efficient Decoding of Reed-Muller Subcodes [59.55193427277134]
Reed-Muller (RM) codes achieve the capacity of general binary-input memoryless symmetric channels.
RM codes, however, only admit a limited set of rates.
Efficient decoders are available for RM codes at finite lengths.
arXiv Detail & Related papers (2023-01-16T04:11:14Z) - Transformer with Tree-order Encoding for Neural Program Generation [8.173517923612426]
We introduce a tree-based positional encoding and a shared natural-language subword vocabulary for Transformers.
Our findings suggest that employing a tree-based positional encoding in combination with a shared natural-language subword vocabulary improves generation performance over sequential positional encodings.
arXiv Detail & Related papers (2022-05-30T12:27:48Z) - UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming languages.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree (a toy sketch of such an invertible mapping appears after this list).
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z) - Adversarial Neural Networks for Error Correcting Codes [76.70040964453638]
We introduce a general framework to boost the performance and applicability of machine learning (ML) models.
We propose to combine ML decoders with a competing discriminator network that tries to distinguish between codewords and noisy words.
Our framework is game-theoretic, motivated by generative adversarial networks (GANs).
arXiv Detail & Related papers (2021-12-21T19:14:44Z) - Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z) - Recurrent autoencoder with sequence-aware encoding [0.0]
We propose an autoencoder architecture with sequence-aware encoding, which employs a 1D convolutional layer to improve its performance (a generic sketch of this idea appears after this list).
We prove that the proposed solution dominates the standard RAE, and that the training process is an order of magnitude faster.
arXiv Detail & Related papers (2020-09-15T20:51:20Z)
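The UniXcoder entry above mentions a one-to-one mapping that turns an AST into a token sequence without losing structural information. The toy sketch below shows one way such an invertible serialization can work, using explicit opening and closing markers around each non-terminal; it is only an illustration of the general idea under my own assumptions, not UniXcoder's actual encoding.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

def tree_to_seq(node: Node) -> List[str]:
    """Depth-first serialization with explicit markers, so no structure is lost."""
    if not node.children:                       # leaf: emit its label directly
        return [node.label]
    seq = [f"<{node.label}"]                    # opening marker carries the node type
    for child in node.children:
        seq.extend(tree_to_seq(child))
    seq.append(f"{node.label}>")                # matching closing marker
    return seq

def seq_to_tree(seq: List[str]) -> Node:
    """Inverse mapping: rebuild the tree from the flat token sequence."""
    pos = 0
    def parse() -> Node:
        nonlocal pos
        tok = seq[pos]
        pos += 1
        if not tok.startswith("<"):             # leaf token
            return Node(tok)
        node = Node(tok[1:])
        while seq[pos] != f"{node.label}>":     # children until the closing marker
            node.children.append(parse())
        pos += 1                                # consume the closing marker
        return node
    return parse()

# Round trip: the mapping is invertible as long as labels avoid the marker syntax.
ast = Node("call", [Node("name"), Node("args", [Node("x"), Node("y")])])
assert tree_to_seq(seq_to_tree(tree_to_seq(ast))) == tree_to_seq(ast)
```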
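Similarly, the last entry's "sequence-aware encoding" with a 1D convolutional layer can be pictured with a small, generic recurrent-autoencoder sketch. Where the convolution sits and every hyperparameter below are illustrative assumptions, not the architecture from that paper.

```python
import torch.nn as nn

class ConvRecurrentAutoencoder(nn.Module):
    """Recurrent autoencoder whose encoder mixes neighbouring timesteps with a 1D conv."""

    def __init__(self, feat_dim=8, hidden=64, latent=32, kernel=3):
        super().__init__()
        self.enc_rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        # Sequence-aware step: compress the per-step hidden states with a 1D convolution
        # instead of relying on the final hidden state alone.
        self.conv = nn.Conv1d(hidden, latent, kernel_size=kernel, padding=kernel // 2)
        self.dec_rnn = nn.GRU(latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, x):                       # x: (B, T, feat_dim)
        h, _ = self.enc_rnn(x)                  # per-step encoder states, (B, T, hidden)
        z = self.conv(h.transpose(1, 2))        # convolution over time, (B, latent, T)
        d, _ = self.dec_rnn(z.transpose(1, 2))  # decode a sequence from the latent code
        return self.out(d)                      # reconstruction, (B, T, feat_dim)
```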
This list is automatically generated from the titles and abstracts of the papers on this site.