Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder
- URL: http://arxiv.org/abs/2102.09206v1
- Date: Thu, 18 Feb 2021 08:08:17 GMT
- Title: Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder
- Authors: Shuqi Lu, Chenyan Xiong, Di He, Guolin Ke, Waleed Malik, Zhicheng Dou,
Paul Bennett, Tieyan Liu, Arnold Overwijk
- Abstract summary: Many real-world applications use Siamese networks to efficiently match text sequences at scale.
This paper pre-trains language models dedicated to sequence matching in Siamese architectures.
- Score: 75.84152924972462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world applications use Siamese networks to efficiently match text
sequences at scale, which require high-quality sequence encodings. This paper
pre-trains language models dedicated to sequence matching in Siamese
architectures. We first hypothesize that a representation is better for
sequence matching if the entire sequence can be reconstructed from it, which,
however, is unlikely to be achieved in standard autoencoders: A strong decoder
can rely on its own capacity and natural language patterns to reconstruct the
sequence and bypass the need for better sequence encodings. Therefore we propose a new
self-learning method that pretrains the encoder with a weak decoder, which
reconstructs the original sequence from the encoder's [CLS] representations but
is restricted in both capacity and attention span. In our experiments on web
search and recommendation, the pre-trained SEED-Encoder, "SiamEsE oriented
encoder by reconstructing from weak decoder", shows significantly better
generalization ability when fine-tuned in Siamese networks, improving overall
accuracy and few-shot performance. Our code and models will be released.
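The abstract describes the core pre-training setup: a full-capacity encoder paired with a deliberately weak decoder that must reconstruct the input from the encoder's [CLS] vector alone, with the decoder limited in both depth and attention span. The sketch below illustrates that idea in PyTorch; the layer counts, span length, and the choice to feed [CLS] plus shifted token embeddings into a span-masked decoder are illustrative assumptions, not the released SEED-Encoder implementation.

```python
import torch
import torch.nn as nn

class WeakDecoderAutoencoder(nn.Module):
    """Full-capacity encoder + weak (shallow, span-limited) reconstruction decoder."""

    def __init__(self, vocab_size=30522, d_model=768, n_heads=12,
                 enc_layers=12, dec_layers=2, dec_span=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=enc_layers)
        # Weak decoder: only a couple of layers, plus a short attention span (see mask below).
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=dec_layers)
        self.dec_span = dec_span
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        h = self.encoder(self.embed(input_ids))             # (B, L, d_model)
        cls = h[:, :1, :]                                    # keep only the [CLS] vector
        # The decoder sees [CLS] followed by the shifted token embeddings; a banded
        # causal mask keeps its attention span short so it cannot lean on long-range
        # language patterns instead of the information stored in the [CLS] encoding.
        dec_in = torch.cat([cls, self.embed(input_ids[:, :-1])], dim=1)
        L = dec_in.size(1)
        mask = torch.full((L, L), float("-inf"), device=input_ids.device)
        for i in range(L):
            mask[i, max(0, i - self.dec_span):i + 1] = 0.0   # local span only
            mask[i, 0] = 0.0                                 # always allow attending to [CLS]
        logits = self.lm_head(self.decoder(dec_in, mask=mask))
        return logits                                        # reconstruction logits over the vocab
```

In a pre-training loop of this kind, one would minimize the cross-entropy between these logits and the original token ids, then discard the weak decoder and fine-tune only the encoder inside the Siamese network.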
Related papers
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a
Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims to generate a target sequence from a given input source sequence.
Traditionally, most seq2seq tasks are solved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z) - Machine Learning-Aided Efficient Decoding of Reed-Muller Subcodes [59.55193427277134]
Reed-Muller (RM) codes achieve the capacity of general binary-input memoryless symmetric channels.
RM codes, however, only admit a limited set of rates.
Efficient decoders are available for RM codes at finite lengths.
arXiv Detail & Related papers (2023-01-16T04:11:14Z) - Transformer with Tree-order Encoding for Neural Program Generation [8.173517923612426]
We introduce a tree-based positional encoding and a shared natural-language subword vocabulary for Transformers.
Our findings suggest that employing a tree-based positional encoding in combination with a shared natural-language subword vocabulary improves generation performance over sequential positional encodings.
arXiv Detail & Related papers (2022-05-30T12:27:48Z) - UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming languages.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree (a toy sketch of such an invertible mapping appears after this list).
We evaluate UniXcoder on five code-related tasks over nine datasets.
arXiv Detail & Related papers (2022-03-08T04:48:07Z) - Adversarial Neural Networks for Error Correcting Codes [76.70040964453638]
We introduce a general framework to boost the performance and applicability of machine learning (ML) models.
We propose to combine ML decoders with a competing discriminator network that tries to distinguish between codewords and noisy words.
Our framework is game-theoretic, motivated by generative adversarial networks (GANs).
arXiv Detail & Related papers (2021-12-21T19:14:44Z) - Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z) - Recurrent autoencoder with sequence-aware encoding [0.0]
We propose an autoencoder architecture with sequence-aware encoding, which employs a 1D convolutional layer to improve its performance (a generic sketch of this idea appears after this list).
We prove that the proposed solution dominates the standard RAE, and that the training process is an order of magnitude faster.
arXiv Detail & Related papers (2020-09-15T20:51:20Z)
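The UniXcoder entry above mentions a one-to-one mapping that turns an AST into a token sequence without losing structural information. The toy sketch below shows one way such an invertible serialization can work, using explicit opening and closing markers around each non-terminal; it is only an illustration of the general idea under my own assumptions, not UniXcoder's actual encoding.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

def tree_to_seq(node: Node) -> List[str]:
    """Depth-first serialization with explicit markers, so no structure is lost."""
    if not node.children:                       # leaf: emit its label directly
        return [node.label]
    seq = [f"<{node.label}"]                    # opening marker carries the node type
    for child in node.children:
        seq.extend(tree_to_seq(child))
    seq.append(f"{node.label}>")                # matching closing marker
    return seq

def seq_to_tree(seq: List[str]) -> Node:
    """Inverse mapping: rebuild the tree from the flat token sequence."""
    pos = 0
    def parse() -> Node:
        nonlocal pos
        tok = seq[pos]
        pos += 1
        if not tok.startswith("<"):             # leaf token
            return Node(tok)
        node = Node(tok[1:])
        while seq[pos] != f"{node.label}>":     # children until the closing marker
            node.children.append(parse())
        pos += 1                                # consume the closing marker
        return node
    return parse()

# Round trip: the mapping is invertible as long as labels avoid the marker syntax.
ast = Node("call", [Node("name"), Node("args", [Node("x"), Node("y")])])
assert tree_to_seq(seq_to_tree(tree_to_seq(ast))) == tree_to_seq(ast)
```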
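Similarly, the last entry's "sequence-aware encoding" with a 1D convolutional layer can be pictured with a small, generic recurrent-autoencoder sketch. Where the convolution sits and every hyperparameter below are illustrative assumptions, not the architecture from that paper.

```python
import torch.nn as nn

class ConvRecurrentAutoencoder(nn.Module):
    """Recurrent autoencoder whose encoder mixes neighbouring timesteps with a 1D conv."""

    def __init__(self, feat_dim=8, hidden=64, latent=32, kernel=3):
        super().__init__()
        self.enc_rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        # Sequence-aware step: compress the per-step hidden states with a 1D convolution
        # instead of relying on the final hidden state alone.
        self.conv = nn.Conv1d(hidden, latent, kernel_size=kernel, padding=kernel // 2)
        self.dec_rnn = nn.GRU(latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, x):                       # x: (B, T, feat_dim)
        h, _ = self.enc_rnn(x)                  # per-step encoder states, (B, T, hidden)
        z = self.conv(h.transpose(1, 2))        # convolution over time, (B, latent, T)
        d, _ = self.dec_rnn(z.transpose(1, 2))  # decode a sequence from the latent code
        return self.out(d)                      # reconstruction, (B, T, feat_dim)
```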
This list is automatically generated from the titles and abstracts of the papers on this site.