Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for
Pairwise Sentence Scoring Tasks
- URL: http://arxiv.org/abs/2010.08240v2
- Date: Mon, 12 Apr 2021 10:02:14 GMT
- Title: Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for
Pairwise Sentence Scoring Tasks
- Authors: Nandan Thakur, Nils Reimers, Johannes Daxenberger, Iryna Gurevych
- Abstract summary: We present a simple yet efficient data augmentation strategy called Augmented SBERT.
We use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder.
We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method.
- Score: 59.13635174016506
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: There are two approaches for pairwise sentence scoring: Cross-encoders, which
perform full-attention over the input pair, and Bi-encoders, which map each
input independently to a dense vector space. While cross-encoders often achieve
higher performance, they are too slow for many practical use cases.
Bi-encoders, on the other hand, require substantial training data and
fine-tuning over the target task to achieve competitive performance. We present
a simple yet efficient data augmentation strategy called Augmented SBERT, where
we use the cross-encoder to label a larger set of input pairs to augment the
training data for the bi-encoder. We show that, in this process, selecting the
sentence pairs is non-trivial and crucial for the success of the method. We
evaluate our approach on multiple tasks (in-domain) as well as on a domain
adaptation task. Augmented SBERT achieves an improvement of up to 6 points for
in-domain and of up to 37 points for domain adaptation tasks compared to the
original bi-encoder performance.
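The recipe maps naturally onto the sentence-transformers library (maintained by one of the authors). Below is a minimal sketch, assuming a cross-encoder already fine-tuned on the small gold dataset; the model names and toy sentence pairs are illustrative, and the crucial pair-selection step (e.g., BM25 or semantic-search sampling) is reduced to a hardcoded list.

```python
# Minimal Augmented SBERT sketch (illustrative, not the authors' exact code).
from torch.utils.data import DataLoader
from sentence_transformers import (CrossEncoder, InputExample,
                                   SentenceTransformer, losses)

# 1. Cross-encoder fine-tuned on the small gold dataset (assumed available).
cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")

# 2. Weakly label sampled sentence pairs to create "silver" training data.
#    The paper stresses that HOW these pairs are selected is crucial;
#    two toy pairs stand in here.
unlabeled_pairs = [
    ["A man is playing a guitar.", "Someone plays an instrument."],
    ["The weather is sunny today.", "A dog runs across the field."],
]
silver_scores = cross_encoder.predict(unlabeled_pairs)

# 3. Train the bi-encoder (SBERT) on gold + silver data; only the silver
#    portion is shown here.
silver_examples = [
    InputExample(texts=pair, label=float(score))
    for pair, score in zip(unlabeled_pairs, silver_scores)
]
bi_encoder = SentenceTransformer("bert-base-uncased")
train_loader = DataLoader(silver_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(bi_encoder)
bi_encoder.fit(train_objectives=[(train_loader, train_loss)], epochs=1)
```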
Related papers
- Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation [59.18151483767509]
We introduce dual-path token lifting for domain shift correction in online test-time adaptation.
We then perform dual-path lifting, interleaving token prediction and updates between the domain-shift token path and the class-token path.
Experimental results on benchmark datasets demonstrate that our method significantly improves online fully test-time adaptation performance.
arXiv Detail & Related papers (2024-08-26T02:33:47Z)
- LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction [31.895986544484206]
We introduce Layer-Adjustable Interactions in Transformers (LAIT).
Within LAIT, segmented inputs are first encoded independently and then jointly.
We find that LAIT can reduce attention FLOPs by 30-50% on many tasks while preserving high accuracy.
arXiv Detail & Related papers (2023-05-31T06:09:59Z)
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks and efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks (Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung) reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial-learning-based method that is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z)
- Rate Coding or Direct Coding: Which One is Better for Accurate, Robust, and Energy-efficient Spiking Neural Networks? [4.872468969809081]
Most Spiking Neural Network (SNN) works focus on image classification; therefore, various coding techniques have been proposed to convert an image into temporal binary spikes.
Among them, rate coding and direct coding are regarded as prospective candidates for building a practical SNN system.
We conduct a comprehensive analysis of the two codings from three perspectives: accuracy, adversarial robustness, and energy-efficiency.
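Since the two codings differ only in how an input image becomes a temporal sequence, the contrast fits in a few lines. The snippet below is an illustrative toy (NumPy only, with invented variable names), not code from the paper.

```python
# Toy contrast between rate coding and direct coding of an image for an SNN.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))   # pixel intensities in [0, 1]
T = 8                          # number of timesteps

# Rate coding: each pixel fires a Bernoulli spike per timestep with
# probability proportional to its intensity -> binary spike trains.
rate_spikes = (rng.random((T, 28, 28)) < image).astype(np.float32)

# Direct coding: the analog image is fed unchanged at every timestep;
# the first layer of the SNN is then responsible for generating spikes.
direct_input = np.repeat(image[None, :, :], T, axis=0)
```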
arXiv Detail & Related papers (2022-01-31T16:18:07Z)
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations [22.40667024030858]
Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient.
Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance.
Trans-Encoder combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders.
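A conceptual sketch of that alternation, written here against the sentence-transformers API purely for illustration (the models, toy pairs, and label clamping are this summary's assumptions, not the authors' released code):

```python
# Conceptual Trans-Encoder loop: each encoder type pseudo-labels pairs
# for the other (illustrative sketch only).
from torch.utils.data import DataLoader
from sentence_transformers import (CrossEncoder, InputExample,
                                   SentenceTransformer, losses, util)

pairs = [["A cat sits on the mat.", "A cat is on the mat."],
         ["He bought a new car.", "The sky is blue today."]]
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")

for _ in range(2):  # a couple of self-/mutual-distillation rounds
    # Bi-encoder pseudo-labels the pairs via embedding cosine similarity
    # (clamped to [0, 1] so they work as regression targets) ...
    emb1 = bi_encoder.encode([p[0] for p in pairs], convert_to_tensor=True)
    emb2 = bi_encoder.encode([p[1] for p in pairs], convert_to_tensor=True)
    bi_labels = util.cos_sim(emb1, emb2).diagonal().clamp(0, 1).tolist()
    # ... and those labels train the cross-encoder.
    ce_loader = DataLoader(
        [InputExample(texts=p, label=l) for p, l in zip(pairs, bi_labels)],
        batch_size=2)
    cross_encoder.fit(train_dataloader=ce_loader, epochs=1)
    # The cross-encoder then relabels the pairs to train the bi-encoder.
    ce_labels = cross_encoder.predict(pairs)
    be_loader = DataLoader(
        [InputExample(texts=p, label=float(l)) for p, l in zip(pairs, ce_labels)],
        batch_size=2)
    bi_encoder.fit(
        train_objectives=[(be_loader, losses.CosineSimilarityLoss(bi_encoder))],
        epochs=1)
```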
arXiv Detail & Related papers (2021-09-27T14:06:47Z)
- Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching [60.8427677151492]
We propose CMatch, a character-level distribution matching method that performs fine-grained adaptation for each character across the two domains.
Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reductions on cross-device and cross-environment ASR, respectively.
arXiv Detail & Related papers (2021-04-15T14:36:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.