Fugu-MT Paper Translation (Abstract): Domain Adaptation for Memory-Efficient Dense Retrieval

Paper summary: Domain Adaptation for Memory-Efficient Dense Retrieval

arxiv url: http://arxiv.org/abs/2205.11498v1
Date: Mon, 23 May 2022 17:53:44 GMT
Status: Translation complete
Last updated in system: 2022-05-24 17:00:53.657175
Title: Domain Adaptation for Memory-Efficient Dense Retrieval
Authors: Nandan Thakur, Nils Reimers, Jimmy Lin
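Abstract summary: We show that binary embedding models such as BPR and JPQ can perform significantly worse than baselines once a domain shift is involved. We propose a modification to the training procedure of BPR and JPQ and combine it with a corpus-specific generative procedure.
Reference score (independently computed attention score): 49.98615945702959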
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Dense retrievers encode documents into fixed-dimensional embeddings. However, storing all document embeddings in an index produces bulky indexes that are expensive to serve. Recently, BPR (Yamada et al., 2021) and JPQ (Zhan et al., 2021a) have been proposed, which train the model to produce binary document vectors, reducing the index size by 32x or more. The authors showed that these binary embedding models significantly outperform more traditional index compression techniques such as Product Quantization (PQ). Previous work evaluated these approaches only in-domain, i.e., on tasks for which training data is available. In practice, retrieval models are often used in an out-of-domain setting: they are trained on a publicly available dataset, such as MS MARCO, and then applied to a custom dataset for which no training data is available. In this work, we show that binary embedding models like BPR and JPQ can perform significantly worse than baselines once a domain shift is involved. We propose a modification to the training procedure of BPR and JPQ and combine it with a corpus-specific generative procedure, which allows BPR and JPQ to be adapted to any corpus without requiring labeled training data. Our domain-adaptation strategy, known as GPL, is model-agnostic and improves nDCG@10 by up to 19.3 and 11.6 points across the BEIR benchmark in comparison to BPR and JPQ, while maintaining their 32x memory efficiency. JPQ+GPL even outperforms our upper baseline, the uncompressed TAS-B model, by 2.0 points on average.
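The 32x figure in the abstract follows from storing one bit per embedding dimension instead of a 32-bit float. The sketch below illustrates only that arithmetic, using plain sign binarization and Hamming distance. It is not the BPR or JPQ training procedure, and the dimension of 768, the random vectors, and the helper names `binarize` and `hamming` are assumptions made for illustration.

```python
import random


def binarize(vec):
    """Sign-binarize a float vector and pack it into bytes (1 bit per dimension)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits.to_bytes((len(vec) + 7) // 8, "big")


def hamming(a, b):
    """Hamming distance between two packed binary codes of equal length."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


DIM = 768  # assumed embedding size; not stated in the abstract
rng = random.Random(0)

# Stand-ins for a document and a query embedding (random, for illustration only).
doc = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
query = [rng.gauss(0.0, 1.0) for _ in range(DIM)]

doc_code = binarize(doc)
query_code = binarize(query)

float32_bytes = DIM * 4       # 32 bits per dimension
binary_bytes = len(doc_code)  # 1 bit per dimension
compression_ratio = float32_bytes // binary_bytes

print(float32_bytes, binary_bytes, compression_ratio)
print(hamming(query_code, doc_code))
```

Under these assumptions a 768-dimensional float32 vector takes 3072 bytes and its binary code takes 96 bytes, which gives the 32x ratio. Candidates can then be ranked by Hamming distance over the packed codes.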

Related papers