Fugu-MT Paper Translation (Abstract): Comparing Neighbors Together Makes it Easy: Jointly Comparing Multiple Candidates for Efficient and Effective Retrieval

Paper abstract: Comparing Neighbors Together Makes it Easy: Jointly Comparing Multiple Candidates for Efficient and Effective Retrieval

arxiv url: http://arxiv.org/abs/2405.12801v1
Date: Tue, 21 May 2024 13:51:48 GMT
Status: Translation complete
System update date: 2024-05-22 13:00:17.744931
Title: Comparing Neighbors Together Makes it Easy: Jointly Comparing Multiple Candidates for Efficient and Effective Retrieval
Title (reference translation): Comparing Neighbors Becomes Easy: Jointly Comparing Multiple Candidates for Efficient and Effective Retrieval
Authors: Jonghyun Song, Cheyon Jin, Wenlong Zhao, Jay-Yoon Lee
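Abstract summary: In the common retrieve-and-rerank paradigm, a scalable bi-encoder retrieves a broad set of relevant candidates, after which an expensive but more accurate cross-encoder is applied to a limited candidate set. This paper proposes the Comparing Multiple Candidates (CMC) framework, which jointly compares the embeddings of a query and multiple candidates through shallow self-attention layers. While providing contextualized representations, CMC is scalable enough to handle multiple comparisons simultaneously: comparing 2K candidates takes only twice as long as comparing 100.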
Reference score (independently computed attention level): 4.547480408065687
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: A common retrieve-and-rerank paradigm involves retrieving a broad set of relevant candidates using a scalable bi-encoder, followed by applying expensive but more accurate cross-encoders to a limited candidate set. However, this small subset often leads to error propagation from the bi-encoders, thereby restricting the performance of the overall pipeline. To address these issues, we propose the Comparing Multiple Candidates (CMC) framework, which compares a query and multiple candidate embeddings jointly through shallow self-attention layers. While providing contextualized representations, CMC is scalable enough to handle multiple comparisons simultaneously, where comparing 2K candidates takes only twice as long as comparing 100. Practitioners can use CMC as a lightweight and effective reranker to improve top-1 accuracy. Moreover, when integrated with another retriever, CMC reranking can function as a virtually enhanced retriever. This configuration adds only negligible latency compared to using a single retriever (virtual), while significantly improving recall at K (enhanced). Through experiments, we demonstrate that CMC, as a virtually enhanced retriever, significantly improves Recall@k (+6.7, +3.5%-p for R@16, R@64) compared to the initial retrieval stage on the ZeSHEL dataset. Meanwhile, we conduct experiments for direct reranking on entity, passage, and dialogue ranking. The results indicate that CMC is not only faster (11x) than cross-encoders but also often more effective, with improved prediction performance in Wikipedia entity linking (+0.7%-p) and DSTC7 dialogue ranking (+3.3%-p). The code and link to datasets are available at https://github.com/yc-song/cmc
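Abstract (reference translation): In the common retrieve-and-rerank paradigm, a scalable bi-encoder retrieves a broad set of relevant candidates, after which an expensive but more accurate cross-encoder is applied to a limited candidate set. However, this small subset often causes errors to propagate from the bi-encoder, which limits the performance of the overall pipeline. To address these issues, we propose the Comparing Multiple Candidates (CMC) framework, which jointly compares a query and multiple candidate embeddings through shallow self-attention layers. While providing contextualized representations, CMC is scalable enough to handle multiple comparisons simultaneously: comparing 2K candidates takes only twice as long as comparing 100. Using CMC as a lightweight and effective reranker can improve top-1 accuracy. Moreover, when integrated with another retriever, CMC reranking functions as a virtually enhanced retriever. This configuration adds only negligible latency compared to a single retriever (virtual) while significantly improving recall at K (enhanced). Experiments show that CMC, as a virtually enhanced retriever, significantly improves Recall@k (+6.7, +3.5%-p for R@16, R@64) over the initial retrieval stage on the ZeSHEL dataset. We also run direct reranking experiments on entity, passage, and dialogue ranking. The results show that CMC is not only 11x faster than cross-encoders but also often more effective, improving prediction performance in Wikipedia entity linking (+0.7%-p) and DSTC7 dialogue ranking (+3.3%-p). The code and links to the datasets are available at https://github.com/yc-song/cmc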
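
The joint comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes one single-head self-attention layer with a residual connection, randomly initialized weights in place of trained ones, random vectors in place of encoder outputs, and a dot product between the contextualized query and candidate embeddings as the score. All function and variable names are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(mat, vec):
    """Multiply a (d_out x d_in) matrix, stored as a list of rows, by a vector."""
    return [dot(row, vec) for row in mat]


def self_attention(tokens, wq, wk, wv):
    """One single-head self-attention layer with a residual connection.

    Assumption: a real layer would also have multiple heads, layer
    normalization, and a feed-forward block; they are omitted here.
    """
    d = len(tokens[0])
    qs = [matvec(wq, t) for t in tokens]
    ks = [matvec(wk, t) for t in tokens]
    vs = [matvec(wv, t) for t in tokens]
    out = []
    for q, t in zip(qs, tokens):
        weights = softmax([dot(q, k) / math.sqrt(d) for k in ks])
        mixed = [sum(w * v[i] for w, v in zip(weights, vs)) for i in range(d)]
        out.append([a + b for a, b in zip(t, mixed)])
    return out


def joint_scores(query_emb, cand_embs, wq, wk, wv):
    """Score every candidate in one pass over the sequence [query; candidates]."""
    tokens = [query_emb] + cand_embs
    ctx = self_attention(tokens, wq, wk, wv)
    q_ctx, c_ctx = ctx[0], ctx[1:]
    # Assumed scoring rule: dot product of contextualized embeddings.
    return [dot(q_ctx, c) for c in c_ctx]


def random_matrix(d, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(d)]


# Toy inputs: random vectors stand in for precomputed encoder embeddings.
rng = random.Random(0)
d = 8
wq, wk, wv = (random_matrix(d, rng) for _ in range(3))
query = [rng.gauss(0.0, 1.0) for _ in range(d)]
candidates = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(5)]

scores = joint_scores(query, candidates, wq, wk, wv)
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

Because each candidate enters the layer as a single embedding rather than as a full token sequence, the attention input has one position per candidate plus one for the query. This is one plausible reading of why the abstract reports that comparing 2K candidates takes only about twice as long as comparing 100.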

Related papers