Binary Embedding-based Retrieval at Tencent
- URL: http://arxiv.org/abs/2302.08714v1
- Date: Fri, 17 Feb 2023 06:10:02 GMT
- Title: Binary Embedding-based Retrieval at Tencent
- Authors: Yukang Gan, Yixiao Ge, Chang Zhou, Shupeng Su, Zhouchuan Xu, Xuyuan
Xu, Quanchao Hui, Xiang Chen, Yexin Wang, Ying Shan
- Abstract summary: Large-scale embedding-based retrieval (EBR) is the cornerstone of search-related industrial applications.
We propose a binary embedding-based retrieval engine equipped with a recurrent binarization algorithm that enables customized bits per dimension.
We successfully deployed BEBR in Tencent products, including Sogou, Tencent Video, and QQ World.
- Score: 30.44247353560061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale embedding-based retrieval (EBR) is the cornerstone of
search-related industrial applications. Given a user query, the system of EBR
aims to identify relevant information from a large corpus of documents that may
be tens or hundreds of billions in size. Storage and computation become
expensive and inefficient with massive document collections and highly
concurrent queries, making it difficult to scale further. To tackle this
challenge, we
propose a binary embedding-based retrieval (BEBR) engine equipped with a
recurrent binarization algorithm that enables customized bits per dimension.
Specifically, we compress the full-precision query and document embeddings,
formulated as float vectors in general, into a composition of multiple binary
vectors using a lightweight transformation model with residual multilayer
perceptron (MLP) blocks. We can therefore tailor the number of bits for
different applications to trade off accuracy loss and cost savings.
Importantly, we enable task-agnostic efficient training of the binarization
model using a new embedding-to-embedding strategy. We also exploit the
compatible training of binary embeddings so that the BEBR engine can support
indexing among multiple embedding versions within a unified system. To further
realize efficient search, we propose Symmetric Distance Calculation (SDC) to
achieve lower response time than Hamming codes. We successfully deployed the
proposed BEBR in Tencent products, including Sogou, Tencent Video, and QQ
World. The binarization algorithm can be seamlessly generalized to various tasks
with multiple modalities. Extensive experiments on offline benchmarks and
online A/B tests demonstrate the efficiency and effectiveness of our method,
saving 30%-50% of index costs with almost no loss of accuracy at the
system level.
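The recurrent binarization idea in the abstract can be illustrated with a simplified sketch. The code below is a hypothetical residual-sign quantizer, not the paper's actual method (which uses learned residual MLP blocks and an embedding-to-embedding training strategy): each round binarizes the current residual, records a scalar scale, and carries the remaining residual forward, so more rounds (more bits per dimension) mean lower reconstruction error. It also shows why symmetric scoring is cheap: the dot product of two such compositions expands into scaled binary dot products.

```python
import numpy as np

def recurrent_binarize(x, n_bits=2):
    """Compress a float vector into a composition of n_bits binary vectors.

    Hypothetical sketch: at each step the residual is binarized with sign()
    and scaled by the mean absolute residual (the least-squares scale for
    sign codes), then the remaining residual is carried to the next step.
    """
    residual = np.asarray(x, dtype=np.float64)
    codes, scales = [], []
    for _ in range(n_bits):
        b = np.where(residual >= 0, 1.0, -1.0)  # binary {-1, +1} code
        alpha = np.abs(residual).mean()         # closed-form scale for sign codes
        codes.append(b)
        scales.append(alpha)
        residual = residual - alpha * b         # carry the residual forward
    return codes, scales

def reconstruct(codes, scales):
    """Approximate the original float vector from its binary composition."""
    return sum(a * b for a, b in zip(scales, codes))

def symmetric_dot(codes_x, scales_x, codes_y, scales_y):
    """Score two binary compositions directly (both sides stay binary).

    Each pairwise term bx @ by is a dot product of {-1, +1} codes, which in
    a packed-bit implementation reduces to XOR + popcount.
    """
    total = 0.0
    for a, bx in zip(scales_x, codes_x):
        for c, by in zip(scales_y, codes_y):
            total += a * c * float(bx @ by)
    return total
```

Because the scoring is bilinear, `symmetric_dot` exactly equals the dot product of the two reconstructions; the approximation error relative to the original float embeddings shrinks as bits per dimension increase, which is the accuracy/cost knob the abstract describes.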
Related papers
- Semi-Parametric Retrieval via Binary Token Index [71.78109794895065]
Semi-parametric Vocabulary Disentangled Retrieval (SVDR) is a novel semi-parametric retrieval framework.
It supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval.
It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index.
arXiv Detail & Related papers (2024-05-03T08:34:13Z)
- CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection [23.8834126695488]
Binary code similarity detection (BCSD) is a fundamental technique for various applications.
We propose a cost-effective BCSD framework, CEBin, which fuses embedding-based and comparison-based approaches.
arXiv Detail & Related papers (2024-02-29T03:02:07Z)
- Neural Network Compression using Binarization and Few Full-Precision Weights [7.206962876422061]
Automatic Prune Binarization (APB) is a novel compression technique combining quantization with pruning.
APB enhances the representational capability of binary networks using a few full-precision weights.
APB delivers better accuracy/memory trade-off compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-15T08:52:00Z)
- Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency [4.254906060165999]
We show that even a 2-layer, BERT-based query encoder can still retain 92.5% of the full DE performance on the BEIR benchmark.
We hope that our findings will encourage the community to re-evaluate the trade-offs between method complexity and performance improvements.
arXiv Detail & Related papers (2023-06-05T06:53:55Z)
- Compacting Binary Neural Networks by Sparse Kernel Selection [58.84313343190488]
This paper is motivated by a previously revealed phenomenon that the binary kernels in successful BNNs are nearly power-law distributed.
We develop the Permutation Straight-Through Estimator (PSTE) that is able to not only optimize the selection process end-to-end but also maintain the non-repetitive occupancy of selected codewords.
Experiments verify that our method reduces both the model size and bit-wise computational costs, and achieves accuracy improvements compared with state-of-the-art BNNs under comparable budgets.
arXiv Detail & Related papers (2023-03-25T13:53:02Z)
- BiBench: Benchmarking and Analyzing Network Binarization [72.59760752906757]
Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings.
Common challenges of binarization, such as accuracy degradation and efficiency limitation, suggest that its attributes are not fully understood.
We present BiBench, a rigorously designed benchmark with in-depth analysis for network binarization.
arXiv Detail & Related papers (2023-01-26T17:17:16Z)
- Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z)
- Efficient Cross-Modal Retrieval via Deep Binary Hashing and Quantization [5.799838997511804]
Cross-modal retrieval aims to search for data with similar semantic meanings across different content modalities.
We propose a jointly learned deep hashing and quantization network (HQ) for cross-modal retrieval.
Experimental results on the NUS-WIDE, MIR-Flickr, and Amazon datasets demonstrate that HQ achieves boosts of more than 7% in precision.
arXiv Detail & Related papers (2022-02-15T22:00:04Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Binary DAD-Net: Binarized Driveable Area Detection Network for Autonomous Driving [94.40107679615618]
This paper proposes a novel binarized driveable area detection network (binary DAD-Net).
It uses only binary weights and activations in the encoder, the bottleneck, and the decoder part.
It outperforms state-of-the-art semantic segmentation networks on public datasets.
arXiv Detail & Related papers (2020-06-15T07:09:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.