Binary Embedding-based Retrieval at Tencent
- URL: http://arxiv.org/abs/2302.08714v1
- Date: Fri, 17 Feb 2023 06:10:02 GMT
- Title: Binary Embedding-based Retrieval at Tencent
- Authors: Yukang Gan, Yixiao Ge, Chang Zhou, Shupeng Su, Zhouchuan Xu, Xuyuan
Xu, Quanchao Hui, Xiang Chen, Yexin Wang, Ying Shan
- Abstract summary: Large-scale embedding-based retrieval (EBR) is the cornerstone of search-related industrial applications.
We propose a binary embedding-based retrieval engine equipped with a recurrent binarization algorithm that enables customized bits per dimension.
We successfully deployed the proposed BEBR in Tencent products, including Sogou, Tencent Video, QQ World, etc.
- Score: 30.44247353560061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale embedding-based retrieval (EBR) is the cornerstone of
search-related industrial applications. Given a user query, the system of EBR
aims to identify relevant information from a large corpus of documents that may
be tens or hundreds of billions in size. The storage and computation turn out
to be expensive and inefficient with massive documents and high concurrent
queries, making it difficult to further scale up. To tackle the challenge, we
propose a binary embedding-based retrieval (BEBR) engine equipped with a
recurrent binarization algorithm that enables customized bits per dimension.
Specifically, we compress the full-precision query and document embeddings,
formulated as float vectors in general, into a composition of multiple binary
vectors using a lightweight transformation model with residual multilayer
perceptron (MLP) blocks. We can therefore tailor the number of bits for
different applications to trade off accuracy loss and cost savings.
Importantly, we enable task-agnostic efficient training of the binarization
model using a new embedding-to-embedding strategy. We also exploit the
compatible training of binary embeddings so that the BEBR engine can support
indexing among multiple embedding versions within a unified system. To further
realize efficient search, we propose Symmetric Distance Calculation (SDC) to
achieve lower response time than Hamming codes. We successfully deployed the
proposed BEBR in Tencent products, including Sogou, Tencent Video, QQ World,
etc. The binarization algorithm can be seamlessly generalized to various tasks
with multiple modalities. Extensive experiments on offline benchmarks and
online A/B tests demonstrate the efficiency and effectiveness of our method,
significantly saving 30%~50% index costs with almost no loss of accuracy at the
system level.
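To make the recurrent binarization idea concrete, here is a minimal sketch: a float embedding is approximated by a sum of scaled sign vectors, one per round, so the number of rounds controls the bits spent per dimension. The round count, scales, and helper names are illustrative assumptions; the paper additionally learns a residual-MLP transformation before binarizing, which this toy version omits.

```python
import numpy as np

def recurrent_binarize(x, num_codes=4):
    """Toy residual binarization: each round emits a sign code plus a scale
    for the current residual, then subtracts what that code explains.
    Illustrative only; BEBR learns an MLP transformation that this skips."""
    residual = x.astype(np.float32)
    codes, scales = [], []
    for _ in range(num_codes):
        signs = np.where(residual >= 0, 1.0, -1.0)   # binary directions
        alpha = float(np.abs(residual).mean())       # least-squares scale for a sign code
        codes.append((signs > 0).astype(np.uint8))   # store as 0/1 bits
        scales.append(alpha)
        residual = residual - alpha * signs          # pass the remainder to the next round
    return codes, scales

def reconstruct(codes, scales):
    """Recover an approximation of the original float embedding."""
    return sum(a * (2.0 * c.astype(np.float32) - 1.0) for c, a in zip(codes, scales))

# More rounds -> more bits per dimension and a smaller reconstruction error.
x = np.random.randn(128).astype(np.float32)
codes, scales = recurrent_binarize(x, num_codes=4)
approx = reconstruct(codes, scales)
print(np.linalg.norm(x - approx) / np.linalg.norm(x))
```

Because retrieval scores against such codes decompose into a weighted combination of bitwise similarities, more rounds buy accuracy at the price of extra storage and compute, which is the bits-versus-cost trade-off the abstract refers to.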
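The abstract also contrasts Symmetric Distance Calculation (SDC) with plain Hamming computation. The sketch below shows the general mechanism behind table-based symmetric distance: both query and document are held as binary codes, and distances are read from a precomputed per-byte lookup table instead of being recomputed with XOR and popcount. The byte-level Hamming table here is only a stand-in; the actual distance tables and memory layout used by BEBR are not specified in the abstract.

```python
import numpy as np

# Baseline: Hamming distance via XOR and popcount over packed bits.
def hamming(a_bytes, b_bytes):
    return int(np.unpackbits(np.bitwise_xor(a_bytes, b_bytes)).sum())

# Precompute a 256x256 table indexed by (query byte, document byte).
# Here it holds byte-level Hamming distances purely for illustration.
TABLE = np.array([[bin(i ^ j).count("1") for j in range(256)]
                  for i in range(256)], dtype=np.uint16)

# Symmetric distance: both sides are codes, and the inner loop is table reads.
def symmetric_distance(a_bytes, b_bytes):
    return int(TABLE[a_bytes, b_bytes].sum())

# 256-bit codes packed into 32 bytes; both formulations agree for this table.
q = np.packbits(np.random.randint(0, 2, 256).astype(np.uint8))
d = np.packbits(np.random.randint(0, 2, 256).astype(np.uint8))
assert hamming(q, d) == symmetric_distance(q, d)
```

A practical appeal of the table formulation is that richer distances, for example ones that fold in the per-round scales from the binarization sketch above, can be baked into the table with no extra query-time cost.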
Related papers
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
- Mixed-Precision Embeddings for Large-Scale Recommendation Models [19.93156309493436]
Mixed-Precision Embeddings (MPE) is a novel embedding compression method.
MPE achieves about 200x compression on the Criteo dataset without compromising prediction accuracy.
arXiv Detail & Related papers (2024-09-30T14:04:27Z)
- CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection [23.8834126695488]
Binary code similarity detection (BCSD) is a fundamental technique for various applications.
We propose a cost-effective BCSD framework, CEBin, which fuses embedding-based and comparison-based approaches.
arXiv Detail & Related papers (2024-02-29T03:02:07Z)
- Neural Network Compression using Binarization and Few Full-Precision Weights [7.206962876422061]
Automatic Prune Binarization (APB) is a novel compression technique combining quantization with pruning.
APB enhances the representational capability of binary networks using a few full-precision weights.
APB delivers better accuracy/memory trade-off compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-15T08:52:00Z)
- Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency [4.254906060165999]
We show that even a 2-layer, BERT-based query encoder can still retain 92.5% of the full DE performance on the BEIR benchmark.
We hope that our findings will encourage the community to re-evaluate the trade-offs between method complexity and performance improvements.
arXiv Detail & Related papers (2023-06-05T06:53:55Z)
- Compacting Binary Neural Networks by Sparse Kernel Selection [58.84313343190488]
This paper is motivated by a previously revealed phenomenon that the binary kernels in successful BNNs are nearly power-law distributed.
We develop the Permutation Straight-Through Estimator (PSTE) that is able to not only optimize the selection process end-to-end but also maintain the non-repetitive occupancy of selected codewords.
Experiments verify that our method reduces both the model size and bit-wise computational costs, and achieves accuracy improvements compared with state-of-the-art BNNs under comparable budgets.
arXiv Detail & Related papers (2023-03-25T13:53:02Z)
- BiBench: Benchmarking and Analyzing Network Binarization [72.59760752906757]
Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings.
Common challenges of binarization, such as accuracy degradation and efficiency limitation, suggest that its attributes are not fully understood.
We present BiBench, a rigorously designed benchmark with in-depth analysis for network binarization.
arXiv Detail & Related papers (2023-01-26T17:17:16Z)
- Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Binary DAD-Net: Binarized Driveable Area Detection Network for Autonomous Driving [94.40107679615618]
This paper proposes a novel binarized driveable area detection network (binary DAD-Net).
It uses only binary weights and activations in the encoder, the bottleneck, and the decoder part.
It outperforms state-of-the-art semantic segmentation networks on public datasets.
arXiv Detail & Related papers (2020-06-15T07:09:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.