HyReC: Exploring Hybrid-based Retriever for Chinese
- URL: http://arxiv.org/abs/2506.21913v1
- Date: Fri, 27 Jun 2025 04:57:01 GMT
- Title: HyReC: Exploring Hybrid-based Retriever for Chinese
- Authors: Zunran Wang, Zheng Shenpeng, Wang Shenglan, Minghui Zhao, Zhonghua Li,
- Abstract summary: HyReC is an end-to-end optimization method tailored specifically for hybrid-based retrieval in Chinese. It enhances performance by integrating the semantic union of terms into the representation model. It features the Global-Local-Aware Encoder (GLAE) to promote consistent semantic sharing between lexicon-based and dense retrieval.
- Score: 4.044938393768822
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hybrid-based retrieval methods, which unify dense-vector and lexicon-based retrieval, have garnered considerable attention in industry due to the performance gains they offer. However, despite their promising results, the application of these hybrid paradigms in Chinese retrieval contexts has remained largely underexplored. In this paper, we introduce HyReC, an innovative end-to-end optimization method tailored specifically for hybrid-based retrieval in Chinese. HyReC enhances performance by integrating the semantic union of terms into the representation model. Additionally, it features the Global-Local-Aware Encoder (GLAE) to promote consistent semantic sharing between lexicon-based and dense retrieval while minimizing the interference between them. To further refine alignment, we incorporate a Normalization Module (NM) that fosters mutual benefits between the two retrieval approaches. Finally, we evaluate HyReC on the C-MTEB retrieval benchmark to demonstrate its effectiveness.
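No reference code accompanies this listing; as a rough illustration of the hybrid pattern HyReC builds on (and which its Normalization Module refines), the sketch below min-max normalizes dense and lexical scores onto a shared scale before fusing them. All function names, the min-max rule, and the fixed fusion weight are illustrative assumptions, not HyReC's actual method.

```python
# Minimal sketch of dense + lexical score fusion with score normalization,
# the general hybrid-retrieval pattern. Names, the min-max rule, and the
# fixed weight are illustrative assumptions, not HyReC's actual method.

def min_max_normalize(scores):
    """Map raw scores onto [0, 1] so the two retrievers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.5 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rank(dense_scores, lexical_scores, weight=0.5):
    """Fuse per-document scores from a dense and a lexicon-based retriever."""
    dense = min_max_normalize(dense_scores)
    lexical = min_max_normalize(lexical_scores)
    docs = set(dense) | set(lexical)
    fused = {d: weight * dense.get(d, 0.0) + (1 - weight) * lexical.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

# Toy usage: raw scores keyed by document id, on very different scales.
print(hybrid_rank({"d1": 0.92, "d2": 0.71, "d3": 0.40},
                  {"d1": 11.2, "d3": 18.5, "d4": 7.9}))
```

Min-max is only one common normalization choice; since HyReC is optimized end to end, its NM presumably goes beyond such a fixed rule.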
Related papers
- Killing Two Birds with One Stone: Unifying Retrieval and Ranking with a Single Generative Recommendation Model [71.45491434257106]
Unified Generative Recommendation Framework (UniGRF) is a novel approach that integrates retrieval and ranking into a single generative model.
To enhance inter-stage collaboration, UniGRF introduces a ranking-driven enhancer module.
UniGRF significantly outperforms existing models on benchmark datasets.
arXiv Detail & Related papers (2025-04-23T06:43:54Z)
- Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval [49.669503570350166]
Generative information retrieval (GenIR) is a promising neural retrieval paradigm that formulates document retrieval as a document identifier (docid) generation task.
Existing GenIR models suffer from token-level misalignment, where models trained to predict the next token often fail to capture document-level relevance effectively.
We propose direct document relevance optimization (DDRO), which aligns token-level docid generation with document-level relevance estimation through direct optimization via pairwise ranking.
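As a hedged sketch of what "direct optimization via pairwise ranking" over docids can look like, the snippet below scores each docid by its sequence log-likelihood and applies a pairwise logistic loss. This is a common simplification of such objectives, not necessarily the authors' exact loss.

```python
import math

# Schematic pairwise objective over docid likelihoods: push the relevant
# docid's sequence log-likelihood above a negative's. A common
# simplification, not necessarily DDRO's exact formulation.

def sequence_log_likelihood(token_log_probs):
    """Log-likelihood of a docid = sum of its token log-probabilities."""
    return sum(token_log_probs)

def pairwise_ranking_loss(pos_token_logps, neg_token_logps):
    """-log sigmoid(margin): small when the relevant docid already wins."""
    margin = (sequence_log_likelihood(pos_token_logps)
              - sequence_log_likelihood(neg_token_logps))
    return math.log(1.0 + math.exp(-margin))

# Toy per-token log-probs the generator assigned to each docid.
pos = [-0.2, -0.1, -0.3]   # identifier of a relevant document
neg = [-0.5, -0.9, -0.4]   # identifier of a sampled negative
print(round(pairwise_ranking_loss(pos, neg), 4))   # ~0.2633
```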
arXiv Detail & Related papers (2025-04-07T15:27:37Z)
- DAT: Dynamic Alpha Tuning for Hybrid Retrieval in Retrieval-Augmented Generation [0.0]
DAT (Dynamic Alpha Tuning) is a novel hybrid retrieval framework that balances dense retrieval and BM25 for each query.
It consistently outperforms fixed-weighting hybrid retrieval methods across various evaluation metrics.
Even on smaller models, DAT delivers strong performance, highlighting its efficiency and adaptability.
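The core mechanism is easy to state: a per-query weight blends the two retrievers' scores. How that weight is predicted per query is the paper's contribution and is stubbed out in the sketch below; the heuristic shown is purely a toy placeholder.

```python
# Per-query weighting between dense and BM25 scores, the core idea behind
# DAT. The alpha predictor here is a toy placeholder, not the paper's method.

def predict_alpha(query):
    """Placeholder for a learned/inferred per-query weight in [0, 1]."""
    # Toy heuristic: lean on BM25 for keyword-ish queries containing digits.
    return 0.3 if any(ch.isdigit() for ch in query) else 0.7

def dat_score(query, dense_score, bm25_score):
    """Blend the two retrievers' (already normalized) scores per query."""
    alpha = predict_alpha(query)
    return alpha * dense_score + (1.0 - alpha) * bm25_score

print(dat_score("who wrote Dream of the Red Chamber", 0.82, 0.35))  # 0.679
print(dat_score("error code 0x80070057", 0.40, 0.95))               # 0.785
```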
arXiv Detail & Related papers (2025-03-29T08:35:01Z)
- SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction [20.6787276745193]
We introduce an automatic evaluation method that measures retrieval quality through the lens of information gain within the RAG framework.
We quantify the utility of retrieval by the extent to which it reduces semantic perplexity post-retrieval.
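Read literally, the summary suggests a utility measure of the form "perplexity of the gold answer without retrieval, minus perplexity with retrieval". The sketch below implements that reading; it is an assumption about the general shape of the metric, not SePer's exact estimator.

```python
import math

# Sketch of retrieval utility as perplexity reduction: how much more
# predictable the gold answer becomes once retrieved context is added.
# Follows the summary's wording, not SePer's exact estimator.

def perplexity(token_log_probs):
    """Perplexity of the answer under the generator: exp of mean NLL."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def retrieval_utility(logps_without_ctx, logps_with_ctx):
    """Positive when retrieval makes the gold answer easier to generate."""
    return perplexity(logps_without_ctx) - perplexity(logps_with_ctx)

# Toy per-token log-probs of the gold answer, without vs. with retrieval.
print(round(retrieval_utility([-2.1, -1.8, -2.4], [-0.6, -0.4, -0.9]), 3))
```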
arXiv Detail & Related papers (2025-03-03T12:37:34Z)
- EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration [60.47645731801866]
Large language models (LLMs) are increasingly leveraged as foundational backbones in advanced recommender systems.
LLMs come with pre-trained linguistic semantics but must learn collaborative semantics from scratch via the LLM backbone.
We propose EAGER-LLM, a decoder-only generative recommendation framework that integrates endogenous and exogenous behavioral and semantic information in a non-intrusive manner.
arXiv Detail & Related papers (2025-02-20T17:01:57Z)
- Deep Reinforcement Learning with Hybrid Intrinsic Reward Model [50.53705050673944]
Intrinsic reward shaping has emerged as a prevalent approach to solving hard-exploration and sparse-reward environments.
We introduce HIRE (Hybrid Intrinsic REward), a framework for creating hybrid intrinsic rewards through deliberate fusion strategies.
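A minimal sketch of what "deliberate fusion strategies" for intrinsic rewards can mean in code: several intrinsic signals for one transition are combined by a chosen strategy and added to the task reward. Strategy names, weights, and the beta scale are illustrative, not HIRE's actual design.

```python
# Minimal sketch of fusing several intrinsic reward signals into one shaped
# reward, the general pattern HIRE studies. Strategy names, weights, and the
# beta scale are illustrative assumptions.

def fuse_intrinsic_rewards(signals, strategy="weighted_sum", weights=None):
    """signals: dict of name -> scalar intrinsic reward for one transition."""
    if strategy == "weighted_sum":
        weights = weights or {k: 1.0 / len(signals) for k in signals}
        return sum(weights[k] * v for k, v in signals.items())
    if strategy == "max":  # let the strongest novelty signal dominate
        return max(signals.values())
    raise ValueError(f"unknown fusion strategy: {strategy}")

def shaped_reward(extrinsic, intrinsic_signals, beta=0.1):
    """Common shaping scheme: task reward plus a scaled intrinsic bonus."""
    return extrinsic + beta * fuse_intrinsic_rewards(intrinsic_signals)

# Toy transition in a sparse-reward environment (extrinsic reward is 0).
print(shaped_reward(0.0, {"count_based": 0.8, "prediction_error": 0.3}))
```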
arXiv Detail & Related papers (2025-01-22T04:22:13Z)
- ReFusion: Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion [22.164620956284466]
Retrieval-based augmentations (RA), which incorporate knowledge from an external database into language models, have achieved great success in various knowledge-intensive (KI) tasks.
Existing works focus on concatenating retrievals with inputs to improve model performance.
This paper proposes a new paradigm of RA named ReFusion, a computation-efficient Retrieval representation Fusion with bi-level optimization.
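To make the contrast with concatenation concrete, the sketch below shows the general idea of representation fusion: retrieval vectors are injected into same-length hidden states, so sequence length (and attention cost) does not grow. The gated mean-pool rule is an illustrative stand-in; ReFusion learns its fusion with bi-level optimization.

```python
# Contrast sketch: concatenation-based RA lengthens the input, while
# representation fusion injects retrieval embeddings directly into
# same-length hidden states. The gated mean-pool rule is illustrative,
# not ReFusion's learned fusion.

def concat_augment(input_tokens, retrieved_tokens):
    """Classic RA: a longer sequence, hence higher attention cost."""
    return retrieved_tokens + input_tokens

def fuse_representations(hidden, retrieval_vecs, gate=0.2):
    """Blend mean-pooled retrieval vectors into each hidden state;
    sequence length (and thus attention cost) is unchanged."""
    pooled = [sum(col) / len(retrieval_vecs) for col in zip(*retrieval_vecs)]
    return [[(1 - gate) * h + gate * p for h, p in zip(row, pooled)]
            for row in hidden]

hidden = [[0.1, 0.4], [0.3, 0.2]]        # 2 input tokens x 2 hidden dims
retrievals = [[0.9, 0.1], [0.7, 0.3]]    # 2 retrieval vectors x 2 dims
print(fuse_representations(hidden, retrievals))
```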
arXiv Detail & Related papers (2024-01-04T07:39:26Z)
- Learning to Rank in Generative Retrieval [62.91492903161522]
Generative retrieval aims to generate identifier strings of relevant passages as the retrieval target.
We propose a learning-to-rank framework for generative retrieval, dubbed LTRGR.
This framework only requires an additional learning-to-rank training phase to enhance current generative retrieval systems.
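The additional training phase is a standard learning-to-rank objective; below is a minimal sketch using a pairwise margin loss over the generation scores of passage identifiers. The margin value and the notion of "generation score" (e.g. an autoregressive log-likelihood) are assumptions for illustration.

```python
# Sketch of an extra learning-to-rank phase: a pairwise margin loss that
# pushes the generation score of a relevant passage's identifier above an
# irrelevant one's. Margin and scoring details are illustrative assumptions.

def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Pairwise hinge: zero once pos outranks neg by at least `margin`."""
    return max(0.0, margin - (pos_score - neg_score))

# Scores stand in for the likelihood the model assigns to each identifier.
print(round(margin_ranking_loss(pos_score=-3.2, neg_score=-5.0), 2))  # 0.0
print(round(margin_ranking_loss(pos_score=-4.8, neg_score=-4.5), 2))  # 1.3
```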
arXiv Detail & Related papers (2023-06-27T05:48:14Z)
- Hybrid and Collaborative Passage Reranking [144.83902343298112]
We propose a Hybrid and Collaborative Passage Reranking (HybRank) method.
It incorporates the lexical and semantic properties of sparse and dense retrievers for reranking.
Built on off-the-shelf retriever features, HybRank is a plug-in reranker capable of enhancing arbitrary passage lists.
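A reduced sketch of the collaborative ingredient: a candidate is promoted not just by its query similarity but by its agreement with the other retrieved candidates. HybRank builds much richer lexical-and-dense similarity features; the single-feature version below only shows the plug-in reranking shape.

```python
# Reduced sketch of collaborative reranking over off-the-shelf retriever
# scores: blend query similarity with agreement among candidates.
# Features and the fixed weight are illustrative, not HybRank's.

def collaborative_rerank(query_sim, passage_sim, weight=0.5):
    """query_sim: id -> query-passage score; passage_sim: (id, id) -> score."""
    ids = list(query_sim)

    def neighbor_support(p):
        others = [passage_sim[(p, q)] for q in ids if q != p]
        return sum(others) / len(others)

    return sorted(ids,
                  key=lambda p: (weight * query_sim[p]
                                 + (1 - weight) * neighbor_support(p)),
                  reverse=True)

query_sim = {"p1": 0.9, "p2": 0.7, "p3": 0.5}
passage_sim = {("p1", "p2"): 0.8, ("p2", "p1"): 0.8,
               ("p1", "p3"): 0.2, ("p3", "p1"): 0.2,
               ("p2", "p3"): 0.3, ("p3", "p2"): 0.3}
print(collaborative_rerank(query_sim, passage_sim))  # ['p1', 'p2', 'p3']
```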
arXiv Detail & Related papers (2023-05-16T09:38:52Z)
- Zero-Shot Retrieval with Search Agents and Hybrid Environments [8.017306481455778]
Current language models can learn symbolic query reformulation policies, in combination with traditional term-based retrieval, but fall short of outperforming neural retrievers.
We extend the previous learning-to-search setup to a hybrid environment, which accepts discrete query refinement operations after a first-pass retrieval step via a dual encoder.
Experiments on the BEIR task show that search agents, trained via behavioral cloning, outperform the underlying search system based on a combined dual encoder retriever and cross encoder reranker.
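A toy version of such a hybrid environment is sketched below: a first-pass retrieval, then an agent applying discrete query refinement operations. The operator set and the stopping rule are invented for illustration; only the overall loop mirrors the setup described above.

```python
# Toy hybrid search environment: first-pass retrieval, then discrete query
# refinements chosen by a policy. Operators and stopping rule are invented
# for illustration only.

def refine(query, op, term):
    """Apply one discrete refinement operation to the query string."""
    if op == "add":
        return f"{query} {term}"
    if op == "require":
        return f'{query} +"{term}"'
    if op == "exclude":
        return f'{query} -"{term}"'
    raise ValueError(f"unknown operation: {op}")

def search_episode(query, retrieve, policy, max_steps=3):
    """retrieve: query -> ranked docs; policy: (query, results) -> action."""
    results = retrieve(query)            # first-pass (dual-encoder) retrieval
    for _ in range(max_steps):
        action = policy(query, results)  # behavioral cloning would fit here
        if action is None:               # the agent decides to stop
            break
        query = refine(query, *action)
        results = retrieve(query)
    return query, results

# Stub retriever and a one-step policy, just to exercise the loop.
retrieve = lambda q: [f"doc-for:{q}"]
policy = lambda q, r: ("add", "benchmark") if "benchmark" not in q else None
print(search_episode("zero-shot retrieval", retrieve, policy))
```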
arXiv Detail & Related papers (2022-09-30T13:50:25Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
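The dual-representing idea can be sketched compactly: one encoder pass yields both a pooled dense vector and per-term lexicon weights, so a single model can serve both paradigms. Everything below (the toy "embedding", pooling, and term-weighting rules) is a stand-in for a real PLM, not UnifieR's architecture.

```python
# Sketch of a dual-representing encoder: one pass yields both a pooled
# dense vector and per-term lexicon weights. Toy stand-ins for a real PLM.

def encode_dual(tokens, embedding):
    """Return (dense_vec, lexicon_weights) from one 'encoder' pass."""
    vecs = [embedding[t] for t in tokens]
    dense = [sum(dim) / len(vecs) for dim in zip(*vecs)]   # mean pooling
    lexicon = {t: max(embedding[t]) for t in set(tokens)}  # toy term weights
    return dense, lexicon

embedding = {"hybrid": [0.2, 0.9], "retrieval": [0.7, 0.4]}
dense, lexicon = encode_dual(["hybrid", "retrieval"], embedding)
print(dense)    # one vector for dense (ANN) search
print(lexicon)  # term weights for an inverted index
```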
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.