Fugu-MT Paper Translation (Summary): LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation

Paper summary: LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation

arxiv url: http://arxiv.org/abs/2508.17858v1
Date: Mon, 25 Aug 2025 10:07:36 GMT
Status: Translation complete
Last updated in system: 2025-08-26 18:43:45.722917
Title: LexSemBridge: Fine-Grained Dense Representation Enhancement through Token-Aware Embedding Augmentation
Title (reference translation): LexSemBridge: Enhancing Fine-Grained Dense Representations via Token-Aware Embedding Augmentation
Authors: Shaoxiong Zhan, Hai Lin, Hongming Tan, Xiaodong Cai, Hai-Tao Zheng, Xin Su, Zifei Shan, Ruitong Liu, Hong-Gee Kim
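Abstract summary: This paper proposes a unified framework that enhances dense query representations through fine-grained, input-aware vector modulation. LexSemBridge operates as a plug-in without modifying the backbone encoder.
Reference score (independently computed attention score): 16.162310785810792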
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As queries in retrieval-augmented generation (RAG) pipelines powered by large language models (LLMs) become increasingly complex and diverse, dense retrieval models have demonstrated strong performance in semantic matching. Nevertheless, they often struggle with fine-grained retrieval tasks, where precise keyword alignment and span-level localization are required, even in cases with high lexical overlap that would intuitively suggest easier retrieval. To systematically evaluate this limitation, we introduce two targeted tasks, keyword retrieval and part-of-passage retrieval, designed to simulate practical fine-grained scenarios. Motivated by these observations, we propose LexSemBridge, a unified framework that enhances dense query representations through fine-grained, input-aware vector modulation. LexSemBridge constructs latent enhancement vectors from input tokens using three paradigms: Statistical (SLR), Learned (LLR), and Contextual (CLR), and integrates them with dense embeddings via element-wise interaction. Theoretically, we show that this modulation preserves the semantic direction while selectively amplifying discriminative dimensions. LexSemBridge operates as a plug-in without modifying the backbone encoder and naturally extends to both text and vision modalities. Extensive experiments across semantic and fine-grained retrieval tasks validate the effectiveness and generality of our approach. All code and models are publicly available at https://github.com/Jasaxion/LexSemBridge/
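Abstract (reference translation): As queries in retrieval-augmented generation (RAG) pipelines built on large language models (LLMs) grow more complex and diverse, dense retrieval models have shown strong performance in semantic matching. Nevertheless, they often struggle with fine-grained retrieval tasks that require precise keyword alignment and span-level localization, even when high lexical overlap would suggest that retrieval should be easy. To evaluate this limitation systematically, the paper introduces two tasks, keyword retrieval and part-of-passage retrieval, designed to simulate practical fine-grained scenarios. Motivated by these observations, it proposes LexSemBridge, a unified framework that enhances dense query representations through fine-grained, input-aware vector modulation. LexSemBridge builds latent enhancement vectors from input tokens using three paradigms, Statistical (SLR), Learned (LLR), and Contextual (CLR), and integrates them with dense embeddings via element-wise interaction. Theoretically, the modulation is shown to preserve the semantic direction while selectively amplifying discriminative dimensions. LexSemBridge operates as a plug-in without modifying the backbone encoder and extends to both text and vision modalities. Extensive experiments across semantic and fine-grained retrieval tasks validate the effectiveness and generality of the approach. All code and models are available at https://github.com/Jasaxion/LexSemBridge/.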
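
The abstract describes building an enhancement vector from input tokens and combining it with the dense embedding by element-wise interaction, but it does not give the exact formulas. The following is only a rough sketch of that general idea, loosely in the spirit of the Statistical (SLR) variant. Everything specific here is an assumption for illustration and not taken from the paper: the function names (`token_count_vector`, `project`, `modulate`), the fixed `weights` projection matrix, the sigmoid gain, and the `alpha` strength parameter.

```python
import math
from collections import Counter


def token_count_vector(token_ids, vocab_size):
    """Normalized token frequencies over the vocabulary (assumed lexical signal)."""
    counts = Counter(token_ids)
    total = len(token_ids)
    return [counts.get(i, 0) / total for i in range(vocab_size)]


def project(lexical, weights):
    """Map the vocab-sized vector to embedding size with a hypothetical weight matrix."""
    return [sum(w * x for w, x in zip(row, lexical)) for row in weights]


def modulate(dense, enhancement, alpha=0.5):
    """Scale each embedding dimension by a strictly positive, input-dependent gain."""
    gains = [1.0 + alpha / (1.0 + math.exp(-e)) for e in enhancement]
    return [d * g for d, g in zip(dense, gains)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy inputs: a 5-token vocabulary and a 3-dimensional dense query embedding.
vocab_size = 5
token_ids = [0, 2, 2, 4]
dense = [0.3, -0.8, 0.5]
weights = [
    [1.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, -1.0],
]

lexical = token_count_vector(token_ids, vocab_size)
enhancement = project(lexical, weights)
enhanced = modulate(dense, enhancement)

print("enhanced embedding:", enhanced)
print("cosine(dense, enhanced):", cosine(dense, enhanced))
```

Because every gain in this sketch is strictly positive, each dimension keeps its sign and the enhanced vector stays at a positive cosine similarity to the original, while dimensions tied to frequent tokens are scaled up more. That is one simple way to picture "preserving the semantic direction while amplifying selected dimensions"; the paper's own construction may differ.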


Related papers