Fugu-MT Paper Translation (Summary): ComLQ: Benchmarking Complex Logical Queries in Information Retrieval

Paper summary: ComLQ: Benchmarking Complex Logical Queries in Information Retrieval

arxiv url: http://arxiv.org/abs/2511.12004v2
Date: Sun, 23 Nov 2025 06:31:37 GMT
Status: Translation complete
System update date: 2025-11-25 16:30:37.436507
Title: ComLQ: Benchmarking Complex Logical Queries in Information Retrieval
Authors: Ganlin Xu, Zhitao Yin, Linghao Zhang, Jiaqing Liang, Weijia Lu, Xiaodong Zhang, Zhifei Yang, Sihang Jiang, Deqing Yang
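Abstract summary: Information retrieval (IR) systems play a critical role in navigating information overload across various applications. Existing IR benchmarks cannot be used to sufficiently evaluate the performance of IR models on complex queries in real-world scenarios. We propose a method that leverages large language models (LLMs) to construct ComLQ, a new IR dataset for Complex Logical Queries.
Reference score (in-house attention score): 26.606215927237248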
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Information retrieval (IR) systems play a critical role in navigating information overload across various applications. Existing IR benchmarks primarily focus on simple queries that are semantically analogous to single- and multi-hop relations, overlooking complex logical queries involving first-order logic operations such as conjunction ($\land$), disjunction ($\lor$), and negation ($\lnot$). Thus, these benchmarks cannot be used to sufficiently evaluate the performance of IR models on complex queries in real-world scenarios. To address this problem, we propose a novel method that leverages large language models (LLMs) to construct ComLQ, a new IR dataset for Complex Logical Queries, which comprises 2,909 queries and 11,251 candidate passages. A key challenge in constructing the dataset lies in capturing the underlying logical structures within unstructured text. We therefore design a subgraph-guided prompt with a subgraph indicator, which guides an LLM (such as GPT-4o) to generate queries with specific logical structures based on selected passages. Expert annotation ensures that all query-passage pairs in ComLQ satisfy structure conformity and evidence distribution. To better evaluate whether retrievers can handle queries with negation, we further propose a new evaluation metric, Log-Scaled Negation Consistency (LSNC@$K$). As a supplement to standard relevance-based metrics (such as nDCG and mAP), LSNC@$K$ measures whether the top-$K$ retrieved passages violate the negation conditions in queries. Our experimental results under zero-shot settings demonstrate the limited performance of existing retrieval models on complex logical queries, especially on queries with negation, exposing their weak capability to model exclusion.
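The abstract does not give the formula for LSNC@$K$. As a purely illustrative sketch, assuming a DCG-style 1/log2(rank + 1) discount, one could score a top-$K$ list so that negation violations near the top cost more than violations further down. The function name `lsnc_at_k_sketch`, the per-rank violation flags, and the normalization are hypothetical choices, not the paper's definition.

```python
import math
from typing import Sequence


def lsnc_at_k_sketch(violates: Sequence[bool], k: int) -> float:
    """Hypothetical log-scaled negation consistency over the top-k results.

    ASSUMPTION: this is an illustrative metric, not the formula from the paper.
    violates[i] is True when the passage at rank i + 1 violates the query's
    negation condition. Each violation is penalized with a DCG-style
    1 / log2(rank + 1) weight, so violations near the top cost more.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = list(violates[:k])
    # Normalizer: the penalty if all k slots were violations. Rankings shorter
    # than k treat the missing slots as non-violating.
    max_penalty = sum(1.0 / math.log2(rank + 1) for rank in range(1, k + 1))
    penalty = sum(
        1.0 / math.log2(rank + 1)
        for rank, bad in enumerate(top, start=1)
        if bad
    )
    # 1.0 means no violations in the top-k; 0.0 means every slot violates.
    return 1.0 - penalty / max_penalty


if __name__ == "__main__":
    # Hypothetical ranking: only the passage at rank 2 violates the negation.
    ranking = [False, True, False, False, False]
    print(round(lsnc_at_k_sketch(ranking, k=5), 3))  # prints 0.786
```

Normalizing by the all-violations penalty keeps the score in [0, 1], so it can be reported alongside nDCG@$K$. The paper's actual scaling may differ.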


Related papers