Paper summary: A Large Scale Search Dataset for Unbiased Learning to Rank
- arxiv url: http://arxiv.org/abs/2207.03051v1
- Date: Thu, 7 Jul 2022 02:37:25 GMT
- Status: processing complete
- Last updated in system: 2022-07-08 13:24:29.357874
- Title: A Large Scale Search Dataset for Unbiased Learning to Rank
- Authors: Lixin Zou, Haitao Mao, Xiaokai Chu, Jiliang Tang, Wenwen Ye, Shuaiqiang Wang, Dawei Yin
- Abstract summary: We introduce the Baidu-ULTR dataset for unbiased learning to rank.
- Reference score (independently computed attention score): 51.97967284268577
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The unbiased learning to rank (ULTR) problem has been greatly advanced by
recent deep learning techniques and well-designed debias algorithms. However,
promising results on the existing benchmark datasets may not be extended to the
practical scenario due to the following disadvantages observed from those
popular benchmark datasets: (1) outdated semantic feature extraction where
state-of-the-art large scale pre-trained language models like BERT cannot be
exploited due to the missing of the original text; (2) incomplete display
features for in-depth study of ULTR, e.g., missing the displayed abstract of
documents for analyzing the click necessary bias; (3) lacking real-world user
feedback, leading to the prevalence of synthetic datasets in the empirical
study. To overcome the above disadvantages, we introduce the Baidu-ULTR
dataset. It involves randomly sampled 1.2 billion searching sessions and 7,008
expert annotated queries, which is orders of magnitude larger than the existing
ones. Baidu-ULTR provides: (1) the original semantic feature and a pre-trained
language model for easy usage; (2) sufficient display information such as
position, displayed height, and displayed abstract, enabling the comprehensive
study of different biases with advanced techniques such as causal discovery and
meta-learning; and (3) rich user feedback on search result pages (SERPs) like
dwelling time, allowing for user engagement optimization and promoting the
exploration of multi-task learning in ULTR. In this paper, we present the
design principle of Baidu-ULTR and the performance of benchmark ULTR algorithms
on this new data resource, favoring the exploration of ranking for long-tail
queries and pre-training tasks for ranking. The Baidu-ULTR dataset and
corresponding baseline implementation are available at
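To illustrate why display information such as position matters for ULTR, the following is a minimal sketch of inverse propensity weighting (IPW), a common debiasing idea in the ULTR literature. It is not taken from the Baidu-ULTR paper or its baseline implementation: the function name `ipw_click_loss`, the pointwise log loss, and the examination propensities are all hypothetical choices made for illustration.

```python
import math


def ipw_click_loss(scores, clicks, propensities):
    """Inverse-propensity-weighted pointwise log loss over one result list.

    scores:       model relevance scores, one per displayed document
    clicks:       1 if the document was clicked, else 0
    propensities: assumed probability that the user examined each position
    """
    total = 0.0
    for score, click, propensity in zip(scores, clicks, propensities):
        if click:
            # -log(sigmoid(score)) == log(1 + exp(-score)); dividing by the
            # examination propensity up-weights clicks at rarely seen positions.
            total += math.log1p(math.exp(-score)) / propensity
    return total


# Hypothetical examination propensities for positions 1..4 (assumed values).
propensities = [1.0, 0.5, 0.25, 0.125]
scores = [0.0, 0.0, 0.0, 0.0]

# A click at the top position versus a click at the bottom position.
loss_top = ipw_click_loss(scores, [1, 0, 0, 0], propensities)
loss_low = ipw_click_loss(scores, [0, 0, 0, 1], propensities)
print(loss_top, loss_low)
```

Under these assumed propensities, a click at the fourth position contributes eight times as much to the loss as a click at the first position, which compensates for lower positions being examined less often. In practice the propensities would have to be estimated from logged data rather than fixed by hand.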
- Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale (Extended Abstract) [73.57710917145212]
  This paper proposes the Generative Semi-Supervised Pre-trained (GS2P) model to address these challenges.
  (2024-09-25T03:39:14Z)
- Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank [26.69630281310365]
  Unbiased Learning to Rank (ULTR) aims to leverage biased user feedback (e.g., clicks) to optimize an unbiased ranking model.
  To address both position bias and contextual bias, we propose the Contextual Dual Learning Algorithm with Listwise Distillation (CDLA-LD).
  (2024-08-19T09:13:52Z)
- Enhancing Knowledge Retrieval with In-Context Learning and Semantic Search through Generative AI [3.9773527114058855]
  The developed Generative Text Retrieval (GTR) is applicable to both unstructured and structured data.
  The improved model, Generative Tabular Text Retrieval (GTR-T), demonstrated its efficiency in large-scale database querying.
  (2024-06-13T23:08:06Z)
- Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset [48.708591046906896]
  Unbiased learning-to-rank (ULTR) is a well-established framework for learning from user clicks.
  Standard unbiased learning-to-rank techniques robustly improve click prediction but struggle to consistently improve ranking performance.
  (2024-04-03T08:00:46Z)
- Bag of Tricks for Training Data Extraction from Language Models [98.40637430115204]
  (2023-02-09T06:46:42Z)
- ULTRA: An Unbiased Learning To Rank Algorithm Toolbox [13.296248894004652]
  This paper describes a general framework for unbiased learning to rank (ULTR).
  (2021-08-11T07:26:59Z)
- Relation-Guided Representation Learning [53.60351496449232]
  (2020-07-11T10:57:45Z)
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning [85.33459673197149]
  (2020-02-11T11:54:29Z)