From Easy to Hard: A Dual Curriculum Learning Framework for
Context-Aware Document Ranking
- URL: http://arxiv.org/abs/2208.10226v1
- Date: Mon, 22 Aug 2022 12:09:12 GMT
- Title: From Easy to Hard: A Dual Curriculum Learning Framework for
Context-Aware Document Ranking
- Authors: Yutao Zhu, Jian-Yun Nie, Yixuan Su, Haonan Chen, Xinyu Zhang, Zhicheng
Dou
- Abstract summary: We propose a curriculum learning framework for context-aware document ranking.
We aim to guide the model gradually toward a global optimum.
Experiments on two real query log datasets show that our proposed framework can improve the performance of several existing methods significantly.
- Score: 41.8396866002968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual information in search sessions is important for capturing users'
search intents. Various approaches have been proposed to model user behavior
sequences to improve document ranking in a session. Typically, training samples
of (search context, document) pairs are sampled randomly in each training
epoch. In reality, the difficulty of understanding a user's search intent and of
judging a document's relevance varies greatly from one search context to another.
Mixing up training samples of different difficulties may confuse the model's
optimization process. In this work, we propose a curriculum learning framework
for context-aware document ranking, in which the ranking model learns matching
signals between the search context and the candidate document in an
easy-to-hard manner. In so doing, we aim to guide the model gradually toward a
global optimum. To leverage both positive and negative examples, two curricula
are designed. Experiments on two real query log datasets show that our proposed
framework can improve the performance of several existing methods
significantly, demonstrating the effectiveness of curriculum learning for
context-aware document ranking.
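The easy-to-hard idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how difficulty is measured or how quickly harder samples are introduced, so the linear pacing function, the `start_frac` parameter, and the names `pacing`, `curriculum_pool`, and `sample_pair` are all assumptions made for illustration. The sketch keeps one difficulty-sorted pool for positive (context, document) pairs and one for negative candidates, and widens both pools as training progresses.

```python
import random


def pacing(step, total_steps, start_frac=0.2):
    """Hypothetical linear pacing function: the fraction of the
    difficulty-sorted pool that is available at training step `step`."""
    if total_steps <= 1:
        return 1.0
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start_frac + (1.0 - start_frac) * progress


def curriculum_pool(items, difficulty, step, total_steps, start_frac=0.2):
    """Return the easiest slice of `items` allowed at `step`.

    `difficulty` is any callable that maps an item to a score
    (lower means easier); how to score difficulty is left open here.
    """
    ordered = sorted(items, key=difficulty)
    size = max(1, round(pacing(step, total_steps, start_frac) * len(ordered)))
    return ordered[:size]


def sample_pair(positives, negatives, pos_difficulty, neg_difficulty,
                step, total_steps, rng=random):
    """Draw one positive and one negative, each from its own curriculum."""
    pos = rng.choice(curriculum_pool(positives, pos_difficulty, step, total_steps))
    neg = rng.choice(curriculum_pool(negatives, neg_difficulty, step, total_steps))
    return pos, neg


if __name__ == "__main__":
    # Toy data: integers stand in for training samples, and the integer
    # value itself is used as the (assumed) difficulty score.
    samples = list(range(10))
    for step in (0, 5, 10):
        pool = curriculum_pool(samples, lambda x: x, step, total_steps=11)
        print(step, pool)
```

In a real training loop the pools would be recomputed (or sliced from a pre-sorted list) once per step or per epoch, and the difficulty scores would come from whatever heuristic or model the practitioner chooses.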
Related papers
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and empower the modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
- Improving Topic Relevance Model by Mix-structured Summarization and LLM-based Data Augmentation [16.170841777591345]
In most social search scenarios such as Dianping, modeling search relevance always faces two challenges.
We first take the query concatenated with the query-based summary and the document summary without the query as the input of the topic relevance model.
Then, we utilize the language understanding and generation abilities of a large language model (LLM) to rewrite and generate queries from the queries and documents in existing training data.
arXiv Detail & Related papers (2024-04-03T10:05:47Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- SPM: Structured Pretraining and Matching Architectures for Relevance Modeling in Meituan Search [12.244685291395093]
In e-commerce search, relevance between the query and documents is an essential requirement for a satisfying user experience.
We propose a novel two-stage pretraining and matching architecture for relevance matching with rich structured documents.
The model has already been deployed online, serving the search traffic of Meituan for over a year.
arXiv Detail & Related papers (2023-08-15T11:45:34Z)
- Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z)
- CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking [11.635294568328625]
We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost.
It utilizes precomputed document representations extracted by a base dense retrieval method.
It incurs a negligible computational overhead on top of any first-stage method at run time, allowing it to be easily combined with any state-of-the-art dense retrieval method.
arXiv Detail & Related papers (2021-12-16T10:25:26Z)
- OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis [6.159771892460152]
We propose OPAD, a novel framework using reinforcement policy for active learning in content detection tasks for documents.
The framework learns the acquisition function to decide the samples to be selected while optimizing performance metrics.
We show superior performance of the proposed OPAD framework for active learning for various tasks related to document understanding.
arXiv Detail & Related papers (2021-10-01T07:40:56Z)
- Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval [51.823187647843945]
In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution, and propose to integrate the two types of information with a graph-driven generative model.
Under the approximation, we prove that the training objective can be decomposed into terms involving only singleton or pairwise documents, enabling the model to be trained as efficiently as uncorrelated ones.
arXiv Detail & Related papers (2021-05-27T11:29:03Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.