TREC Deep Learning Track: Reusable Test Collections in the Large Data
Regime
- URL: http://arxiv.org/abs/2104.09399v1
- Date: Mon, 19 Apr 2021 15:41:28 GMT
- Title: TREC Deep Learning Track: Reusable Test Collections in the Large Data
Regime
- Authors: Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M.
Voorhees and Ian Soboroff
- Abstract summary: This paper supports the reuse of the TREC DL test collections in three ways.
First we describe the data sets in detail, documenting clearly and in one place some details that are otherwise scattered in track guidelines.
Second, because there is some risk of iteration and selection bias when reusing a data set, we describe the best practices for writing a paper using TREC DL data, without overfitting.
- Score: 33.202007333667375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The TREC Deep Learning (DL) Track studies ad hoc search in the large data
regime, meaning that a large set of human-labeled training data is available.
Results so far indicate that the best models with large data may be deep neural
networks. This paper supports the reuse of the TREC DL test collections in
three ways. First we describe the data sets in detail, documenting clearly and
in one place some details that are otherwise scattered in track guidelines,
overview papers and in our associated MS MARCO leaderboard pages. We intend
this description to make it easy for newcomers to use the TREC DL data. Second,
because there is some risk of iteration and selection bias when reusing a data
set, we describe the best practices for writing a paper using TREC DL data,
without overfitting. We provide some illustrative analysis. Finally we address
a number of issues around the TREC DL data, including an analysis of
reusability.
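TREC DL runs are scored against pooled, graded relevance judgments. As a hedged illustration of that standard scoring step, a minimal NDCG@10 computation over graded qrels; the toy judgments and document ids below are invented for illustration, not taken from the track data:

```python
import math

def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """NDCG@k with graded gains, as commonly reported for TREC DL."""
    gains = [qrels.get(d, 0) for d in ranked_doc_ids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: graded judgments (0-3) for one query, and one system run.
qrels = {"d1": 3, "d2": 2, "d3": 0, "d4": 1}
run = ["d2", "d1", "d5", "d4"]   # d5 is unjudged, so it gains 0
score = ndcg_at_k(run, qrels)
```

In practice the official `trec_eval` tool computes this from qrels and run files; the sketch only shows the arithmetic behind the reported numbers.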
Related papers
- DRUPI: Dataset Reduction Using Privileged Information [20.59889438709671]
Dataset reduction (DR) seeks to select or distill samples from large datasets into smaller subsets while preserving performance on target tasks.
We introduce Dataset Reduction Using Privileged Information (DRUPI), which enriches DR by synthesizing privileged information alongside the reduced dataset.
Our findings reveal that effective feature labels must strike a balance between being overly discriminative and excessively diverse, with a moderate level proving optimal for improving the reduced dataset's efficacy.
arXiv Detail & Related papers (2024-10-02T14:49:05Z)
- Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models [29.735976068474105]
We propose soft prompt tuning for augmenting dense retrieval (DR) models.
For each task, we leverage soft prompt-tuning to optimize a task-specific soft prompt on limited ground truth data.
We design a filter to select high-quality example document-query pairs in the prompt to further improve the quality of weakly tagged queries.
arXiv Detail & Related papers (2023-07-17T07:55:47Z)
- Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
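The listwise pattern described above can be sketched as prompt construction plus answer parsing. The prompt template, the `[n]` identifier format, and the hard-coded model response below are assumptions for illustration, not LRL's exact implementation; any chat-completion client could stand in for the model call:

```python
import re

def build_listwise_prompt(query, passages):
    """Listwise prompt in the LRL style: number the candidates and ask
    the model to return identifiers in decreasing relevance."""
    lines = [f"[{i + 1}] {p}" for i, p in enumerate(passages)]
    return (
        f"Query: {query}\n"
        + "\n".join(lines)
        + "\nRank the passages above by relevance to the query. "
          "Answer only with identifiers, e.g. [2] > [1] > [3]."
    )

def parse_ranking(response, passages):
    """Map a '[2] > [1] > ...' answer back to passages, appending any
    candidates the model omitted so the output is a full permutation."""
    order = [int(m) - 1 for m in re.findall(r"\[(\d+)\]", response)]
    seen = [i for i in order if 0 <= i < len(passages)]
    seen += [i for i in range(len(passages)) if i not in seen]
    return [passages[i] for i in seen]

# A real system would send the prompt to an LLM; here we hard-code a
# plausible response to demonstrate the parsing step.
passages = ["deep learning for search", "cooking recipes", "neural rankers"]
response = "[1] > [3] > [2]"
reranked = parse_ranking(response, passages)
```

The parsing step is deliberately defensive: zero-shot LLM output can omit or repeat identifiers, so the sketch pads the permutation rather than failing.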
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot abilities across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- Data leakage in cross-modal retrieval training: A case study [16.18916188804986]
We study the recently proposed SoundDesc benchmark dataset, which was automatically sourced from the BBC Sound Effects web page.
We find that SoundDesc contains several duplicates that cause leakage of training data to the evaluation data.
We propose new training, validation, and testing splits for the dataset that we make available online.
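A hedged sketch of the duplicate detection that motivates such resplits: normalize each item's text description and drop evaluation items whose normalized key already appears in training. The normalization rule and toy data below are assumptions for illustration, not the authors' exact procedure (SoundDesc pairs audio with text descriptions, and duplicates can also hide in the audio itself):

```python
import re

def normalize(text):
    """Collapse case, punctuation and surrounding whitespace so
    near-identical descriptions hash to the same key."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def leak_free_split(train_items, eval_items):
    """Remove evaluation items whose normalized description already
    occurs in training -- the leakage pattern reported for SoundDesc."""
    train_keys = {normalize(t) for t in train_items}
    kept = [e for e in eval_items if normalize(e) not in train_keys]
    leaked = [e for e in eval_items if normalize(e) in train_keys]
    return kept, leaked

train = ["Heavy rain on a tin roof.", "Dog barking, distant."]
test = ["heavy rain on a tin roof", "Crowd applause indoors"]
kept, leaked = leak_free_split(train, test)
```

Running the check before freezing splits is cheap insurance: a single leaked duplicate can inflate retrieval metrics without the model generalizing at all.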
arXiv Detail & Related papers (2023-02-23T09:51:03Z)
- Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization [68.91386402390403]
We propose Unlabeled Data Augmented Instruction Tuning (UDIT) to take better advantage of the instructions during instruction learning.
We conduct extensive experiments to show UDIT's effectiveness in various scenarios of tasks and datasets.
arXiv Detail & Related papers (2022-10-17T15:25:24Z)
- A Large Scale Search Dataset for Unbiased Learning to Rank [51.97967284268577]
We introduce the Baidu-ULTR dataset for unbiased learning to rank.
It contains 1.2 billion randomly sampled search sessions and 7,008 expert-annotated queries.
It provides: (1) the original semantic features and a pre-trained language model for easy usage; (2) sufficient display information such as position, displayed height, and displayed abstract; and (3) rich user feedback on search result pages (SERPs), such as dwell time.
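Display information like position is exactly what unbiased learning-to-rank corrections consume. A minimal inverse-propensity-weighting sketch follows; the propensity values and the loss form are illustrative assumptions, not Baidu-ULTR's method:

```python
import math

def ipw_click_loss(clicks, positions, scores, propensity):
    """Inverse-propensity-weighted click loss: each click is reweighted
    by 1 / P(examined | position), so position bias cancels in
    expectation. A sigmoid links model scores to click probability."""
    loss = 0.0
    for click, pos, s in zip(clicks, positions, scores):
        p = 1.0 / (1.0 + math.exp(-s))   # model's click probability
        if click:
            loss += -math.log(p) / propensity[pos]
    return loss / max(1, sum(clicks))

# Illustrative propensities: position 2 is examined half as often as
# position 1, so a click there counts double in the corrected loss.
propensity = {1: 1.0, 2: 0.5}
example_loss = ipw_click_loss([1, 1], [1, 2], [2.0, 0.0], propensity)
```

Propensities would be estimated from the logged display data (for example via randomization or intervention harvesting) rather than set by hand as here.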
arXiv Detail & Related papers (2022-07-07T02:37:25Z)
- Analyzing Dynamic Adversarial Training Data in the Limit [50.00850852546616]
Dynamic adversarial data collection (DADC) holds promise as an approach for generating such diverse training sets.
We present the first study of longer-term DADC, where we collect 20 rounds of NLI examples for a small set of premise paragraphs.
Models trained on DADC examples make 26% fewer errors on our expert-curated test set compared to models trained on non-adversarial data.
arXiv Detail & Related papers (2021-10-16T08:48:52Z)
- Boosting offline handwritten text recognition in historical documents with few labeled lines [5.9207487081080705]
First, we analyze how to perform transfer learning (TL) from a massive database to a smaller historical database.
Second, we analyze methods to efficiently combine TL and data augmentation.
Finally, we propose an algorithm to mitigate the effects of incorrect labelings in the training set.
arXiv Detail & Related papers (2020-12-04T11:59:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.