Overview of the TREC 2021 deep learning track
- URL: http://arxiv.org/abs/2507.08191v1
- Date: Thu, 10 Jul 2025 21:58:41 GMT
- Title: Overview of the TREC 2021 deep learning track
- Authors: Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin
- Abstract summary: This is the third year of the TREC Deep Learning track. We leverage the MS MARCO datasets, which made hundreds of thousands of human-annotated training labels available for both passage and document ranking tasks. Deep neural ranking models that employ large-scale pretraining continued to outperform traditional retrieval methods this year.
- Score: 68.66107744993546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This is the third year of the TREC Deep Learning track. As in previous years, we leverage the MS MARCO datasets, which made hundreds of thousands of human-annotated training labels available for both passage and document ranking tasks. In addition, this year we refreshed both the document and the passage collections, which led to a nearly four-fold increase in the size of the document collection and a nearly 16-fold increase in the size of the passage collection. Deep neural ranking models that employ large-scale pretraining continued to outperform traditional retrieval methods this year. We also found that single-stage retrieval can achieve good performance on both tasks, although it still does not perform on par with multi-stage retrieval pipelines. Finally, the increase in collection size and the general data refresh raised some questions about the completeness of the NIST judgments and the quality of the training labels that were mapped from the old collections to the new ones, which we discuss in this report.
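The abstract's single-stage versus multi-stage distinction can be made concrete with a small sketch. The snippet below is not from the track or any submitted run; it is a minimal illustration, assuming the rank_bm25 and sentence-transformers packages and a toy in-memory corpus, of the sparse-first-stage-plus-neural-reranker layout that multi-stage pipelines typically use (the cross-encoder checkpoint named here is a publicly available MS MARCO reranker, not a track baseline).

```python
# Minimal two-stage ranking sketch (illustrative only, not a track submission).
# Assumes: pip install rank_bm25 sentence-transformers
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

# Toy passage collection standing in for the MS MARCO v2 passage corpus.
passages = [
    "The TREC Deep Learning track studies ad hoc ranking in a large data regime.",
    "BM25 is a classic sparse retrieval model based on term statistics.",
    "A cross-encoder scores a query-passage pair jointly with a transformer.",
]
query = "what is a cross-encoder reranker"

# Stage 1: sparse retrieval over the collection (a single-stage run would stop here).
bm25 = BM25Okapi([p.lower().split() for p in passages])
stage1_scores = bm25.get_scores(query.lower().split())
candidates = sorted(range(len(passages)), key=lambda i: -stage1_scores[i])

# Stage 2: rerank the candidate pool with a pretrained cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
stage2_scores = reranker.predict([(query, passages[i]) for i in candidates])
reranked = [i for _, i in sorted(zip(stage2_scores, candidates), reverse=True)]

print("stage 1 order:", candidates)
print("stage 2 order:", reranked)
```

A single-stage system would instead answer the query with one retrieval pass, sparse or dense, over the collection and skip the reranking step, which is the setup the abstract reports as competitive but not yet on par with full pipelines.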
Related papers
- Overview of the TREC 2022 deep learning track [67.86242254073656]
This is the fourth year of the TREC Deep Learning track. We leverage the MS MARCO datasets, which made hundreds of thousands of human-annotated training labels available. Similar to previous years, deep neural ranking models that employ large-scale pretraining continued to outperform traditional retrieval methods.
arXiv Detail & Related papers (2025-07-10T20:48:22Z)
- Towards Efficient Active Learning in NLP via Pretrained Representations [1.90365714903665]
Fine-tuning Large Language Models (LLMs) is now a common approach for text classification in a wide range of applications.
We drastically expedite this process by using pretrained representations of LLMs within the active learning loop.
Our strategy yields similar performance to fine-tuning all the way through the active learning loop but is orders of magnitude less computationally expensive.
arXiv Detail & Related papers (2024-02-23T21:28:59Z)
- Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker; a schematic of the listwise prompting setup appears after this list.
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- Dataset Distillation: A Comprehensive Review [76.26276286545284]
Dataset distillation (DD) aims to derive a much smaller dataset of synthetic samples such that models trained on it perform comparably to models trained on the original dataset (the standard bilevel objective is written out after this list).
This paper gives a comprehensive review and summary of recent advances in DD and its application.
arXiv Detail & Related papers (2023-01-17T17:03:28Z)
- NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks, ranging from OCR to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z)
- PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking [20.260222175405215]
This paper describes the PASH participation in TREC 2021 Deep Learning Track.
In the recall stage, we adopt a scheme that combines sparse and dense retrieval methods.
In the multi-stage ranking phase, point-wise and pair-wise ranking strategies are used.
arXiv Detail & Related papers (2022-05-18T04:38:15Z)
- Overview of the TREC 2020 deep learning track [30.531644711518414]
This year we have a document retrieval task and a passage retrieval task, each with hundreds of thousands of human-labeled training queries.
We evaluate using single-shot TREC-style evaluation, to give us a picture of which ranking methods work best when large data is available.
This year we have further evidence that rankers with BERT-style pretraining outperform other rankers in the large data regime.
arXiv Detail & Related papers (2021-02-15T16:47:00Z)
- SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy [92.5683788430012]
SupMMD is a novel technique for generic and update summarization based on the maximum mean discrepancy (MMD) from kernel two-sample testing; the MMD statistic is written out after this list.
We show the efficacy of SupMMD in both generic and update summarization tasks by meeting or exceeding the current state-of-the-art on the DUC-2004 and TAC-2009 datasets.
arXiv Detail & Related papers (2020-10-06T09:26:55Z)
- Overview of the TREC 2019 deep learning track [36.23357487158591]
The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime.
It is the first track with large human-labeled training sets, introducing two sets corresponding to two tasks.
This year 15 groups submitted a total of 75 runs, using various combinations of deep learning, transfer learning and traditional IR ranking methods.
arXiv Detail & Related papers (2020-03-17T17:12:36Z)
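As flagged in the LRL entry above, the listwise idea is to show the model all candidates at once and ask for an ordering. The sketch below is a schematic of that prompting setup, not the paper's actual prompt or parser; the `llm` callable is a placeholder for whatever chat-completion client is available.

```python
# Schematic listwise reranking prompt in the spirit of LRL (not the paper's exact prompt).
from typing import Callable, List

def build_listwise_prompt(query: str, passages: List[str]) -> str:
    """Show every candidate at once and ask for an ordering (listwise),
    rather than scoring each query-passage pair in isolation (pointwise)."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Query: {query}\n\n"
        f"Candidate passages:\n{numbered}\n\n"
        "Rank the passages from most to least relevant to the query. "
        "Answer with the bracketed identifiers only, e.g. [2] > [1] > [3]."
    )

def rerank(query: str, passages: List[str], llm: Callable[[str], str]) -> List[int]:
    """Parse the model's '[i] > [j] > ...' answer back into passage indices."""
    answer = llm(build_listwise_prompt(query, passages))
    order: List[int] = []
    for tok in answer.split(">"):
        tok = tok.strip("[] .\n")
        if tok.isdigit() and 1 <= int(tok) <= len(passages) and int(tok) - 1 not in order:
            order.append(int(tok) - 1)
    # Fall back to the original order for anything the model omitted.
    return order + [i for i in range(len(passages)) if i not in order]
```

Because no relevance labels enter this loop, the approach stays zero-shot; the quality of the final ordering rests entirely on the pretrained model behind `llm`.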
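The dataset distillation entry above is usually formalised as a bilevel problem. The formulation below is the standard one from the DD literature rather than anything specific to that survey, with $\mathcal{S}$ and $\mathcal{T}$ introduced here for the synthetic and original datasets:

```latex
\mathcal{S}^{*} = \arg\min_{\mathcal{S}} \, \mathcal{L}_{\mathcal{T}}\bigl(\theta^{*}(\mathcal{S})\bigr)
\quad \text{where} \quad
\theta^{*}(\mathcal{S}) = \arg\min_{\theta} \, \mathcal{L}_{\mathcal{S}}(\theta),
\qquad |\mathcal{S}| \ll |\mathcal{T}| .
```

In words: pick the small synthetic set whose trained model minimises the loss on the original data.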
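The maximum mean discrepancy named in the SupMMD entry is the standard kernel two-sample statistic. Its biased empirical estimate, in notation introduced here (samples $X=\{x_i\}_{i=1}^{m}$, $Y=\{y_j\}_{j=1}^{n}$ and kernel $k$), is

```latex
\widehat{\mathrm{MMD}}^{2}(X, Y)
= \frac{1}{m^{2}} \sum_{i, i'} k(x_i, x_{i'})
- \frac{2}{mn} \sum_{i, j} k(x_i, y_j)
+ \frac{1}{n^{2}} \sum_{j, j'} k(y_j, y_{j'}) .
```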
This list is automatically generated from the titles and abstracts of the papers in this site.