NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned
- URL: http://arxiv.org/abs/2101.00133v1
- Date: Fri, 1 Jan 2021 01:24:34 GMT
- Title: NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned
- Authors: Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi,
Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria
Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang
Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp,
Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni,
Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka,
Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki,
Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong
Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz,
Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael
Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih
- Abstract summary: We describe the motivation and organization of the EfficientQA competition from NeurIPS 2020.
The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers.
- Score: 122.429985063391
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We review the EfficientQA competition from NeurIPS 2020. The competition
focused on open-domain question answering (QA), where systems take natural
language questions as input and return natural language answers. The aim of the
competition was to build systems that can predict correct answers while also
satisfying strict on-disk memory budgets. These memory budgets were designed to
encourage contestants to explore the trade-off between storing large, redundant
retrieval corpora and the parameters of large learned models. In this
report, we describe the motivation and organization of the competition, review
the best submissions, and analyze system predictions to inform a discussion of
evaluation for open-domain QA.
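To make the competition setup concrete, below is a minimal sketch (not taken from the paper) of the interface an EfficientQA-style submission exposes and of how an on-disk budget and exact-match accuracy could be checked. The function names, directory layout, and normalization details are illustrative assumptions rather than the official evaluation code.

```python
import os
import re
import string
from typing import Callable, Iterable, List, Tuple


def disk_usage_bytes(system_dir: str) -> int:
    """Total on-disk size of a submission directory (model weights,
    retrieval corpus/index, code), which the budgeted tracks cap."""
    total = 0
    for root, _, files in os.walk(system_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def normalize(text: str) -> str:
    """Common open-domain QA answer normalization: lowercase, drop
    punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: Iterable[str]) -> bool:
    """A prediction counts as correct if it matches any reference
    answer after normalization."""
    return any(normalize(prediction) == normalize(ref) for ref in references)


def evaluate(answer_fn: Callable[[str], str],
             dataset: Iterable[Tuple[str, List[str]]],
             system_dir: str,
             budget_bytes: int) -> float:
    """Reject systems over the on-disk budget, otherwise return exact-match
    accuracy. `answer_fn` is the submission's question -> answer function;
    `dataset` yields (question, reference_answers) pairs."""
    if disk_usage_bytes(system_dir) > budget_bytes:
        raise ValueError("submission exceeds the on-disk memory budget")
    scores = [exact_match(answer_fn(question), refs) for question, refs in dataset]
    return sum(scores) / max(len(scores), 1)
```

In this setting, a small budget pushes a submission toward compressed corpora and indexes or toward closed-book models whose knowledge lives entirely in their parameters, while a larger budget leaves room for a full retrieval corpus alongside the model, which is the trade-off the competition was designed to probe.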
Related papers
- Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models [0.0]
Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora.
High-quality datasets are used to train models on realistic scenarios.
Standardized metrics facilitate comparisons between different ODQA systems.
arXiv Detail & Related papers (2024-06-19T05:43:02Z)
- SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation).
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
- Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO [50.58083807719749]
We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions.
This competition targets robustness and generalization in multi-agent systems.
We will open-source our benchmark including the environment wrapper, baselines, a visualization tool, and selected policies for further research.
arXiv Detail & Related papers (2023-08-30T07:16:11Z)
- ICDAR 2023 Competition on Hierarchical Text Detection and Recognition [60.68100769639923]
The competition aims to promote research into deep learning models and systems that can jointly perform text detection and recognition.
We present details of the proposed competition organization, including tasks, datasets, evaluations, and schedule.
During the competition period (from January 2nd 2023 to April 1st 2023), at least 50 submissions from more than 20 teams were made in the 2 proposed tasks.
arXiv Detail & Related papers (2023-05-16T18:56:12Z)
- Evaluation of Question Answering Systems: Complexity of judging a natural language [3.4771957347698583]
Question answering (QA) systems are among the most important and rapidly developing research topics in natural language processing (NLP).
This survey attempts to provide a systematic overview of the general framework of QA, QA paradigms, benchmark datasets, and assessment techniques for a quantitative evaluation of QA systems.
arXiv Detail & Related papers (2022-09-10T12:29:04Z)
- Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering [62.88322725956294]
We review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques.
We introduce the modern OpenQA architecture named "Retriever-Reader" and analyze the various systems that follow this architecture.
We then discuss key challenges to developing OpenQA systems and offer an analysis of benchmarks that are commonly used.
arXiv Detail & Related papers (2021-01-04T04:47:46Z)
- A Clarifying Question Selection System from NTES_ALONG in Convai3 Challenge [8.656503175492375]
This paper presents the participation of NetEase Game AI Lab team for the ClariQ challenge at Search-oriented Conversational AI (SCAI) EMNLP workshop in 2020.
The challenge asks for a complete conversational information retrieval system that can understand and generate clarification questions.
We propose a clarifying question selection system which consists of response understanding, candidate question recalling and clarifying question ranking.
arXiv Detail & Related papers (2020-10-27T11:22:53Z)
- An Interpretable Deep Learning System for Automatically Scoring Request for Proposals [3.244940746423378]
We propose a novel Bi-LSTM based regression model, and provide deeper insight into phrases which latently impact scoring of responses.
We also qualitatively assess the impact of important phrases using human evaluators.
Finally, we introduce a novel problem statement that can be used to further improve the state of the art in NLP based automatic scoring systems.
arXiv Detail & Related papers (2020-08-05T20:21:35Z)
- Analysing Affective Behavior in the First ABAW 2020 Competition [49.90617840789334]
The Affective Behavior Analysis in-the-wild (ABAW) 2020 Competition is the first Competition aiming at automatic analysis of the three main behavior tasks.
We describe this Competition, to be held in conjunction with the IEEE Conference on Face and Gesture Recognition, May 2020, in Buenos Aires, Argentina.
We outline the evaluation metrics, present both the baseline system and the top-3 performing teams' methodologies per Challenge and finally present their obtained results.
arXiv Detail & Related papers (2020-01-30T15:41:14Z)