A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets
- URL: http://arxiv.org/abs/2006.11880v2
- Date: Wed, 21 Oct 2020 04:19:14 GMT
- Title: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets
- Authors: Changchang Zeng, Shaobo Li, Qin Li, Jie Hu, and Jianjun Hu
- Abstract summary: Machine Reading Comprehension (MRC) is a challenging Natural Language Processing (NLP) research field with wide real-world applications.
Many MRC models have already surpassed human performance on various benchmark datasets.
This shows the need for improving existing datasets, evaluation metrics, and models to move current MRC models toward "real" understanding.
- Score: 5.54205518616467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Reading Comprehension (MRC) is a challenging Natural Language
Processing (NLP) research field with wide real-world applications. The great
progress of this field in recent years is mainly due to the emergence of
large-scale datasets and deep learning. At present, many MRC models have
already surpassed human performance on various benchmark datasets, despite the
obvious and sizable gap between existing MRC models and genuine human-level
reading comprehension. This shows the need to improve existing datasets,
evaluation metrics, and models to move current MRC models toward "real"
understanding. To address the current lack of a comprehensive survey of
existing MRC tasks, evaluation metrics, and datasets, herein (1) we analyze 57
MRC tasks and datasets and propose a more precise classification method for
MRC tasks with 4 different attributes; (2) we summarize 9 evaluation metrics
for MRC tasks, along with 7 attributes and 10 characteristics of MRC datasets;
(3) we discuss key open issues in MRC research and highlight future research
directions. In addition, we have collected, organized, and published our data
on the companion website (https://mrc-datasets.github.io/), where MRC
researchers can directly access each MRC dataset, its papers, baseline
projects, and the leaderboard.
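Among the nine evaluation metrics the survey catalogs, Exact Match (EM) and token-level F1 are the standard scores for span-extraction MRC. Below is a minimal, illustrative reimplementation in the style of the official SQuAD evaluation script; the function names and the exact normalization choices here are ours, not taken from the survey. Normalization (lowercasing, stripping punctuation and articles) is what makes EM tolerant of superficial formatting differences.

```python
import re
import string
from collections import Counter

def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))                       # 1.0
print(round(f1_score("in the Eiffel Tower", "Eiffel Tower in Paris"), 3))    # 0.857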
Related papers
- IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios [14.336896748878921]
This paper introduces the IRSC benchmark for evaluating the performance of embedding models in multilingual RAG tasks.
The benchmark encompasses five retrieval tasks: query retrieval, title retrieval, part-of-paragraph retrieval, keyword retrieval, and summary retrieval.
Our contributions include: 1) the IRSC benchmark, 2) the SSCI and RCCI metrics, and 3) insights into the cross-lingual limitations of embedding models.
arXiv Detail & Related papers (2024-09-24T05:39:53Z)
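As a rough illustration of the kind of retrieval IRSC evaluates, the sketch below ranks candidate passages for a query by cosine similarity between embeddings. The `embed` function is a stand-in for any sentence-embedding model (an assumption of ours, not IRSC's actual pipeline), and the paper's SSCI and RCCI metrics are not reproduced here.

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Stand-in for a real sentence-embedding model (any encoder returning
    one fixed-size vector per text). Random vectors here for demo only."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384))

def retrieve(query: str, passages: list[str], k: int = 3) -> list[tuple[int, float]]:
    """Rank passages by cosine similarity to the query embedding."""
    vecs = embed([query] + passages)
    q, p = vecs[0], vecs[1:]
    sims = (p @ q) / (np.linalg.norm(p, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in top]

passages = ["Paris is the capital of France.",
            "MRC models answer questions over passages.",
            "Cosine similarity compares vector directions."]
print(retrieve("What is the capital of France?", passages, k=2))
```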
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- Datasets for Large Language Models: A Comprehensive Survey [37.153302283062004]
The survey consolidates and categorizes the fundamental aspects of LLM datasets from five perspectives.
The survey sheds light on the prevailing challenges and points out potential avenues for future investigation.
The surveyed data totals more than 774.5 TB of pre-training corpora and over 700M instances across the other dataset categories.
arXiv Detail & Related papers (2024-02-28T04:35:51Z)
- NER-to-MRC: Named-Entity Recognition Completely Solving as Machine Reading Comprehension [29.227500985892195]
We frame NER as a machine reading comprehension problem, called NER-to-MRC.
We transform the NER task into a form that the model can solve efficiently with MRC.
We achieve state-of-the-art performance without external data, with up to an 11.24% improvement on the WNUT-16 dataset.
arXiv Detail & Related papers (2023-05-06T08:05:22Z)
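To make the NER-to-MRC framing concrete, the sketch below turns entity types into natural-language questions and treats entity extraction as span prediction, in the spirit of this line of work. The query templates and the extractor are illustrative assumptions, not the paper's implementation.

```python
import re

# Hypothetical sketch: cast NER as extractive QA. Each entity type becomes a
# question; an MRC model (stubbed here) returns answer spans, which are read
# back as entity mentions.
QUERIES = {
    "PER": "Which person names are mentioned in the text?",
    "LOC": "Which locations are mentioned in the text?",
    "ORG": "Which organizations are mentioned in the text?",
}

def answer_spans(question: str, context: str) -> list[tuple[int, int]]:
    """Stub for an extractive MRC model returning (start, end) char offsets.
    A real system would run a fine-tuned span-prediction model on the
    (question, context) pair; this toy stub just matches capitalized words,
    so every label receives the same spans here."""
    return [(m.start(), m.end()) for m in re.finditer(r"\b[A-Z][a-z]+\b", context)]

def ner_via_mrc(context: str) -> dict[str, list[str]]:
    """Ask one question per entity type and read spans back as mentions."""
    return {label: [context[s:e] for s, e in answer_spans(q, context)]
            for label, q in QUERIES.items()}

print(ner_via_mrc("Alice joined Acme in Berlin."))
```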
- A Comprehensive Survey on Multi-hop Machine Reading Comprehension Datasets and Metrics [0.0]
Multi-hop machine reading comprehension is a challenging task that aims to answer a question based on disjoint pieces of information.
Evaluation metrics and datasets are a vital part of multi-hop MRC, since models cannot be trained or evaluated without them.
This study aims to present a comprehensive survey on recent advances in multi-hop MRC evaluation metrics and datasets.
arXiv Detail & Related papers (2022-12-08T04:42:59Z)
- Lite Unified Modeling for Discriminative Reading Comprehension [68.39862736200045]
We propose a lightweight POS-Enhanced Iterative Co-Attention Network (POI-Net) to handle diverse discriminative MRC tasks synchronously.
Our lite unified design brings the model significant improvements in both the encoder and decoder components.
The evaluation results on four discriminative MRC benchmarks consistently indicate the general effectiveness and applicability of our model.
arXiv Detail & Related papers (2022-03-26T15:47:19Z)
- ExpMRC: Explainability Evaluation for Machine Reading Comprehension [42.483940360860096]
We propose a new benchmark called ExpMRC for evaluating the explainability of machine reading comprehension systems.
We use state-of-the-art pre-trained language models to build baseline systems and adopt various unsupervised approaches to extract evidence without a human-annotated training set.
arXiv Detail & Related papers (2021-05-10T06:00:20Z)
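One simple unsupervised evidence extractor of the kind such baselines might use: pick the passage sentence with the highest lexical overlap with the question plus the predicted answer. This is a generic heuristic supplied here for illustration, not one of the specific methods evaluated in ExpMRC.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def pick_evidence(question: str, answer: str, sentences: list[str]) -> str:
    """Return the sentence with the highest word overlap with the question
    plus the predicted answer. A crude unsupervised heuristic; stronger
    baselines could score overlap with embeddings instead."""
    target = tokens(question + " " + answer)
    return max(sentences, key=lambda s: len(target & tokens(s)))

sents = ["The tower was built in 1889.",
         "It is located in Paris, France.",
         "Millions of tourists visit each year."]
print(pick_evidence("Where is the tower located?", "Paris", sents))
# -> "It is located in Paris, France."
```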
- Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question Answering Data [58.36305373100518]
It is unclear whether subject-area question-answering data is useful for machine reading comprehension tasks.
We collect a large-scale multi-subject multiple-choice question-answering dataset, ExamQA.
We use incomplete and noisy snippets returned by a web search engine as the relevant context for each question-answering instance to convert it into a weakly-labeled MRC instance.
arXiv Detail & Related papers (2021-02-01T23:18:58Z)
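A minimal sketch of this weak-labeling step: given a question, its gold answer, and a retrieved snippet, keep the pair as an extractive MRC instance only if the answer string can be located in the snippet. This is our simplification of the conversion described above; the real pipeline handles noisy snippets more carefully.

```python
from typing import Optional

def to_weak_mrc(question: str, answer: str, snippet: str) -> Optional[dict]:
    """Turn a QA pair plus a noisy search snippet into a weakly-labeled
    extractive MRC instance, or None if the answer never appears."""
    start = snippet.lower().find(answer.lower())
    if start == -1:
        return None  # answer not in snippet: discard the instance
    return {
        "context": snippet,
        "question": question,
        "answer_start": start,
        "answer_text": snippet[start:start + len(answer)],
    }

print(to_weak_mrc(
    "Which gas do plants absorb?",
    "carbon dioxide",
    "Plants absorb carbon dioxide from the air during photosynthesis.",
))
```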
- Coreference Reasoning in Machine Reading Comprehension [100.75624364257429]
We show that coreference reasoning in machine reading comprehension is a greater challenge than was earlier thought.
We propose a methodology for creating reading comprehension datasets that better reflect the challenges of coreference reasoning.
This allows us to show an improvement in the reasoning abilities of state-of-the-art models across various MRC datasets.
arXiv Detail & Related papers (2020-12-31T12:18:41Z)
- Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond [85.53037880415734]
Machine reading comprehension (MRC) aims to teach machines to read and comprehend human languages.
With the rapid rise of deep neural networks and the evolution of contextualized language models (CLMs), MRC research has experienced two significant breakthroughs.
arXiv Detail & Related papers (2020-05-13T10:58:50Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)