A Brief Survey and Comparative Study of Recent Development of Pronoun
Coreference Resolution
- URL: http://arxiv.org/abs/2009.12721v1
- Date: Sun, 27 Sep 2020 01:40:01 GMT
- Title: A Brief Survey and Comparative Study of Recent Development of Pronoun
Coreference Resolution
- Authors: Hongming Zhang, Xinran Zhao, Yangqiu Song
- Abstract summary: Pronoun Coreference Resolution (PCR) is the task of resolving pronominal expressions to all mentions they refer to.
As one important natural language understanding (NLU) component, pronoun resolution is crucial for many downstream tasks and still challenging for existing models.
We conduct extensive experiments to show that even though current models are achieving good performance on the standard evaluation set, they are still not ready to be used in real applications.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pronoun Coreference Resolution (PCR) is the task of resolving pronominal
expressions to all mentions they refer to. Compared with the general
coreference resolution task, the main challenge of PCR is the coreference
relation prediction rather than the mention detection. As one important natural
language understanding (NLU) component, pronoun resolution is crucial for many
downstream tasks and still challenging for existing models, which motivates us
to survey existing approaches and think about how to do better. In this survey,
we first introduce representative datasets and models for the ordinary pronoun
coreference resolution task. Then we focus on recent progress on hard pronoun
coreference resolution problems (e.g., Winograd Schema Challenge) to analyze
how well current models can understand commonsense. We conduct extensive
experiments to show that even though current models are achieving good
performance on the standard evaluation set, they are still not ready to be used
in real applications (e.g., all SOTA models struggle to correctly resolve
pronouns to infrequent objects). All experiment code is available at
https://github.com/HKUST-KnowComp/PCR.
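To make the task concrete, here is a toy, self-contained sketch of pronoun resolution with a hand-written heuristic (recency plus number agreement). The heuristic, function names, and data are illustrative assumptions only; they are not a model from the survey or its codebase.

```python
# Toy illustration of the PCR task: given a pronoun and candidate
# antecedent mentions, pick the mention the pronoun refers to.
# The scoring rule below (recency + number agreement) is purely
# illustrative, not a method from the paper.

PLURAL_PRONOUNS = {"they", "them", "their"}

def resolve_pronoun(pronoun, candidates):
    """candidates: list of (mention, position, is_plural) tuples,
    where larger positions are closer to the pronoun."""
    pronoun_is_plural = pronoun.lower() in PLURAL_PRONOUNS
    best, best_score = None, float("-inf")
    for mention, position, is_plural in candidates:
        score = position              # prefer more recent mentions
        if pronoun_is_plural == is_plural:
            score += 10               # reward number agreement
        if score > best_score:
            best, best_score = mention, score
    return best

# "The lawyers met the judge. They argued loudly."
candidates = [("The lawyers", 0, True), ("the judge", 1, False)]
print(resolve_pronoun("They", candidates))  # -> The lawyers
```

Real systems replace the hand-written score with a learned mention-pair scoring function, but the input/output contract is the same.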
Related papers
- Solving the Challenge Set without Solving the Task: On Winograd Schemas as a Test of Pronominal Coreference Resolution [21.19369044026899]
We show that despite the strong performance of prompted language models (LMs) on the Winograd Challenge set, these same modeling techniques perform relatively poorly at resolving certain pronominal ambiguities attested in OntoNotes.
We propose a method for ensembling a prompted LM with a supervised, task-specific system that is overall more accurate at resolving pronominal coreference across datasets.
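A minimal sketch of that confidence-based fallback idea, assuming a hypothetical interface and a made-up threshold; the paper's actual ensembling method may combine the two systems differently.

```python
# Hypothetical sketch of ensembling a prompted LM with a supervised
# coreference system: trust the LM only when it is confident, otherwise
# defer to the task-specific model. Names and the threshold are
# illustrative assumptions.

def ensemble_resolve(lm_prediction, lm_confidence,
                     supervised_prediction, threshold=0.9):
    """Return the LM's antecedent when its confidence clears the
    threshold; otherwise fall back to the supervised system."""
    if lm_confidence >= threshold:
        return lm_prediction
    return supervised_prediction

print(ensemble_resolve("the lawyers", 0.95, "the judge"))  # -> the lawyers
print(ensemble_resolve("the lawyers", 0.50, "the judge"))  # -> the judge
```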
arXiv Detail & Related papers (2024-10-12T09:04:53Z) - Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models [13.532180752491954]
We demonstrate a dramatic breakdown of the function and reasoning capabilities of state-of-the-art models trained at the largest available scales.
The breakdown is severe: models show strong fluctuations across even slight problem variations that should not affect problem solving.
We take these initial observations as a call for an urgent re-assessment of the claimed capabilities of the current generation of Large Language Models.
arXiv Detail & Related papers (2024-06-04T07:43:33Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses drawn from a massive range of real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - A Survey on Zero Pronoun Translation [69.09774294082965]
Zero pronouns (ZPs) are frequently omitted in pro-drop languages, but should be recalled in non-pro-drop languages.
This survey paper highlights the major works that have been undertaken in zero pronoun translation (ZPT) after the neural revolution.
We uncover a number of insightful findings, such as: 1) ZPT is in line with the development trend of large language models; 2) data limitation causes learning bias across languages and domains; 3) performance improvements are often reported on single benchmarks, but advanced methods are still far from real-world use.
arXiv Detail & Related papers (2023-05-17T13:19:01Z) - Exploring Multi-Modal Representations for Ambiguity Detection &
Coreference Resolution in the SIMMC 2.0 Challenge [60.616313552585645]
We present models for effective Ambiguity Detection and Coreference Resolution in Conversational AI.
Specifically, we use TOD-BERT- and LXMERT-based models, compare them to a number of baselines, and provide ablation experiments.
Our results show that (1) language models are able to exploit correlations in the data to detect ambiguity; and (2) unimodal coreference resolution models can avoid the need for a vision component.
arXiv Detail & Related papers (2022-02-25T12:10:02Z) - Coreference Reasoning in Machine Reading Comprehension [100.75624364257429]
We show that coreference reasoning in machine reading comprehension is a greater challenge than previously thought.
We propose a methodology for creating reading comprehension datasets that better reflect the challenges of coreference reasoning.
This allows us to show an improvement in the reasoning abilities of state-of-the-art models across various MRC datasets.
arXiv Detail & Related papers (2020-12-31T12:18:41Z) - A Rigorous Study on Named Entity Recognition: Can Fine-tuning Pretrained
Model Lead to the Promised Land? [44.87003366511073]
Fine-tuning pretrained models has achieved promising performance on standard NER benchmarks.
Unfortunately, when scaling NER to open situations, these advantages may no longer exist.
This paper proposes conducting a randomization test on standard benchmarks.
arXiv Detail & Related papers (2020-04-25T12:30:16Z) - ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning [85.33459673197149]
We introduce ReClor, a new reading comprehension dataset requiring logical reasoning, extracted from standardized graduate admission examinations.
In this paper, we propose to identify biased data points and separate the dataset into an EASY set and a HARD set.
Empirical results show that state-of-the-art models have an outstanding ability to capture the biases contained in the dataset, achieving high accuracy on the EASY set.
However, they struggle on the HARD set, with performance close to random guessing, indicating that more research is needed to genuinely enhance the logical reasoning ability of current models.
arXiv Detail & Related papers (2020-02-11T11:54:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.