End-to-end Neural Coreference Resolution Revisited: A Simple yet
Effective Baseline
- URL: http://arxiv.org/abs/2107.01700v1
- Date: Sun, 4 Jul 2021 18:12:24 GMT
- Title: End-to-end Neural Coreference Resolution Revisited: A Simple yet
Effective Baseline
- Authors: Tuan Manh Lai, Trung Bui, Doo Soon Kim
- Abstract summary: We propose a simple yet effective baseline for coreference resolution.
Our model is a simplified version of the original neural coreference resolution model.
Our work provides evidence for the necessity of carefully justifying the complexity of existing or newly proposed models.
- Score: 20.431647446999996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since the first end-to-end neural coreference resolution model was
introduced, many extensions to the model have been proposed, ranging from using
higher-order inference to directly optimizing evaluation metrics using
reinforcement learning. Despite improving the coreference resolution
performance by a large margin, these extensions add a lot of extra complexity
to the original model. Motivated by this observation and the recent advances in
pre-trained Transformer language models, we propose a simple yet effective
baseline for coreference resolution. Our model is a simplified version of the
original neural coreference resolution model; however, it achieves impressive
performance, outperforming all recent extended works on the public English
OntoNotes benchmark. Our work provides evidence for the necessity of carefully
justifying the complexity of existing or newly proposed models, as introducing
a conceptual or practical simplification to an existing model can still yield
competitive results.
Related papers
- PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking [0.0]
PRefLexOR combines preference optimization with concepts from Reinforcement Learning to enable models to self-teach.
We focus on applications in biological materials science and demonstrate the method in a variety of case studies.
arXiv Detail & Related papers (2024-10-16T08:46:26Z)
- Lipsum-FT: Robust Fine-Tuning of Zero-Shot Models Using Random Text Guidance [27.91782770050068]
Large-scale contrastive vision-language pre-trained models provide zero-shot models that achieve competitive performance across a range of image classification tasks without requiring training on downstream data.
Recent works have confirmed that additional fine-tuning of the zero-shot model on the reference data results in enhanced downstream performance, but compromises the model's robustness against distribution shifts.
We propose a novel robust fine-tuning algorithm, Lipsum-FT, that effectively utilizes the language modeling aspect of the vision-language pre-trained models.
arXiv Detail & Related papers (2024-04-01T02:01:33Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- Precision-Recall Divergence Optimization for Generative Modeling with
GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text
Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes factorizing the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with the recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z)
- Rethinking Self-Supervision Objectives for Generalizable Coherence
Modeling [8.329870357145927]
Coherence evaluation of machine-generated text is one of the principal applications of coherence models that needs to be investigated.
We explore training data and self-supervision objectives that result in a model that generalizes well across tasks.
We show empirically that increasing the density of negative samples improves the basic model, and using a global negative queue further improves and stabilizes the model while training with hard negative samples.
arXiv Detail & Related papers (2021-10-14T07:44:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.