End-to-end Neural Coreference Resolution Revisited: A Simple yet
Effective Baseline
- URL: http://arxiv.org/abs/2107.01700v1
- Date: Sun, 4 Jul 2021 18:12:24 GMT
- Title: End-to-end Neural Coreference Resolution Revisited: A Simple yet
Effective Baseline
- Authors: Tuan Manh Lai, Trung Bui, Doo Soon Kim
- Abstract summary: We propose a simple yet effective baseline for coreference resolution.
Our model is a simplified version of the original neural coreference resolution model.
Our work provides evidence for the necessity of carefully justifying the complexity of existing or newly proposed models.
- Score: 20.431647446999996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since the first end-to-end neural coreference resolution model was
introduced, many extensions to the model have been proposed, ranging from using
higher-order inference to directly optimizing evaluation metrics using
reinforcement learning. Despite improving the coreference resolution
performance by a large margin, these extensions add a lot of extra complexity
to the original model. Motivated by this observation and the recent advances in
pre-trained Transformer language models, we propose a simple yet effective
baseline for coreference resolution. Our model is a simplified version of the
original neural coreference resolution model, however, it achieves impressive
performance, outperforming all recent extended works on the public English
OntoNotes benchmark. Our work provides evidence for the necessity of carefully
justifying the complexity of existing or newly proposed models, as introducing
a conceptual or practical simplification to an existing model can still yield
competitive results.
Related papers
- LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging [10.33844295243509]
We propose a unified framework for model merging based on low-rank estimation of task vectors without the need for access to the base model, named textscLoRE-Merging.
Our approach is motivated by the observation that task vectors from fine-tuned models frequently exhibit a limited number of dominant singular values, making low-rank estimations less prone to interference.
arXiv Detail & Related papers (2025-02-15T10:18:46Z) - Self-Improvement in Language Models: The Sharpening Mechanism [70.9248553790022]
We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening.
Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training.
We analyze two natural families of self-improvement algorithms based on SFT and RLHF.
arXiv Detail & Related papers (2024-12-02T20:24:17Z) - A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z) - Diversify, Contextualize, and Adapt: Efficient Entropy Modeling for Neural Image Codec [11.078070771578837]
More efficient backward adaptation-based entropy models have been recently developed.
We argue that their performance has been limited by the simple adoption of the design convention for forward adaptation.
We propose a simple yet effective entropy modeling framework that leverages sufficient contexts for forward adaptation without compromising on bit-rate.
arXiv Detail & Related papers (2024-11-06T04:30:04Z) - Lipsum-FT: Robust Fine-Tuning of Zero-Shot Models Using Random Text Guidance [27.91782770050068]
Large-scale contrastive vision-language pre-trained models provide the zero-shot model achieving competitive performance across a range of image classification tasks without requiring training on downstream data.
Recent works have confirmed that additional fine-tuning of the zero-shot model on the reference data results in enhanced downstream performance, but compromises the model's robustness against distribution shifts.
We propose a novel robust fine-tuning algorithm, Lipsum-FT, that effectively utilizes the language modeling aspect of the vision-language pre-trained models.
arXiv Detail & Related papers (2024-04-01T02:01:33Z) - RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z) - Precision-Recall Divergence Optimization for Generative Modeling with
GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the textitPR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL)
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into Structurals and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper- parameters.
For a simple model trained with a Repr, we focus on a VGG-style plain model and showcase that such a simple model trained with a Repr, which is referred to as Rep-VGG, performs on par with the recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z) - Rethinking Self-Supervision Objectives for Generalizable Coherence
Modeling [8.329870357145927]
Coherence evaluation of machine generated text is one of the principal applications of coherence models that needs to be investigated.
We explore training data and self-supervision objectives that result in a model that generalizes well across tasks.
We show empirically that increasing the density of negative samples improves the basic model, and using a global negative queue further improves and stabilizes the model while training with hard negative samples.
arXiv Detail & Related papers (2021-10-14T07:44:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.