Coreference Resolution without Span Representations
- URL: http://arxiv.org/abs/2101.00434v1
- Date: Sat, 2 Jan 2021 11:46:51 GMT
- Title: Coreference Resolution without Span Representations
- Authors: Yuval Kirstain, Ori Ram, Omer Levy
- Abstract summary: We introduce a lightweight coreference model that removes the dependency on span representations, handcrafted features, and heuristics.
Our model performs competitively with the current end-to-end model, while being simpler and more efficient.
- Score: 20.84150608402576
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Since the introduction of deep pretrained language models, most task-specific
NLP models were reduced to simple lightweight layers. An exception to this
trend is the challenging task of coreference resolution, where a sophisticated
end-to-end model is appended to a pretrained transformer encoder. While highly
effective, the model has a very large memory footprint -- primarily due to
dynamically-constructed span and span-pair representations -- which hinders the
processing of complete documents and the ability to train on multiple instances
in a single batch. We introduce a lightweight coreference model that removes
the dependency on span representations, handcrafted features, and heuristics.
Our model performs competitively with the current end-to-end model, while being
simpler and more efficient.
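The abstract does not spell out the architecture, but the general idea can be illustrated with a short sketch: candidate mentions and antecedents are scored directly from the encoder's token-level representations (here via start/end projections and bilinear terms), so no span or span-pair vectors are ever materialized. All module names, dimensions, and the exact scoring functions below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TokenLevelCorefScorer(nn.Module):
    """Illustrative sketch: score mentions and antecedents directly from
    token-level representations, without building span or span-pair vectors.
    Dimensions and layer choices are assumptions, not the paper's exact setup."""

    def __init__(self, hidden_dim: int = 768, proj_dim: int = 64):
        super().__init__()
        # Separate projections for a token acting as a mention start vs. end.
        self.start_proj = nn.Linear(hidden_dim, proj_dim)
        self.end_proj = nn.Linear(hidden_dim, proj_dim)
        # Bilinear terms: one for mention scoring, one for antecedent scoring.
        self.mention_bilinear = nn.Parameter(torch.randn(proj_dim, proj_dim) * 0.02)
        self.antecedent_bilinear = nn.Parameter(torch.randn(proj_dim, proj_dim) * 0.02)

    def forward(self, token_reps: torch.Tensor):
        # token_reps: (seq_len, hidden_dim) from a pretrained transformer encoder.
        starts = self.start_proj(token_reps)   # (seq_len, proj_dim)
        ends = self.end_proj(token_reps)       # (seq_len, proj_dim)
        # Mention score for every candidate span (i, j), computed from its
        # boundary tokens only: no per-span vectors are materialized.
        mention_scores = starts @ self.mention_bilinear @ ends.T          # (seq_len, seq_len)
        # Coarse pairwise antecedent score between candidate start tokens
        # (a simplification; the real model may combine start and end terms).
        antecedent_scores = starts @ self.antecedent_bilinear @ starts.T  # (seq_len, seq_len)
        return mention_scores, antecedent_scores

# Usage: feed the encoder's last hidden states for one document.
encoder_output = torch.randn(128, 768)  # stand-in for transformer outputs
scorer = TokenLevelCorefScorer()
mention_scores, antecedent_scores = scorer(encoder_output)
```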
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z)
- Pre-Trained Model Recommendation for Downstream Fine-tuning [22.343011779348682]
Model selection aims to rank off-the-shelf pre-trained models and select the most suitable one for the new target task.
Existing model selection techniques are often constrained in their scope and tend to overlook the nuanced relationships between models and tasks.
We present a pragmatic framework, Fennec, delving into a diverse, large-scale model repository.
arXiv Detail & Related papers (2024-03-11T02:24:32Z)
- Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training [32.154166415680066]
Methods like distillation, compression or quantization help leverage the highly performant large models to induce smaller performant ones.
This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment.
arXiv Detail & Related papers (2024-02-07T17:07:41Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
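A minimal sketch of the setup described above, assuming a frozen vision encoder and a frozen language model: only the linear projection and the single prepended token carry trainable parameters. Module names and dimensions are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class PerceptualAugmentation(nn.Module):
    """Sketch: augment a frozen language model with visual features through a
    single trainable linear projection and one trainable prepended token.
    Names and dimensions are assumptions."""

    def __init__(self, vision_dim: int = 1024, lm_dim: int = 768):
        super().__init__()
        # The only trainable parameters: one projection and one soft token.
        self.visual_projection = nn.Linear(vision_dim, lm_dim)
        self.soft_token = nn.Parameter(torch.zeros(1, 1, lm_dim))

    def forward(self, visual_features: torch.Tensor, text_embeddings: torch.Tensor):
        # visual_features: (batch, n_patches, vision_dim) from a frozen vision encoder.
        # text_embeddings: (batch, n_tokens, lm_dim) from the frozen LM's embedding layer.
        projected = self.visual_projection(visual_features)        # (batch, n_patches, lm_dim)
        prefix = self.soft_token.expand(text_embeddings.size(0), -1, -1)
        # Concatenate [trainable token; projected visual features; text] and pass
        # the result to the frozen language model (not shown here).
        return torch.cat([prefix, projected, text_embeddings], dim=1)

# Usage with stand-in features for a batch of 2 examples.
adapter = PerceptualAugmentation()
fused = adapter(torch.randn(2, 49, 1024), torch.randn(2, 16, 768))
```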
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis [12.269318291685753]
We show that simple neural models outperform the more complex BERT-based models.
Simple models are also more robust to variations in document length and text perturbations.
arXiv Detail & Related papers (2023-02-07T21:51:05Z)
- HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning [14.412066456583917]
We propose a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples.
Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal.
We extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.
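A rough sketch of the weight-generation idea described above: a transformer encoder reads support-sample embeddings and directly emits the parameters of a target CNN. The sizes and the single-layer target here are simplifying assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGenerator(nn.Module):
    """Sketch: a transformer consumes embeddings of the support samples and
    emits the weights of one small target conv layer directly."""

    def __init__(self, embed_dim: int = 128, out_channels: int = 16,
                 in_channels: int = 3, k: int = 3):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.weight_shape = (out_channels, in_channels, k, k)
        self.to_weights = nn.Linear(embed_dim, out_channels * in_channels * k * k)

    def forward(self, support_embeddings: torch.Tensor) -> torch.Tensor:
        # support_embeddings: (n_support, embed_dim) for one few-shot episode.
        encoded = self.encoder(support_embeddings.unsqueeze(0))  # (1, n_support, embed_dim)
        pooled = encoded.mean(dim=1).squeeze(0)                  # (embed_dim,)
        return self.to_weights(pooled).view(self.weight_shape)   # generated conv kernel

# Usage: generate a conv kernel from 5 support embeddings, apply it to a query image.
kernel = WeightGenerator()(torch.randn(5, 128))
query = torch.randn(1, 3, 32, 32)
features = F.conv2d(query, kernel, padding=1)
```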
arXiv Detail & Related papers (2022-01-11T20:15:35Z)
- When Ensembling Smaller Models is More Efficient than Single Large Models [52.38997176317532]
We show that ensembles can outperform single models with both higher accuracy and requiring fewer total FLOPs to compute.
This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models.
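A toy illustration of the comparison, with assumed architectures: two small classifiers whose averaged outputs play the role of the ensemble, next to one wider single model, with parameter counts as a crude proxy for compute.

```python
import torch
import torch.nn as nn

# Illustrative only: architectures and sizes are assumptions, not from the paper.
def small_classifier(width: int = 256) -> nn.Module:
    return nn.Sequential(nn.Linear(784, width), nn.ReLU(), nn.Linear(width, 10))

large_model = small_classifier(width=2048)                   # single big model
ensemble = [small_classifier(width=256) for _ in range(2)]   # two small models

x = torch.randn(32, 784)
ensemble_logits = torch.stack([m(x) for m in ensemble]).mean(dim=0)  # average outputs

def param_count(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())

print("large model params:   ", param_count(large_model))
print("ensemble params total:", sum(param_count(m) for m in ensemble))
```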
arXiv Detail & Related papers (2020-05-01T18:56:18Z)
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers [94.43313684188819]
We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute.
We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps.
This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models.
arXiv Detail & Related papers (2020-02-26T21:17:13Z)