Lite Unified Modeling for Discriminative Reading Comprehension
- URL: http://arxiv.org/abs/2203.14103v1
- Date: Sat, 26 Mar 2022 15:47:19 GMT
- Title: Lite Unified Modeling for Discriminative Reading Comprehension
- Authors: Yilin Zhao and Hai Zhao and Libin Shen and Yinggong Zhao
- Abstract summary: We propose a lightweight POS-Enhanced Iterative Co-Attention Network (POI-Net) to handle diverse discriminative MRC tasks synchronously.
Our lite unified design brings the model significant improvements in both its encoder and decoder components.
The evaluation results on four discriminative MRC benchmarks consistently indicate the general effectiveness and applicability of our model.
- Score: 68.39862736200045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a broad and major category in machine reading comprehension (MRC), the generalized goal of discriminative MRC is answer prediction from the given materials. However, the focuses of the various discriminative MRC tasks can differ substantially: multi-choice MRC requires the model to highlight and integrate all potentially critical evidence globally, while extractive MRC demands high local boundary precision for answer extraction. Previous works lack a unified design tailored to the full range of discriminative MRC tasks. To fill this gap, we propose a lightweight POS-Enhanced Iterative Co-Attention Network (POI-Net) as the first attempt at such targeted unified modeling, handling diverse discriminative MRC tasks synchronously. While introducing almost no additional parameters, our lite unified design brings the model significant improvements in both its encoder and decoder components. Evaluation results on four discriminative MRC benchmarks consistently indicate the general effectiveness and applicability of our model; the code is available at https://github.com/Yilin1111/poi-net.
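The abstract describes the architecture only at a high level. As a rough illustration of what one iteration of passage-question co-attention might look like, here is a minimal PyTorch sketch; the module name, shapes, and fusion choices are assumptions for illustration, not the authors' implementation (see the linked repository for the actual code):

```python
# Hypothetical sketch of iterative co-attention between passage and
# question encodings; not POI-Net's actual code (see the linked repo).
import torch
import torch.nn as nn
import torch.nn.functional as F

class IterativeCoAttention(nn.Module):
    def __init__(self, hidden: int = 768, num_iters: int = 2):
        super().__init__()
        self.num_iters = num_iters
        self.bilinear = nn.Linear(hidden, hidden, bias=False)
        self.fuse_p = nn.Linear(2 * hidden, hidden)
        self.fuse_q = nn.Linear(2 * hidden, hidden)

    def forward(self, passage: torch.Tensor, question: torch.Tensor):
        # passage: (B, Lp, H), question: (B, Lq, H)
        for _ in range(self.num_iters):
            # Token-level affinity between passage and question: (B, Lp, Lq)
            affinity = self.bilinear(passage) @ question.transpose(1, 2)
            # Each side attends over the other
            p2q = F.softmax(affinity, dim=-1) @ question                 # (B, Lp, H)
            q2p = F.softmax(affinity.transpose(1, 2), dim=-1) @ passage  # (B, Lq, H)
            # Fuse the attended context back into each representation
            passage = torch.tanh(self.fuse_p(torch.cat([passage, p2q], dim=-1)))
            question = torch.tanh(self.fuse_q(torch.cat([question, q2p], dim=-1)))
        return passage, question
```

Per the abstract, the distinguishing additions in POI-Net are POS enhancement of the attention and a unified encoder/decoder design serving both multi-choice and extractive tasks; refer to the repository for those details.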
Related papers
- Efficient and Versatile Robust Fine-Tuning of Zero-shot Models [34.27380518351181]
We introduce Robust Adapter (R-Adapter), a novel method for fine-tuning zero-shot models to downstream tasks.
Our method integrates lightweight modules into the pre-trained model and employs novel self-ensemble techniques to boost OOD robustness and reduce storage expenses substantially.
Our experiments demonstrate that R-Adapter achieves state-of-the-art performance across a diverse set of tasks while tuning only 13% of the parameters of the CLIP encoders (a generic adapter sketch follows this entry).
arXiv Detail & Related papers (2024-08-11T11:37:43Z)
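For context on the "lightweight modules" mentioned above, a generic bottleneck adapter inserted into a frozen backbone can be sketched as follows; this illustrates the adapter pattern under assumed shapes, not R-Adapter's actual module or its self-ensemble mechanism:

```python
# Generic bottleneck adapter sketch (hypothetical; not R-Adapter's code).
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)   # project down
        self.up = nn.Linear(bottleneck, hidden)     # project back up
        self.act = nn.GELU()
        # Zero-init the up-projection so the adapter starts as an identity
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        # Residual form: frozen backbone features pass through unchanged
        # until the adapter learns a useful correction.
        return x + self.up(self.act(self.down(x)))
```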
- Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation [7.797154022794006]
Recent endeavors regard the RGB modality as the center and the others as auxiliary, yielding an asymmetric architecture with two branches.
We propose a novel method, named MAGIC, that can be flexibly paired with various backbones, ranging from compact to high-performance models.
Our method achieves state-of-the-art performance while reducing the model parameters by 60%.
arXiv Detail & Related papers (2024-07-16T03:19:59Z)
- Towards a Generalist and Blind RGB-X Tracker [91.36268768952755]
We develop a single-model tracker that can remain blind to any modality X at inference time.
Our training process is extremely simple, integrating multi-label classification loss with a routing function.
Our generalist and blind tracker achieves competitive performance compared to well-established modality-specific models.
arXiv Detail & Related papers (2024-05-28T03:00:58Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets (a minimal sketch of the idea follows this entry).
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
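A minimal sketch of the shared-weights, member-specific low-rank idea follows; names, shapes, and initialization are assumptions, and the paper applies such updates to the attention projections specifically:

```python
# Sketch of an ensemble of low-rank updates over one frozen linear layer,
# in the spirit of LoRA-Ensemble (hypothetical implementation).
import torch
import torch.nn as nn

class LoRAEnsembleLinear(nn.Module):
    def __init__(self, base: nn.Linear, members: int = 4, rank: int = 8):
        super().__init__()
        self.base = base  # shared pre-trained projection, kept frozen
        for p in self.base.parameters():
            p.requires_grad = False
        out_f, in_f = base.out_features, base.in_features
        # One (A, B) low-rank pair per ensemble member; B starts at zero
        # so every member initially matches the frozen base layer.
        self.A = nn.Parameter(torch.randn(members, rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(members, out_f, rank))

    def forward(self, x: torch.Tensor, member: int) -> torch.Tensor:
        # x: (..., in_f); add the member-specific low-rank correction
        delta = x @ self.A[member].T @ self.B[member].T
        return self.base(x) + delta
```

At inference, one would run a forward pass per member and aggregate the predictions to obtain the ensemble estimate and its uncertainty.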
- Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning [9.176056742068813]
We present MetaDOCK, a task-specific dynamic kernel selection strategy for designing compressed CNN models.
Our method is based on the hypothesis that for a given set of similar tasks, not all kernels of the network are needed by each individual task.
We show that, for the same inference budget, pruned versions of large CNN models obtained using our approach consistently outperform conventional choices of CNN models (a toy gating sketch follows this entry).
arXiv Detail & Related papers (2022-06-03T17:09:26Z)
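The summary does not say how kernels are selected; one plausible (entirely hypothetical) realization of per-task kernel selection is a learned gate over convolution output channels, sketched below:

```python
# Toy sketch of per-task kernel gating (hypothetical; not MetaDOCK's code).
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, task_dim: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gate = nn.Linear(task_dim, out_ch)  # task embedding -> per-kernel gates

    def forward(self, x: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        y = self.conv(x)                          # (B, out_ch, H, W)
        g = torch.sigmoid(self.gate(task_emb))   # (B, out_ch), soft selection in [0, 1]
        # Kernels whose gates settle near zero could be pruned away for
        # that task, yielding a compressed per-task model.
        return y * g[:, :, None, None]
```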
- Robust Object Detection with Multi-input Multi-output Faster R-CNN [2.9823962001574182]
This work applies the multi-input multi-output (MIMO) architecture to the task of object detection using the general-purpose Faster R-CNN model.
MIMO builds strong feature representations and obtains very competitive accuracy when using just two input/output pairs.
It adds just 0.5% additional model parameters and increases inference time by only 15.9% compared to the standard Faster R-CNN (a generic MIMO sketch follows this entry).
arXiv Detail & Related papers (2021-11-25T12:59:34Z)
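As a generic illustration of the MIMO idea (a plain classifier rather than the paper's Faster R-CNN integration; all names and shapes are assumptions):

```python
# Generic MIMO wrapper sketch with two input/output pairs (hypothetical).
import torch
import torch.nn as nn

class MIMOClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, pairs: int = 2):
        super().__init__()
        self.pairs = pairs
        self.backbone = backbone  # must accept pairs * 3 input channels
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(pairs)
        )

    def forward(self, images: list[torch.Tensor]) -> list[torch.Tensor]:
        # Training: `images` holds `pairs` independent batches, concatenated
        # channel-wise so one forward pass serves all ensemble members.
        feats = self.backbone(torch.cat(images, dim=1))  # (B, feat_dim)
        return [head(feats) for head in self.heads]
```

At test time the same batch is typically fed to every input slot and the head outputs are averaged, giving an implicit ensemble from a single forward pass.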
- Coreference Reasoning in Machine Reading Comprehension [100.75624364257429]
We show that coreference reasoning in machine reading comprehension is a greater challenge than previously thought.
We propose a methodology for creating reading comprehension datasets that better reflect the challenges of coreference reasoning.
This allows us to show an improvement in the reasoning abilities of state-of-the-art models across various MRC datasets.
arXiv Detail & Related papers (2020-12-31T12:18:41Z)
- Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information.
Our approach can selectively ignore various kinds of noise and automatically focus on appropriate crowd scales.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts reinforcement learning and treats the diversity score as a reward signal (a rough sketch of the score follows this entry).
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
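As a rough sketch of how an output-probability dullness measure in the spirit of AvgOut might be computed (shapes and the exact scoring rule are assumptions; the paper builds its three models around such a score):

```python
# Hypothetical AvgOut-style diversity score from decoder output distributions.
import torch

def diversity_score(step_probs: torch.Tensor) -> torch.Tensor:
    """step_probs: (batch, steps, vocab) softmax outputs of a decoder."""
    # Average output distribution over the batch and time steps; dull
    # tokens (e.g. generic responses) accumulate high average probability.
    avg_dist = step_probs.mean(dim=(0, 1))               # (vocab,)
    # An example is duller the more its distributions align with avg_dist.
    dullness = (step_probs * avg_dist).sum(-1).mean(1)   # (batch,)
    return 1.0 - dullness                                # higher = more diverse
```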
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.