Lite Unified Modeling for Discriminative Reading Comprehension
- URL: http://arxiv.org/abs/2203.14103v1
- Date: Sat, 26 Mar 2022 15:47:19 GMT
- Title: Lite Unified Modeling for Discriminative Reading Comprehension
- Authors: Yilin Zhao and Hai Zhao and Libin Shen and Yinggong Zhao
- Abstract summary: We propose a lightweight POS-Enhanced Iterative Co-Attention Network (POI-Net) to handle diverse discriminative MRC tasks synchronously.
Our lite unified design brings the model significant improvements in both encoder and decoder components.
The evaluation results on four discriminative MRC benchmarks consistently indicate the general effectiveness and applicability of our model.
- Score: 68.39862736200045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a broad and major category in machine reading comprehension (MRC), the
generalized goal of discriminative MRC is answer prediction from the given
materials. However, the focuses of different discriminative MRC tasks can differ
considerably: multi-choice MRC requires a model to highlight and integrate all
potentially critical evidence globally, while extractive MRC demands higher
local boundary precision for answer extraction. Previous works lack a unified
design tailored to the full range of discriminative MRC tasks. To fill this
gap, we propose a lightweight POS-Enhanced Iterative Co-Attention Network
(POI-Net) as a first attempt at tailored unified modeling, handling diverse
discriminative MRC tasks synchronously. While introducing almost no additional
parameters, our lite unified design brings the model significant improvements
in both encoder and decoder components. The
evaluation results on four discriminative MRC benchmarks consistently indicate
the general effectiveness and applicability of our model, and the code is
available at https://github.com/Yilin1111/poi-net.
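The abstract names the core mechanism but not its equations. As a rough illustration of what an *iterative co-attention* layer computes between passage and question tokens, here is a minimal numpy sketch; the function names, residual fusion, and step count are assumptions for illustration, not the authors' actual POI-Net design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention_step(P, Q):
    """One co-attention pass between passage tokens P (m x d) and
    question tokens Q (n x d) via a shared affinity matrix."""
    A = P @ Q.T                        # (m, n) token-pair affinities
    P_ctx = softmax(A, axis=1) @ Q     # each passage token attends over Q
    Q_ctx = softmax(A.T, axis=1) @ P   # each question token attends over P
    return P_ctx, Q_ctx

def iterative_co_attention(P, Q, steps=2):
    """Repeat co-attention, fusing the attended context back in with a
    simple residual sum (a stand-in for the model's fusion layer)."""
    for _ in range(steps):
        P_ctx, Q_ctx = co_attention_step(P, Q)
        P, Q = P + P_ctx, Q + Q_ctx
    return P, Q
```

Iterating lets globally integrated evidence (useful for multi-choice MRC) and sharpened local token representations (useful for extractive span boundaries) refine each other across passes.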
Related papers
- Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels [64.94853276821992]
Large multimodal models (LMMs) are increasingly deployed across diverse applications.
Traditional evaluation methods are largely dataset-centric, relying on fixed, labeled datasets and supervised metrics.
We explore unsupervised model ranking for LMMs by leveraging their uncertainty signals, such as softmax probabilities.
arXiv Detail & Related papers (2024-12-09T13:05:43Z)
- RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering [23.699493284403967]
This paper proposes RS-MoE, the first Mixture-of-Experts-based VLM specifically customized for the remote sensing domain.
Unlike traditional MoE models, the core of RS-MoE is the MoE Block, which incorporates a novel Instruction Router and multiple lightweight Large Language Models (LLMs) as expert models.
We show that our model achieves state-of-the-art performance in generating precise and contextually relevant captions.
arXiv Detail & Related papers (2024-11-03T15:05:49Z)
- Efficient and Versatile Robust Fine-Tuning of Zero-shot Models [34.27380518351181]
We introduce Robust Adapter (R-Adapter), a novel method for fine-tuning zero-shot models to downstream tasks.
Our method integrates lightweight modules into the pre-trained model and employs novel self-ensemble techniques to boost OOD robustness and reduce storage expenses substantially.
Our experiments demonstrate that R-Adapter achieves state-of-the-art performance across a diverse set of tasks, tuning only 13% of the parameters of the CLIP encoders.
arXiv Detail & Related papers (2024-08-11T11:37:43Z)
- Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation [7.797154022794006]
Recent approaches treat the RGB modality as the center and the others as auxiliary, yielding an asymmetric architecture with two branches.
We propose a novel method, named MAGIC, that can be flexibly paired with various backbones, ranging from compact to high-performance models.
Our method achieves state-of-the-art performance while reducing the model parameters by 60%.
arXiv Detail & Related papers (2024-07-16T03:19:59Z)
- LLM4Rerank: LLM-based Auto-Reranking Framework for Recommendations [51.76373105981212]
Reranking is a critical component in recommender systems, playing an essential role in refining the output of recommendation algorithms.
We introduce a comprehensive reranking framework, designed to seamlessly integrate various reranking criteria.
A customizable input mechanism is also integrated, enabling the tuning of the language model's focus to meet specific reranking needs.
arXiv Detail & Related papers (2024-06-18T09:29:18Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
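The summary above describes ensemble members that share one set of pre-trained weights and differ only in small low-rank matrices. A minimal numpy sketch of that parameter-sharing idea (the function name, `alpha` scaling, and shapes are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def lora_ensemble_forward(x, W, lora_pairs, alpha=1.0):
    """Apply one shared projection W (d x d) with member-specific
    low-rank updates: member i uses W + alpha * A_i @ B_i, where
    A_i is (d x r) and B_i is (r x d) with r << d. Only the small
    (A_i, B_i) pairs are stored per member."""
    return [x @ (W + alpha * (A @ B)) for A, B in lora_pairs]

rng = np.random.default_rng(0)
d, r, members = 8, 2, 3
W = rng.standard_normal((d, d))          # shared, frozen weights
pairs = [(rng.standard_normal((d, r)), rng.standard_normal((r, d)))
         for _ in range(members)]
outputs = lora_ensemble_forward(rng.standard_normal((4, d)), W, pairs)
```

Each member costs only 2·d·r extra parameters instead of a full d·d copy, which is what makes the ensemble parameter-efficient.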
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning [9.176056742068813]
We present MetaDOCK, a task-specific dynamic kernel selection strategy for designing compressed CNN models.
Our method is based on the hypothesis that for a given set of similar tasks, not all kernels of the network are needed by each individual task.
We show that for the same inference budget, pruned versions of large CNN models obtained using our approach consistently outperform the conventional choices of CNN models.
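The hypothesis above, that each task needs only a subset of the network's kernels, amounts to per-task kernel pruning. A hypothetical sketch of the selection step (importance scores and the hard top-k rule are assumptions; MetaDOCK's actual selection mechanism is learned):

```python
import numpy as np

def select_kernels(importance, budget):
    """Keep the `budget` highest-scoring kernels for one task and
    prune the rest, yielding a task-specific boolean kernel mask."""
    keep = np.argsort(importance)[::-1][:budget]
    mask = np.zeros(len(importance), dtype=bool)
    mask[keep] = True
    return mask
```

Applying such a mask per task yields compressed, task-specific subnetworks of one large shared CNN.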
arXiv Detail & Related papers (2022-06-03T17:09:26Z)
- Coreference Reasoning in Machine Reading Comprehension [100.75624364257429]
We show that coreference reasoning in machine reading comprehension is a greater challenge than was earlier thought.
We propose a methodology for creating reading comprehension datasets that better reflect the challenges of coreference reasoning.
This allows us to show an improvement in the reasoning abilities of state-of-the-art models across various MRC datasets.
arXiv Detail & Related papers (2020-12-31T12:18:41Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
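All three models rely on a batch-level diversity score computed from output distributions. The paper's exact AvgOut measure is not reproduced here; the following is only a hypothetical sketch of how a batch-average output distribution could flag dull responses (the cosine-similarity scoring is an assumption for illustration):

```python
import numpy as np

def dullness_scores(batch_probs):
    """Illustrative AvgOut-style signal: average the output token
    distributions over a batch, then score each response by cosine
    similarity to that average. High similarity = generic/dull;
    low similarity = distinctive, hence higher diversity."""
    avg = batch_probs.mean(axis=0)                       # batch-average distribution
    num = batch_probs @ avg
    den = np.linalg.norm(batch_probs, axis=1) * np.linalg.norm(avg) + 1e-9
    return num / den
```

A diversity score would then be some decreasing function of this dullness signal, usable directly as an objective (MinAvgOut), a conditioning label (LFT), or a reward (RL).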
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.