Lite Unified Modeling for Discriminative Reading Comprehension
- URL: http://arxiv.org/abs/2203.14103v1
- Date: Sat, 26 Mar 2022 15:47:19 GMT
- Title: Lite Unified Modeling for Discriminative Reading Comprehension
- Authors: Yilin Zhao and Hai Zhao and Libin Shen and Yinggong Zhao
- Abstract summary: We propose a lightweight POS-Enhanced Iterative Co-Attention Network (POI-Net) to handle diverse discriminative MRC tasks synchronously.
Our lite unified design brings the model significant improvements in both encoder and decoder components.
The evaluation results on four discriminative MRC benchmarks consistently indicate the general effectiveness and applicability of our model.
- Score: 68.39862736200045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a broad and major category in machine reading comprehension (MRC), the
generalized goal of discriminative MRC is answer prediction from the given
materials. However, the focuses of different discriminative MRC tasks can differ
considerably: multi-choice MRC requires a model to highlight and integrate all
potentially critical evidence globally, while extractive MRC demands higher
local boundary precision for answer extraction. Previous works lack a unified
design tailored to the full range of discriminative MRC tasks. To fill this
gap, we propose a lightweight POS-Enhanced Iterative Co-Attention Network
(POI-Net) as a first attempt at tailored unified modeling, handling diverse
discriminative MRC tasks synchronously. While introducing almost no additional
parameters, our lite unified design brings the model significant improvements
in both encoder and decoder components. The
evaluation results on four discriminative MRC benchmarks consistently indicate
the general effectiveness and applicability of our model, and the code is
available at https://github.com/Yilin1111/poi-net.
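The abstract names the core mechanism but not its equations. As a rough illustration of what an *iterative co-attention* layer computes between passage and question tokens, here is a minimal numpy sketch; the function names, residual fusion, and step count are assumptions for illustration, not the authors' actual POI-Net design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention_step(P, Q):
    """One co-attention pass between passage tokens P (m x d) and
    question tokens Q (n x d) via a shared affinity matrix."""
    A = P @ Q.T                        # (m, n) token-pair affinities
    P_ctx = softmax(A, axis=1) @ Q     # each passage token attends over Q
    Q_ctx = softmax(A.T, axis=1) @ P   # each question token attends over P
    return P_ctx, Q_ctx

def iterative_co_attention(P, Q, steps=2):
    """Repeat co-attention, fusing the attended context back in with a
    simple residual sum (a stand-in for the model's fusion layer)."""
    for _ in range(steps):
        P_ctx, Q_ctx = co_attention_step(P, Q)
        P, Q = P + P_ctx, Q + Q_ctx
    return P, Q
```

Iterating lets globally integrated evidence (useful for multi-choice MRC) and sharpened local token representations (useful for extractive span boundaries) refine each other across passes.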
Related papers
- Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels [64.94853276821992]
Large multimodal models (LMMs) are increasingly deployed across diverse applications.
Traditional evaluation methods are largely dataset-centric, relying on fixed, labeled datasets and supervised metrics.
We explore unsupervised model ranking for LMMs by leveraging their uncertainty signals, such as softmax probabilities.
arXiv Detail & Related papers (2024-12-09T13:05:43Z)
- RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering [23.699493284403967]
This paper proposes RS-MoE, the first Mixture-of-Experts-based VLM specifically customized for the remote sensing domain.
Unlike traditional MoE models, the core of RS-MoE is the MoE Block, which incorporates a novel Instruction Router and multiple lightweight Large Language Models (LLMs) as expert models.
We show that our model achieves state-of-the-art performance in generating precise and contextually relevant captions.
arXiv Detail & Related papers (2024-11-03T15:05:49Z)
- Efficient and Versatile Robust Fine-Tuning of Zero-shot Models [34.27380518351181]
We introduce Robust Adapter (R-Adapter), a novel method for fine-tuning zero-shot models to downstream tasks.
Our method integrates lightweight modules into the pre-trained model and employs novel self-ensemble techniques to boost OOD robustness and reduce storage expenses substantially.
Our experiments demonstrate that R-Adapter achieves state-of-the-art performance across a diverse set of tasks, tuning only 13% of the parameters of the CLIP encoders.
arXiv Detail & Related papers (2024-08-11T11:37:43Z)
- Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation [7.797154022794006]
Recent approaches treat the RGB modality as the center and the others as auxiliary, yielding an asymmetric architecture with two branches.
We propose a novel method, named MAGIC, that can be flexibly paired with various backbones, ranging from compact to high-performance models.
Our method achieves state-of-the-art performance while reducing the model parameters by 60%.
arXiv Detail & Related papers (2024-07-16T03:19:59Z)
- LLM4Rerank: LLM-based Auto-Reranking Framework for Recommendations [51.76373105981212]
Reranking is a critical component in recommender systems, playing an essential role in refining the output of recommendation algorithms.
We introduce a comprehensive reranking framework, designed to seamlessly integrate various reranking criteria.
A customizable input mechanism is also integrated, enabling the tuning of the language model's focus to meet specific reranking needs.
arXiv Detail & Related papers (2024-06-18T09:29:18Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
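The summary above describes ensemble members that share one set of pre-trained weights and differ only in small low-rank matrices. A minimal numpy sketch of that parameter-sharing idea (the function name, `alpha` scaling, and shapes are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def lora_ensemble_forward(x, W, lora_pairs, alpha=1.0):
    """Apply one shared projection W (d x d) with member-specific
    low-rank updates: member i uses W + alpha * A_i @ B_i, where
    A_i is (d x r) and B_i is (r x d) with r << d. Only the small
    (A_i, B_i) pairs are stored per member."""
    return [x @ (W + alpha * (A @ B)) for A, B in lora_pairs]

rng = np.random.default_rng(0)
d, r, members = 8, 2, 3
W = rng.standard_normal((d, d))          # shared, frozen weights
pairs = [(rng.standard_normal((d, r)), rng.standard_normal((r, d)))
         for _ in range(members)]
outputs = lora_ensemble_forward(rng.standard_normal((4, d)), W, pairs)
```

Each member costs only 2·d·r extra parameters instead of a full d·d copy, which is what makes the ensemble parameter-efficient.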
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning [9.176056742068813]
We present MetaDOCK, a task-specific dynamic kernel selection strategy for designing compressed CNN models.
Our method is based on the hypothesis that for a given set of similar tasks, not all kernels of the network are needed by each individual task.
We show that for the same inference budget, pruned versions of large CNN models obtained using our approach consistently outperform the conventional choices of CNN models.
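The hypothesis above, that each task needs only a subset of the network's kernels, amounts to per-task kernel pruning. A hypothetical sketch of the selection step (importance scores and the hard top-k rule are assumptions; MetaDOCK's actual selection mechanism is learned):

```python
import numpy as np

def select_kernels(importance, budget):
    """Keep the `budget` highest-scoring kernels for one task and
    prune the rest, yielding a task-specific boolean kernel mask."""
    keep = np.argsort(importance)[::-1][:budget]
    mask = np.zeros(len(importance), dtype=bool)
    mask[keep] = True
    return mask
```

Applying such a mask per task yields compressed, task-specific subnetworks of one large shared CNN.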
arXiv Detail & Related papers (2022-06-03T17:09:26Z)
- Coreference Reasoning in Machine Reading Comprehension [100.75624364257429]
We show that coreference reasoning in machine reading comprehension is a greater challenge than was earlier thought.
We propose a methodology for creating reading comprehension datasets that better reflect the challenges of coreference reasoning.
This allows us to show an improvement in the reasoning abilities of state-of-the-art models across various MRC datasets.
arXiv Detail & Related papers (2020-12-31T12:18:41Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
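All three models rely on a batch-level diversity score computed from output distributions. The paper's exact AvgOut measure is not reproduced here; the following is only a hypothetical sketch of how a batch-average output distribution could flag dull responses (the cosine-similarity scoring is an assumption for illustration):

```python
import numpy as np

def dullness_scores(batch_probs):
    """Illustrative AvgOut-style signal: average the output token
    distributions over a batch, then score each response by cosine
    similarity to that average. High similarity = generic/dull;
    low similarity = distinctive, hence higher diversity."""
    avg = batch_probs.mean(axis=0)                       # batch-average distribution
    num = batch_probs @ avg
    den = np.linalg.norm(batch_probs, axis=1) * np.linalg.norm(avg) + 1e-9
    return num / den
```

A diversity score would then be some decreasing function of this dullness signal, usable directly as an objective (MinAvgOut), a conditioning label (LFT), or a reward (RL).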
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.