DUMA: Reading Comprehension with Transposition Thinking
- URL: http://arxiv.org/abs/2001.09415v5
- Date: Tue, 15 Sep 2020 07:16:15 GMT
- Title: DUMA: Reading Comprehension with Transposition Thinking
- Authors: Pengfei Zhu and Hai Zhao and Xiaoguang Li
- Abstract summary: Multi-choice Machine Reading Comprehension (MRC) requires a model to select the correct answer from a set of answer options when given a passage and a question.
The new DUal Multi-head Co-Attention (DUMA) model is inspired by the transposition thinking process humans use to solve the multi-choice MRC problem.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-choice Machine Reading Comprehension (MRC) requires a model to
decide the correct answer from a set of answer options when given a passage and
a question. Thus, in addition to a powerful Pre-trained Language Model (PrLM)
as encoder, multi-choice MRC especially relies on a matching network design
that is supposed to effectively capture the relationships among the triplet of
passage, question, and answers. Although newer and more powerful PrLMs have
shown their strength even without the support of a matching network, we propose
a new DUal Multi-head Co-Attention (DUMA) model, inspired by the transposition
thinking process humans use to solve the multi-choice MRC problem: considering
each other's focus from the standpoints of both the passage and the question.
The proposed DUMA has been shown to be effective and is capable of generally
improving PrLMs. Our proposed method is evaluated on two benchmark multi-choice
MRC tasks, DREAM and RACE, showing that even with powerful PrLMs, DUMA can
still boost the model to new state-of-the-art performance.
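The dual co-attention idea can be sketched as follows. This is a minimal single-head NumPy illustration, not the authors' implementation: the real DUMA uses learned multi-head projections on top of PrLM encodings, whereas here the encodings, the scoring vector `w`, and the mean-pool fusion are all simplified stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: each query row attends over the key rows
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

def duma_fuse(passage_enc, qa_enc):
    # "transposition thinking": the passage attends to the question-answer
    # pair, and the question-answer pair attends back to the passage
    p2qa = attention(passage_enc, qa_enc, qa_enc)          # (Lp, d)
    qa2p = attention(qa_enc, passage_enc, passage_enc)     # (Lq, d)
    # mean-pool each view and concatenate into one fused representation
    return np.concatenate([p2qa.mean(axis=0), qa2p.mean(axis=0)])  # (2d,)

# toy usage: fuse the passage with each option's encoding, then score
rng = np.random.default_rng(0)
d = 8
passage = rng.normal(size=(12, d))                 # 12 passage tokens
options = [rng.normal(size=(5, d)) for _ in range(4)]  # 4 answer options
w = rng.normal(size=2 * d)                         # hypothetical scoring vector
scores = [w @ duma_fuse(passage, opt) for opt in options]
pred = int(np.argmax(scores))                      # index of chosen option
```

In the full model, the attention is multi-headed and the fused vector feeds a trained classifier over the options rather than a fixed dot product.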
Related papers
- Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering [53.39158264785098]
Long-term Video Question Answering (VideoQA) is a challenging vision-and-language bridging task.
We present an entirely end-to-end solution for VideoQA: Multi-granularity Contrastive cross-modal collaborative Generation model.
arXiv Detail & Related papers (2024-10-12T06:21:58Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark [77.93283927871758]
This paper introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning benchmark.
MMMU-Pro rigorously assesses multimodal models' true understanding and reasoning capabilities.
arXiv Detail & Related papers (2024-09-04T15:31:26Z) - Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models [0.0]
We formalize a planning-based approach to perform multi-step problem solving with language models.
We demonstrate a superior success rate of 89.4% on the Game of 24 task as compared to existing approaches.
arXiv Detail & Related papers (2024-04-29T18:51:17Z) - Transfer Learning Enhanced Single-choice Decision for Multi-choice Question Answering [27.601353412882258]
Multi-choice Machine Reading Comprehension (MMRC) aims to select the correct answer from a set of options based on a given passage and question.
In this paper, we reconstruct multi-choice into single-choice by training a binary classifier to distinguish whether a certain answer is correct.
Our proposed method gets rid of the multi-choice framework and can leverage resources from other tasks.
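The single-choice reconstruction described above can be sketched as: score each (passage, question, option) triple independently with a binary classifier, then pick the option with the highest probability. The `binary_score` function below is a hypothetical stand-in (a trivial lexical-overlap heuristic), not the paper's fine-tuned PrLM classifier; only the decision procedure around it follows the blurb.

```python
def binary_score(passage, question, option):
    # stand-in for a fine-tuned binary classifier returning P(option correct);
    # here, word overlap between the option and the passage+question context
    ctx = set((passage + " " + question).lower().split())
    opt = set(option.lower().split())
    return len(ctx & opt) / max(len(opt), 1)

def predict(passage, question, options):
    # score each option independently, then select the most probable one
    probs = [binary_score(passage, question, o) for o in options]
    return max(range(len(options)), key=probs.__getitem__)

# toy usage
pred = predict("the sky is blue", "what color is the sky",
               ["green grass", "blue", "red sunset"])
# → 1 (the option "blue" overlaps the context most)
```

Because each option is scored on its own, the binary classifier can be trained on any MRC-style resource with correct/incorrect answer labels, which is what lets this framing leverage data from other tasks.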
arXiv Detail & Related papers (2024-04-27T16:02:55Z) - Multimodal Chain-of-Thought Reasoning in Language Models [94.70184390935661]
We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework.
Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach.
arXiv Detail & Related papers (2023-02-02T07:51:19Z) - KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive
Question Answering [28.18555591429343]
We propose a novel framework named Knowledge Enhanced Contrastive Prompt-tuning (KECP)
Instead of adding pointer heads to PLMs, we transform the task into a non-autoregressive Masked Language Modeling (MLM) generation problem.
Our method consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.
arXiv Detail & Related papers (2022-05-06T08:31:02Z) - Lite Unified Modeling for Discriminative Reading Comprehension [68.39862736200045]
We propose a lightweight POS-Enhanced Iterative Co-Attention Network (POI-Net) to handle diverse discriminative MRC tasks synchronously.
Our lite unified design brings the model significant improvements in both the encoder and decoder components.
The evaluation results on four discriminative MRC benchmarks consistently indicate the general effectiveness and applicability of our model.
arXiv Detail & Related papers (2022-03-26T15:47:19Z) - Improving Machine Reading Comprehension with Single-choice Decision and Transfer Learning [18.81256990043713]
Multi-choice Machine Reading Comprehension (MMRC) aims to select the correct answer from a set of options based on a given passage and question.
It is non-trivial to transfer knowledge from other MRC tasks such as SQuAD and DREAM.
We reconstruct multi-choice into single-choice by training a binary classifier to distinguish whether a certain answer is correct.
arXiv Detail & Related papers (2020-11-06T11:33:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.