MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering
- URL: http://arxiv.org/abs/2503.06296v1
- Date: Sat, 08 Mar 2025 18:09:13 GMT
- Title: MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering
- Authors: Vinay Kumar Verma, Shreyas Sunil Kulkarni, Happy Mittal, Deepak Gupta
- Abstract summary: Question Answering (QA) and Visual Question Answering (VQA) are well-studied problems in the language and vision domain. One challenging scenario involves multiple sources of information, each of a different modality, where the answer to the question may exist in one or more sources. We formulate a novel question-answer generation framework in an environment containing multi-source, multimodal information.
- Score: 12.587358542578126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Question Answering (QA) and Visual Question Answering (VQA) are well-studied problems in the language and vision domain. One challenging scenario involves multiple sources of information, each of a different modality, where the answer to the question may exist in one or more sources. This scenario contains richer information but is highly complex to handle. In this work, we formulate a novel question-answer generation (QAG) framework in an environment containing multi-source, multimodal information. The answer may belong to any or all sources; therefore, selecting the most prominent answer source or an optimal combination of all sources for a given question is challenging. To address this issue, we propose a question-guided attention mechanism that learns attention across multiple sources and decodes this information for robust and unbiased answer generation. To learn attention within each source, we introduce an explicit alignment between questions and various information sources, which facilitates identifying the most pertinent parts of the source information relative to the question. Scalability in handling diverse questions poses a challenge. We address this by extending our model to a sparse mixture-of-experts (sparse-MoE) framework, enabling it to handle thousands of question types. Experiments on T5 and Flan-T5 using three datasets demonstrate the model's efficacy, supported by ablation studies.
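The abstract outlines two mechanisms: question-guided attention that weights the candidate answer sources, and a sparse mixture-of-experts gate for scaling to many question types. Below is a minimal illustrative sketch of how such components could be wired together, assuming dot-product question-source scoring and top-k expert routing; the module names, dimensions, and routing details are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch only: question-guided attention over multi-source
# embeddings plus a top-k sparse mixture-of-experts, matching the abstract
# at a high level. Names, dimensions, and the routing scheme are
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class QuestionGuidedAttention(nn.Module):
    """Scores each source embedding against the question and pools them."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.s_proj = nn.Linear(d_model, d_model)

    def forward(self, question, sources):
        # question: (batch, d); sources: (batch, n_sources, d)
        q = self.q_proj(question).unsqueeze(1)            # (batch, 1, d)
        s = self.s_proj(sources)                          # (batch, n, d)
        scores = (q * s).sum(-1) / s.size(-1) ** 0.5      # (batch, n)
        weights = scores.softmax(dim=-1)                  # per-source attention
        fused = (weights.unsqueeze(-1) * sources).sum(1)  # (batch, d)
        return fused, weights

class SparseMoE(nn.Module):
    """Routes each input to the top-k of n experts chosen by a gate."""
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                          nn.Linear(d_model, d_model))
            for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):
        logits = self.gate(x)                             # (batch, n_experts)
        top_vals, top_idx = logits.topk(self.k, dim=-1)
        gates = top_vals.softmax(dim=-1)                  # renormalize top-k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):         # only routed rows run
            routed = (top_idx == e)                       # (batch, k) bool
            rows = routed.any(dim=-1)
            if rows.any():
                w = (gates * routed).sum(dim=-1)[rows]    # weight per routed row
                out[rows] += w.unsqueeze(-1) * expert(x[rows])
        return out

# Toy usage: fuse three modality encodings (e.g., text, table, image)
# for a batch of four questions, then route through the sparse MoE.
d = 64
attn, moe = QuestionGuidedAttention(d), SparseMoE(d)
question = torch.randn(4, d)      # stand-in for a T5/Flan-T5 question encoding
sources = torch.randn(4, 3, d)    # stand-ins for per-source encodings
fused, source_weights = attn(question, sources)
answer_repr = moe(fused)          # would feed the answer decoder
```

In the paper these components sit on top of T5 and Flan-T5 encoders; here random tensors stand in for encoder outputs.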
Related papers
- Multi-LLM QA with Embodied Exploration [55.581423861790945]
We investigate the use of Multi-Embodied LLM Explorers (MELE) for question-answering in an unknown environment.
Multiple LLM-based agents independently explore and then answer queries about a household environment.
We analyze different aggregation methods to generate a single, final answer for each query.
arXiv Detail & Related papers (2024-06-16T12:46:40Z)
- Aspect-oriented Consumer Health Answer Summarization [2.298110639419913]
Community Question-Answering (CQA) forums have revolutionized how people seek information, especially those related to their healthcare needs.
There can be several answers in response to a single query, which makes it hard to grasp the key information related to the specific health concern.
Our research focuses on aspect-based summarization of health answers to address this limitation.
arXiv Detail & Related papers (2024-05-10T07:52:43Z)
- Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering [55.295699268654545]
We propose a novel Chain-of-Discussion framework to leverage the synergy among open-source Large Language Models.
Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers.
arXiv Detail & Related papers (2024-02-26T05:31:34Z)
- SEMQA: Semi-Extractive Multi-Source Question Answering [94.04430035121136]
We introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion.
We create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions.
arXiv Detail & Related papers (2023-11-08T18:46:32Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Answering Subjective Induction Questions on Products by Summarizing Multi-sources Multi-viewpoints Knowledge [0.04791377777154766]
This paper proposes a new task in the field of Answering Subjective Induction Questions on Products.
The answer to this kind of question is non-unique and can be interpreted from many perspectives.
A satisfactory answer should be able to summarize these subjective opinions from multiple sources and provide objective knowledge.
arXiv Detail & Related papers (2023-09-12T03:27:08Z)
- HeteroQA: Learning towards Question-and-Answering through Multiple Information Sources via Heterogeneous Graph Modeling [50.39787601462344]
Community Question Answering (CQA) is a well-defined task applicable in many scenarios, such as E-Commerce and online user communities for special interests.
Most CQA methods incorporate only articles or Wikipedia to extract knowledge and answer the user's question.
We propose a question-aware heterogeneous graph transformer to incorporate the multiple information sources (MIS) in the user community to automatically generate the answer.
arXiv Detail & Related papers (2021-12-27T10:16:43Z)
- Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices [0.0]
The research directions of the QA field are analyzed based on the question type, answer type, source of evidence and answer, and modeling approach.
This is followed by the open challenges of the field, such as automatic question generation, similarity detection, and low resource availability for some languages.
arXiv Detail & Related papers (2021-12-07T08:53:40Z)
- MultiModalQA: Complex Question Answering over Text, Tables and Images [52.25399438133274]
We present MultiModalQA: a dataset that requires joint reasoning over text, tables and images.
We create MMQA using a new framework for generating complex multi-modal questions at scale.
We then define a formal language that allows us to take questions that can be answered from a single modality, and combine them to generate cross-modal questions.
arXiv Detail & Related papers (2021-04-13T09:14:28Z)