MMTM: Multi-Tasking Multi-Decoder Transformer for Math Word Problems
- URL: http://arxiv.org/abs/2206.01268v1
- Date: Thu, 2 Jun 2022 19:48:36 GMT
- Title: MMTM: Multi-Tasking Multi-Decoder Transformer for Math Word Problems
- Authors: Keyur Faldu, Amit Sheth, Prashant Kikani, Darshan Patel
- Abstract summary: We present MMTM, a novel model that leverages multi-tasking and multiple decoders during pre-training.
The MMTM model achieves better mathematical reasoning ability and generalisability.
We demonstrate this by outperforming the best state-of-the-art baseline models from Seq2Seq, GTS, and Graph2Tree with a relative improvement of 19.4% on the adversarial challenge dataset SVAMP.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recently, quite a few novel neural architectures have been derived to solve math
word problems by predicting expression trees. These architectures range from
seq2seq models to encoders that leverage graph relationships combined with
tree decoders. These models achieve good performance on various MWP datasets
but perform poorly when applied to an adversarial challenge dataset, SVAMP. We
present a novel model MMTM that leverages multi-tasking and multi-decoder
during pre-training. It creates variant tasks by deriving labels using
pre-order, in-order and post-order traversal of expression trees, and uses
task-specific decoders in a multi-tasking framework. We leverage transformer
architectures with lower dimensionality and initialize weights from the RoBERTa
model. The MMTM model achieves better mathematical reasoning ability and
generalisability, which we demonstrate by outperforming the best
state-of-the-art baseline models from Seq2Seq, GTS, and Graph2Tree with a relative
improvement of 19.4% on the adversarial challenge dataset SVAMP.
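The abstract describes deriving variant task labels from pre-order, in-order and post-order traversals of an expression tree. The following is a minimal sketch of that idea only, not the authors' implementation: the `Node` class, the `make_labels` helper and the example expression `(3 + 5) * 2` are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """A node of a binary expression tree (hypothetical helper type)."""
    token: str                      # operator such as "+" or an operand such as "3"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def pre_order(node: Optional[Node]) -> List[str]:
    """Visit the node first, then the left and right subtrees (prefix notation)."""
    if node is None:
        return []
    return [node.token] + pre_order(node.left) + pre_order(node.right)


def in_order(node: Optional[Node]) -> List[str]:
    """Visit the left subtree, the node, then the right subtree (infix, no parentheses)."""
    if node is None:
        return []
    return in_order(node.left) + [node.token] + in_order(node.right)


def post_order(node: Optional[Node]) -> List[str]:
    """Visit the left and right subtrees first, then the node (postfix notation)."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.token]


def make_labels(tree: Node) -> Dict[str, List[str]]:
    """Build one target token sequence per traversal, one per task-specific decoder."""
    return {
        "pre_order": pre_order(tree),
        "in_order": in_order(tree),
        "post_order": post_order(tree),
    }


# Illustrative expression tree for (3 + 5) * 2; not taken from the paper.
tree = Node("*", Node("+", Node("3"), Node("5")), Node("2"))
labels = make_labels(tree)
for task, tokens in labels.items():
    print(task, " ".join(tokens))
```

In this sketch the same tree yields three different target sequences, so one encoder could in principle be trained against three decoders at once. Note that the in-order sequence drops parentheses here and is therefore ambiguous on its own; how the paper handles that detail is not stated in the abstract.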
Related papers
- Mixed-Query Transformer: A Unified Image Segmentation Architecture [57.32212654642384]
Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task.
We introduce the Mixed-Query Transformer (MQ-Former), a unified architecture for multi-task and multi-dataset image segmentation using a single set of weights.
arXiv Detail & Related papers (2024-04-06T01:54:17Z)
- Pre-Trained Model Recommendation for Downstream Fine-tuning [22.343011779348682]
Model selection aims to rank off-the-shelf pre-trained models and select the most suitable one for the new target task.
Existing model selection techniques are often constrained in their scope and tend to overlook the nuanced relationships between models and tasks.
We present a pragmatic framework, Fennec, delving into a diverse, large-scale model repository.
arXiv Detail & Related papers (2024-03-11T02:24:32Z)
- Testing the Limits of Unified Sequence to Sequence LLM Pretraining on Diverse Table Data Tasks [2.690048852269647]
Our work is the first attempt at studying the advantages of a unified approach to table-specific pretraining when scaled from 770M to 11B sequence-to-sequence models.
arXiv Detail & Related papers (2023-10-01T21:06:15Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling [3.867363075280544]
Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks for multimodal data.
A new model is developed, namely Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM).
The model achieves much better performance than SOTA baselines on multimodal link prediction datasets of different sizes.
arXiv Detail & Related papers (2023-01-06T10:08:11Z)
- Multi-Agent Reinforcement Learning is a Sequence Modeling Problem [33.679936867612525]
We introduce a novel architecture named Multi-Agent Transformer (MAT).
MAT casts cooperative multi-agent reinforcement learning (MARL) into sequence modeling (SM) problems.
Central to MAT is an encoder-decoder architecture which transforms the joint policy search problem into a sequential decision making process.
arXiv Detail & Related papers (2022-05-30T09:39:45Z)
- Parameter-Efficient Abstractive Question Answering over Tables or Text [60.86457030988444]
A long-term ambition of information seeking QA systems is to reason over multi-modal contexts and generate natural answers to user queries.
Memory intensive pre-trained language models are adapted to downstream tasks such as QA by fine-tuning the model on QA data in a specific modality like unstructured text or structured tables.
To avoid training such memory-hungry models while utilizing a uniform architecture for each modality, parameter-efficient adapters add and train small task-specific bottle-neck layers between transformer layers.
arXiv Detail & Related papers (2022-04-07T10:56:29Z)
- Data Augmentation for Abstractive Query-Focused Multi-Document Summarization [129.96147867496205]
We present two QMDS training datasets, which we construct using two data augmentation methods.
These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries.
We build end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets.
arXiv Detail & Related papers (2021-03-02T16:57:01Z)
- Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models [56.268862325167575]
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task.
We evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task.
arXiv Detail & Related papers (2020-04-04T11:07:54Z)
- Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.