Parameter-Efficient Abstractive Question Answering over Tables or Text
- URL: http://arxiv.org/abs/2204.03357v1
- Date: Thu, 7 Apr 2022 10:56:29 GMT
- Title: Parameter-Efficient Abstractive Question Answering over Tables or Text
- Authors: Vaishali Pal, Evangelos Kanoulas, Maarten de Rijke
- Abstract summary: A long-term ambition of information-seeking QA systems is to reason over multi-modal contexts and generate natural answers to user queries.
Memory-intensive pre-trained language models are adapted to downstream tasks such as QA by fine-tuning the model on QA data in a specific modality like unstructured text or structured tables.
To avoid training such memory-hungry models while utilizing a uniform architecture for each modality, parameter-efficient adapters add and train small task-specific bottleneck layers between transformer layers.
- Score: 60.86457030988444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A long-term ambition of information-seeking QA systems is to reason over
multi-modal contexts and generate natural answers to user queries. Today,
memory-intensive pre-trained language models are adapted to downstream tasks
such as QA by fine-tuning the model on QA data in a specific modality like
unstructured text or structured tables. To avoid training such memory-hungry
models while utilizing a uniform architecture for each modality,
parameter-efficient adapters add and train small task-specific bottleneck
layers between transformer layers. In this work, we study parameter-efficient
abstractive QA in encoder-decoder models over structured tabular data and
unstructured textual data using only 1.5% additional parameters for each
modality. We also ablate over adapter layers in both encoder and decoder
modules to study the efficiency-performance trade-off and demonstrate that
reducing additional trainable parameters down to 0.7%-1.0% leads to comparable
results. Our models outperform current state-of-the-art models on tabular QA
datasets such as Tablesum and FeTaQA, and achieve comparable performance on a
textual QA dataset such as NarrativeQA using significantly fewer trainable
parameters than fine-tuning.
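The bottleneck adapter idea described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the ReLU non-linearity, the residual connection, and all dimensions are assumptions chosen for illustration. The sketch shows one adapter applied to a single hidden vector, plus a helper that estimates what fraction of a base model's parameter count a set of adapters would add.

```python
def adapter_param_count(hidden_size: int, bottleneck_size: int) -> int:
    """Weights and biases of one down-projection plus one up-projection."""
    down = hidden_size * bottleneck_size + bottleneck_size
    up = bottleneck_size * hidden_size + hidden_size
    return down + up


def trainable_fraction(num_adapters: int, hidden_size: int,
                       bottleneck_size: int, base_params: int) -> float:
    """Added adapter parameters relative to the (frozen) base model size."""
    added = num_adapters * adapter_param_count(hidden_size, bottleneck_size)
    return added / base_params


def adapter_forward(h, w_down, b_down, w_up, b_up):
    """Apply a hypothetical bottleneck adapter to one hidden vector h.

    w_down has bottleneck_size rows of length hidden_size;
    w_up has hidden_size rows of length bottleneck_size.
    """
    # Down-project to the bottleneck and apply ReLU (assumed non-linearity).
    z = [max(0.0, sum(x * w for x, w in zip(h, row)) + b)
         for row, b in zip(w_down, b_down)]
    # Up-project back to the hidden size.
    out = [sum(x * w for x, w in zip(z, row)) + b
           for row, b in zip(w_up, b_up)]
    # Residual connection: the adapter learns a small correction to h.
    return [x + y for x, y in zip(h, out)]


if __name__ == "__main__":
    h = [1.0, -2.0, 0.5, 3.0]
    w_down = [[0.1] * 4, [0.2] * 4]          # hidden 4 -> bottleneck 2
    w_up = [[0.0, 0.0] for _ in range(4)]    # zero-initialized up-projection
    print(adapter_forward(h, w_down, [0.0, 0.0], w_up, [0.0] * 4))
    print(adapter_param_count(768, 64))
```

With a zero-initialized up-projection the adapter initially leaves the hidden vector unchanged, so training starts from the behavior of the frozen pre-trained model; only the small projection matrices would be updated.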
Related papers
- Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures [8.442206285783463]
Transformer-based language models have recently been at the forefront of active research in text generation.
These models' advances come at the price of prohibitive training costs, with parameter counts in the billions and compute requirements measured in petaflop/s-decades.
We investigate transformer-based architectures for improving model performance in a low-data regime by selectively replacing attention layers with feed-forward and quasi-recurrent neural network layers.
arXiv Detail & Related papers (2025-02-02T01:05:09Z)
- TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data [9.390415313514762]
TARGA is a framework that generates high-relevance synthetic data without manual annotation.
It substantially outperforms existing non-fine-tuned methods that utilize closed-source models.
It exhibits superior sample efficiency, robustness, and generalization capabilities under non-I.I.D. settings.
arXiv Detail & Related papers (2024-12-27T09:16:39Z)
- Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning [12.648711621637663]
This paper introduces a novel Parameter-Efficient Fine-Tuning (PEFT) framework for multi-modal, multi-task transfer learning with pre-trained language models.
We propose Context-PEFT, which learns different groups of adaptor parameters based on the token's domain.
Our method is evaluated on the captioning task, where it outperforms full fine-tuning under similar data constraints.
arXiv Detail & Related papers (2023-12-14T13:00:24Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
- Capturing Row and Column Semantics in Transformer Based Question Answering over Tables [9.347393642549806]
We show that one can achieve superior performance on the table QA task without using any of these specialized pre-training techniques.
Experiments on recent benchmarks prove that the proposed methods can effectively locate cell values on tables (up to 98% Hit@1 accuracy on Wiki lookup questions).
arXiv Detail & Related papers (2021-04-16T18:22:30Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
- Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models [56.268862325167575]
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task.
We evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task.
arXiv Detail & Related papers (2020-04-04T11:07:54Z)