Testing the Limits of Unified Sequence to Sequence LLM Pretraining on
Diverse Table Data Tasks
- URL: http://arxiv.org/abs/2310.00789v1
- Date: Sun, 1 Oct 2023 21:06:15 GMT
- Title: Testing the Limits of Unified Sequence to Sequence LLM Pretraining on
Diverse Table Data Tasks
- Authors: Soumajyoti Sarkar, Leonard Lausen
- Abstract summary: Our work is the first attempt at studying the advantages of a unified approach to table-specific pretraining when scaled from 770M to 11B sequence-to-sequence models.
- Score: 2.690048852269647
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tables stored in databases and tables present in web pages and articles account for a large part of the semi-structured data available on the internet. It is therefore pertinent to develop a modeling approach with large language models (LLMs) that can solve diverse table tasks such as semantic parsing, question answering, and classification. Traditionally, separate models have been specialized for each task individually, which raises the question of how far we can go in building a unified model that works well on some table tasks without significant degradation on others. To that end, we attempt to create a shared modeling approach in the pretraining stage with encoder-decoder-style LLMs that can cater to diverse tasks. We evaluate our approach, which continually pretrains and finetunes different T5 model families on data from tables and their surrounding context, on these downstream tasks at different model scales. Through multiple ablation studies, we observe that our pretraining with self-supervised objectives can significantly boost the performance of the models on these tasks. As one example of an improvement, we observe that public instruction-finetuned models that specialize in text question answering (QA) and have been trained on table data still have room for improvement when it comes to table-specific QA. Our work is the first attempt at studying the advantages of a unified approach to table-specific pretraining when scaling from 770M to 11B sequence-to-sequence models, while also comparing the instruction-finetuned variants of the models.
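As context for the abstract above, here is a minimal sketch of what continued pretraining on linearized tables with a self-supervised denoising objective can look like. It is not the authors' released pipeline: the public t5-small checkpoint, the toy table, the single corrupted cell, and the learning rate are all illustrative assumptions.

```python
# Hedged sketch: continued pretraining of a T5-style encoder-decoder on a
# linearized table using a span-corruption-style denoising objective.
# The table contents, corruption choice, and hyperparameters are illustrative.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Linearize a toy table into flat text; a real corpus would also include the
# surrounding page or article context, as the paper describes.
table = {"header": ["city", "population"], "rows": [["Berlin", "3.7M"], ["Paris", "2.1M"]]}
parts = ["col: " + " | ".join(table["header"])]
parts += ["row: " + " | ".join(row) for row in table["rows"]]
text = " ".join(parts)

# Corrupt one cell with a T5 sentinel token; the decoder learns to restore it.
corrupted = text.replace("3.7M", "<extra_id_0>")
target = "<extra_id_0> 3.7M <extra_id_1>"

inputs = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
loss = model(**inputs, labels=labels).loss  # standard seq2seq denoising loss
loss.backward()
optimizer.step()
print(f"denoising loss: {loss.item():.3f}")
```

In practice such a loop would stream batches of corrupted tables before task-specific finetuning; the sketch only shows a single denoising step.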
Related papers
- TableLlama: Towards Open Large Generalist Models for Tables [22.56558262472516]
This paper takes the first step towards developing open-source large language models (LLMs) as generalists for a diverse set of table-based tasks.
We construct TableInstruct, a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating LLMs.
We further develop the first open-source generalist model for tables, TableLlama, by fine-tuning Llama 2 (7B) with LongLoRA to address the long context challenge.
arXiv Detail & Related papers (2023-11-15T18:47:52Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space (a simplified weight-averaging sketch appears after this list).
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- UniMASK: Unified Inference in Sequential Decision Problems [17.09745648221254]
We introduce the UniMASK framework, which provides a unified way to specify models which can be trained on many different sequential decision-making tasks.
A single UniMASK model is often capable of carrying out many tasks with performance similar to or better than single-task models.
arXiv Detail & Related papers (2022-11-20T04:54:49Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Making Table Understanding Work in Practice [9.352813774921655]
We discuss three challenges of deploying table understanding models and propose a framework to address them.
We present SigmaTyper, which encapsulates a hybrid model trained on GitTables and integrates a lightweight human-in-the-loop approach to customize the model.
arXiv Detail & Related papers (2021-09-11T03:38:24Z)
- Learning to Perturb Word Embeddings for Out-of-distribution QA [55.103586220757464]
We propose a simple yet effective DA method based on a noise generator, which learns to perturb the word embedding of the input questions and context without changing their semantics.
We validate the performance of QA models trained with our word-embedding perturbation method on a single source dataset across five different target domains.
Notably, the model trained with ours outperforms the model trained with more than 240K artificially generated QA pairs.
arXiv Detail & Related papers (2021-05-06T14:12:26Z)
- What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
- TURL: Table Understanding through Representation Learning [29.6016859927782]
TURL is a novel framework that introduces the pre-training/finetuning paradigm to relational Web tables.
During pre-training, our framework learns deep contextualized representations on relational tables in an unsupervised manner.
We show that TURL generalizes well to all tasks and substantially outperforms existing methods in almost all instances.
arXiv Detail & Related papers (2020-06-26T05:44:54Z)
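The parameter-space merging mentioned in the Dataless Knowledge Fusion entry above can be illustrated with a deliberately simplified sketch: a uniform average of two checkpoints that share one architecture. The paper's actual merging rule is more sophisticated, and the t5-small checkpoints here merely stand in for task-specific finetuned models.

```python
# Hedged sketch: merge two same-architecture checkpoints by averaging weights.
# This illustrates parameter-space fusion in general, not the paper's exact method.
from transformers import AutoModelForSeq2SeqLM

model_a = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # stand-in for task A model
model_b = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # stand-in for task B model
merged = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

state_a, state_b = model_a.state_dict(), model_b.state_dict()
merged_state = {}
for name, tensor in merged.state_dict().items():
    if tensor.is_floating_point():
        # Uniform average; a dataless fusion method would pick smarter per-weight mixing.
        merged_state[name] = 0.5 * (state_a[name] + state_b[name])
    else:
        merged_state[name] = state_a[name]  # copy non-float buffers unchanged
merged.load_state_dict(merged_state)  # one model carrying knowledge from both
```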
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.