Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning
- URL: http://arxiv.org/abs/2410.12164v1
- Date: Wed, 16 Oct 2024 02:04:17 GMT
- Title: Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning
- Authors: Junjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong, Shi Han, Dongmei Zhang, Surajit Chaudhuri
- Abstract summary: We propose Table-LLM-Specialist, or Table-Specialist for short, as a new self-trained fine-tuning paradigm specifically designed for table tasks.
- Score: 52.08794743921141
- Abstract: In this work, we propose Table-LLM-Specialist, or Table-Specialist for short, as a new self-trained fine-tuning paradigm specifically designed for table tasks. Our insight is that for each table task, there often exist two dual versions of the same task, one generative and one classification in nature. Leveraging their duality, we propose a Generator-Validator paradigm to iteratively generate-then-validate training data from language models, in order to fine-tune stronger Table-Specialist models that specialize in a given task, without requiring manually-labeled data. Our extensive evaluations suggest that our Table-Specialist has (1) *strong performance* on diverse table tasks over vanilla language models -- for example, Table-Specialist fine-tuned on GPT-3.5 not only outperforms vanilla GPT-3.5, but can often match or surpass GPT-4-level quality; (2) *lower cost* to deploy, because when Table-Specialist fine-tuned on GPT-3.5 achieves GPT-4-level quality, it becomes possible to deploy smaller models with comparable quality at lower latency and inference cost; and (3) *better generalizability* when evaluated across multiple benchmarks, since Table-Specialist is fine-tuned on a broad range of training data systematically generated from diverse real tables. Our code and data will be available at https://github.com/microsoft/Table-LLM-Specialist.
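A minimal sketch of the generate-then-validate loop the abstract describes, assuming the generative task, its dual classification task, and the fine-tuning step are exposed as callables; all names and the loop structure are illustrative assumptions, not the authors' implementation:

```python
def train_table_specialist(tables, generate, validate, fine_tune, rounds=3):
    """Iteratively synthesize verified training data and fine-tune on it.

    generate(table)       -> list of candidate outputs for the table task
    validate(table, cand) -> True if the dual classification task accepts it
    fine_tune(examples)   -> new (generate, validate) callables backed by
                             the freshly fine-tuned model
    """
    for _ in range(rounds):
        examples = []
        for table in tables:
            # Generative side: propose candidate outputs for the task.
            candidates = generate(table)
            # Dual classification side: keep only candidates the validator
            # accepts, filtering noisy generations without manual labels.
            examples += [(table, c) for c in candidates if validate(table, c)]
        # Fine-tune on the verified data; the stronger models then generate
        # and validate better training data in the next round.
        generate, validate = fine_tune(examples)
    return generate  # the specialized model for this table task
```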
Related papers
- TableGPT2: A Large Multimodal Model with Tabular Data Integration [22.77225649639725]
TableGPT2 is a model rigorously pre-trained and fine-tuned with over 593.8K tables and 2.36M high-quality query-table-output tuples.
One of TableGPT2's key innovations is its novel table encoder, specifically designed to capture schema-level and cell-level information.
arXiv Detail & Related papers (2024-11-04T13:03:13Z)
- UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining [22.031699293366486]
We present UniTable, a training framework that unifies the training paradigm and training objective of table recognition (TR).
The objectives of all three TR tasks are unified into a single task-agnostic training objective: language modeling.
UniTable's table parsing capability has surpassed both existing TR methods and general large vision-language models.
arXiv Detail & Related papers (2024-03-07T15:44:50Z)
- TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the power of large language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z)
- TableLlama: Towards Open Large Generalist Models for Tables [22.56558262472516]
This paper takes a first step towards developing open-source large language models (LLMs) as generalists for a diversity of table-based tasks.
We construct TableInstruct, a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating LLMs.
We further develop the first open-source generalist model for tables, TableLlama, by fine-tuning Llama 2 (7B) with LongLoRA to address the long context challenge.
arXiv Detail & Related papers (2023-11-15T18:47:52Z)
- Table-GPT: Table-tuned GPT for Diverse Table Tasks [32.90285815448813]
We train language models like GPT-3.5 and ChatGPT using diverse table-tasks synthesized from real tables as training data.
We show that our resulting Table-GPT models demonstrate better *table-understanding* capabilities, by consistently outperforming the vanilla GPT-3.5 and ChatGPT.
arXiv Detail & Related papers (2023-10-13T17:20:56Z)
- TableGPT: Towards Unifying Tables, Natural Language and Commands into One GPT [19.57099486334867]
TableGPT is a framework that enables large language models (LLMs) to understand and operate on tables using external functional commands.
TableGPT aims to provide convenience and accessibility to users by empowering them to effortlessly leverage tabular data.
arXiv Detail & Related papers (2023-07-17T17:36:09Z)
- OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort.
We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis of the table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases; the last of these is sketched after this list.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar; a toy version is sketched after this list.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
- HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections [96.64246471034195]
We propose HyperGrid, a new approach for highly effective multi-task learning.
Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
arXiv Detail & Related papers (2020-07-12T02:49:16Z)
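For the table-retrieval entry above, the following is a minimal sketch of one of the three modules it tests, soft relation-based attention biases: a learned per-relation scalar is added to the attention logits of token pairs that share a row or column. The relation vocabulary, shapes, and names are assumptions for exposition, not the paper's exact formulation; a hard attention mask would instead set the logits for unrelated pairs to negative infinity.

```python
import numpy as np

def relation_biased_attention(q, k, relation_ids, relation_bias):
    """Scaled dot-product attention with additive structural biases.

    q, k          : (n, d) query/key matrices over n table tokens
    relation_ids  : (n, n) ints, e.g. 0 = none, 1 = same row, 2 = same column
    relation_bias : learned (num_relations,) vector of per-relation biases
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)          # standard attention scores
    logits += relation_bias[relation_ids]  # soft relation-based bias
    # Row-wise softmax over the biased logits.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)
```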
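For the GraPPa entry, here is a toy illustration of synchronous expansion: each grammar rule pairs a natural-language question template with an aligned SQL template, and both sides are instantiated together from a table's columns and values. The two rules below are invented for exposition; GraPPa's induced grammar is far richer.

```python
import random

# Each rule pairs a question template with an aligned SQL template; expanding
# them together over the same slot fillers keeps question and SQL in sync.
RULES = [
    ("what is the {agg_word} {column} ?", "SELECT {agg}({column}) FROM t"),
    ("show rows where {column} is {value}",
     "SELECT * FROM t WHERE {column} = '{value}'"),
]
AGG_WORDS = {"highest": "MAX", "lowest": "MIN", "average": "AVG"}

def sample_pair(columns, values):
    """Sample one aligned (question, SQL) training pair from the grammar."""
    q_tmpl, sql_tmpl = random.choice(RULES)
    agg_word = random.choice(list(AGG_WORDS))
    slots = {"column": random.choice(columns),
             "value": random.choice(values),
             "agg_word": agg_word,
             "agg": AGG_WORDS[agg_word]}
    return q_tmpl.format(**slots), sql_tmpl.format(**slots)

# Example: sample_pair(["price", "year"], ["2019"]) might yield
# ("what is the highest price ?", "SELECT MAX(price) FROM t").
```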