Related papers: StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

URL: http://arxiv.org/abs/2402.16671v7
Date: Mon, 07 Oct 2024 14:44:44 GMT
Title: StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Authors: Alex Zhuang, Ge Zhang, Tianyu Zheng, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W. Huang, Jie Fu, Xiang Yue, Wenhu Chen
Abstract summary: Large language models' (LLMs) ability to process structured data lags behind state-of-the-art (SoTA) models by an average of 35%. We train a series of models, referred to as StructLM, based on the Mistral and the CodeLlama model family, ranging from 7B to 34B parameters. Our StructLM series surpasses task-specific models on 16 out of 18 evaluated datasets and establishes new SoTA performance on 8 SKG tasks.
Score: 49.10029030628653
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Structured data sources, such as tables, graphs, and databases, are ubiquitous knowledge sources. Despite the demonstrated capabilities of large language models (LLMs) on plain text, their proficiency in interpreting and utilizing structured data remains limited. Our investigation reveals a notable deficiency in LLMs' ability to process structured data, e.g., ChatGPT lags behind state-of-the-art (SoTA) models by an average of 35%. To augment the Structured Knowledge Grounding (SKG) capabilities in LLMs, we have developed a comprehensive instruction tuning dataset comprising 1.1 million examples. Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and the CodeLlama model family, ranging from 7B to 34B parameters. Our StructLM series surpasses task-specific models on 16 out of 18 evaluated datasets and establishes new SoTA performance on 8 SKG tasks. Furthermore, StructLM demonstrates strong generalization across 6 novel held-out SKG tasks, outperforming TableLlama by an average of 35% and Flan-UL2 20B by an average of 10%. Contrary to expectations, we observe that scaling model size offers marginal benefits, with StructLM-34B showing only slight improvements over StructLM-7B. This suggests that structured knowledge grounding is still a challenging task and requires more innovative design to push to a new level.

Related papers

SAFT: Structure-Aware Fine-Tuning of LLMs for AMR-to-Text Generation [50.277959544420455]
SAFT is a structure-aware fine-tuning approach that injects graph topology into pretrained language models. We compute direction-sensitive positional encodings from the magnetic Laplacian of transformed AMRs. SAFT sets a new state-of-the-art on AMR 3.0 with a 3.5 BLEU improvement over baselines.
arXiv Detail & Related papers (2025-07-15T18:12:57Z)
Elucidating the Design Space of Multimodal Protein Language Models [69.3650883370033]
Multimodal protein language models (PLMs) integrate sequence and token-based structural information. This paper systematically elucidates the design space of multimodal PLMs to overcome their limitations. Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling.
arXiv Detail & Related papers (2025-04-15T17:59:43Z)
The Effectiveness of Large Language Models in Transforming Unstructured Text to Standardized Formats [0.0]
This study systematically evaluates Large Language Models' ability to convert unstructured text into structured formats. Experiments reveal that GPT-4o with few-shot prompting achieves breakthrough performance. These findings open new possibilities for automated structured data generation across various domains.
arXiv Detail & Related papers (2025-03-04T14:14:28Z)
HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning [25.088407009353162]
Existing benchmarks for structure reasoning mainly focus on horizontal and coordinate structures. HiBench is the first framework spanning from initial structure generation to final proficiency assessment. It consists of 30 tasks with varying hierarchical complexity, totaling 39,519 queries.
arXiv Detail & Related papers (2025-03-02T14:25:37Z)
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks. Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales. We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud [12.651588927599441]
We present a family of data augmentation models designed to significantly improve the efficiency of model fine-tuning. These models, trained based on sufficiently small LLMs, support key functionalities with low inference costs. Experiments and an application study prove the effectiveness of our approach.
arXiv Detail & Related papers (2024-12-06T09:04:12Z)
Struct-X: Enhancing Large Language Models Reasoning with Structured Data [38.558614152006975]
Struct-X operates through five key phases: "read-model-fill-reflect-reason". It encodes structured data into a topological space using graph embeddings. It fills in missing entity information with knowledge retrieval modules. The final phase involves constructing a topological network with selected tokens.
arXiv Detail & Related papers (2024-07-17T13:06:25Z)
Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data [39.29778853025738]
Large Language Models (LLMs) have been achieving competent performance on a wide range of downstream tasks. This paper proposes a framework, Learning to Reduce, that fine-tunes a language model with On-Policy Learning to generate a reduced version of the input structured data.
arXiv Detail & Related papers (2024-07-03T01:51:50Z)
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement [79.31084387589968]
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. We propose LLM2LLM, a data augmentation strategy that uses a teacher LLM to enhance a small seed dataset. We achieve improvements up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime.
arXiv Detail & Related papers (2024-03-22T08:57:07Z)
LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English. When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z)
Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building [6.445605125467575]
We train language models that incorporate unsupervised predictions about hierarchical sentence structure into the model architecture. StructFormer models have been shown to perform well on unsupervised syntactic induction based on limited pretraining data. Evaluation of our models on 39 tasks provided by the BabyLM challenge shows promising improvements of models that integrate a hierarchical bias into the architecture.
arXiv Detail & Related papers (2023-10-31T16:26:36Z)
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark featuring prominent Large Language Models (LLMs). We propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score). Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z)
LLM2KB: Constructing Knowledge Bases using instruction tuned context aware Large Language Models [0.8702432681310401]
Our paper proposes LLM2KB, a system for constructing knowledge bases using large language models. Our best performing model achieved an average F1 score of 0.6185 across 21 relations in the LM-KBC challenge held at the ISWC 2023 conference.
arXiv Detail & Related papers (2023-08-25T07:04:16Z)
StructGPT: A General Framework for Large Language Model to Reason over Structured Data [117.13986738340027]
We develop an Iterative Reading-then-Reasoning (IRR) approach for solving question answering tasks based on structured data. Our approach can significantly boost the performance of ChatGPT and achieve comparable performance against the full-data supervised-tuning baselines.
arXiv Detail & Related papers (2023-05-16T17:45:23Z)
DeepStruct: Pretraining of Language Models for Structure Prediction [64.84144849119554]
We pretrain language models on a collection of task-agnostic corpora to generate structures from text. Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks. We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets.
arXiv Detail & Related papers (2022-05-21T00:58:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.