Generative Representational Instruction Tuning
- URL: http://arxiv.org/abs/2402.09906v2
- Date: Wed, 17 Apr 2024 17:12:05 GMT
- Title: Generative Representational Instruction Tuning
- Authors: Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela
- Abstract summary: GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB).
GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models.
- Score: 89.76840377003178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on only generative or embedding data, thus we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augmented Generation (RAG) by > 60% for long documents, by no longer requiring separate retrieval and generation models. Models, code, etc. are freely available at https://github.com/ContextualAI/gritlm.
Related papers
- Multi-Modal Generative Embedding Model [34.34876575183736]
We propose a Multi-Modal Generative Embedding Model (MM-GEM), whereby the generative and embedding objectives are encapsulated in one Large Language Model.
For example, MM-GEM instantiated from ViT-Large and TinyLlama shows competitive performance on benchmarks for multimodal embedding models.
The advanced text model in MM-GEM brings over 5% improvement in Recall@1 for long text and image retrieval.
arXiv Detail & Related papers (2024-05-29T17:59:10Z)
- Evaluating Generative Models for Graph-to-Text Generation [0.0]
We explore the capability of generative models to generate descriptive text from graph data in a zero-shot setting.
Our results demonstrate that generative models are capable of generating fluent and coherent text.
However, our error analysis reveals that generative models still struggle with understanding the semantic relations between entities.
arXiv Detail & Related papers (2023-07-27T09:03:05Z)
- Generate to Understand for Representation [3.5325087487696463]
GUR is a pretraining framework that combines language modeling and contrastive learning objectives in a single training step.
GUR achieves impressive results without any labeled training data, outperforming all other pretrained baselines as a retriever on the recall benchmark in a zero-shot setting.
arXiv Detail & Related papers (2023-06-14T06:00:18Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
- DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
- Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
arXiv Detail & Related papers (2021-10-15T17:08:57Z)
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, and few-shot, to evaluate its effectiveness.
Under the zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.