Weaver: Foundation Models for Creative Writing
- URL: http://arxiv.org/abs/2401.17268v1
- Date: Tue, 30 Jan 2024 18:58:43 GMT
- Title: Weaver: Foundation Models for Creative Writing
- Authors: Tiannan Wang, Jiamin Chen, Qingrui Jia, Shuai Wang, Ruoyu Fang, Huilin
Wang, Zhaowei Gao, Chunzhao Xie, Chuou Xu, Jihong Dai, Yibin Liu, Jialong Wu,
Shengwei Ding, Long Li, Zhiwei Huang, Xinle Deng, Teng Yu, Gangan Ma, Han
Xiao, Zixin Chen, Danjun Xiang, Yunxia Wang, Yuanyuan Zhu, Yi Xiao, Jing
Wang, Yiru Wang, Siran Ding, Jiayang Huang, Jiayi Xu, Yilihamu Tayier, Zhenyu
Hu, Yuan Gao, Chengfeng Zheng, Yueshu Ye, Yihang Li, Lei Wan, Xinyue Jiang,
Yujie Wang, Siyu Cheng, Zhule Song, Xiangru Tang, Xiaohua Xu, Ningyu Zhang,
Huajun Chen, Yuchen Eleanor Jiang, and Wangchunshu Zhou
- Abstract summary: We introduce Weaver, our first family of large language models (LLMs) dedicated to content creation.
Weaver is pre-trained on a carefully selected corpus that focuses on improving the writing capabilities of large language models.
We fine-tune Weaver for creative and professional writing purposes and align it to the preferences of professional writers.
- Score: 61.26716770063019
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work introduces Weaver, our first family of large language models (LLMs)
dedicated to content creation. Weaver is pre-trained on a carefully selected
corpus that focuses on improving the writing capabilities of large language
models. We then fine-tune Weaver for creative and professional writing purposes
and align it to the preferences of professional writers using a suite of novel
methods for instruction data synthesis and LLM alignment, making it able to
produce more human-like text and follow more diverse instructions for content
creation. The Weaver family comprises Weaver Mini (1.8B), Weaver Base (6B),
Weaver Pro (14B), and Weaver Ultra (34B), models suited to different
applications that can be dynamically dispatched by a routing agent according
to query complexity to balance response quality and computation cost.
Evaluation on a carefully curated benchmark for assessing the writing
capabilities of LLMs shows that Weaver models of all sizes outperform generalist
LLMs several times their size. Notably, our most capable Weaver Ultra
model surpasses GPT-4, a state-of-the-art generalist LLM, across various writing
scenarios, demonstrating the advantage of training specialized LLMs for writing
purposes. Moreover, Weaver natively supports retrieval-augmented generation
(RAG) and function calling (tool usage). We present various use cases of these
abilities for improving AI-assisted writing systems, including integrating
external knowledge bases, tools, or APIs, and providing personalized writing
assistance. Furthermore, we discuss and summarize guidelines and best
practices for pre-training and fine-tuning domain-specific LLMs.
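
The routing mechanism mentioned in the abstract is not specified in detail here, but a minimal sketch conveys the idea. In the Python snippet below, the model names mirror the released Weaver sizes, while the estimate_complexity heuristic and the dispatch thresholds are purely illustrative assumptions, not the paper's method.

```python
# Minimal sketch of complexity-based routing across the Weaver family.
# The heuristic and thresholds are illustrative assumptions, not the
# mechanism described in the paper.
from dataclasses import dataclass


@dataclass
class WeaverModel:
    name: str
    params_billion: float

    def generate(self, query: str) -> str:
        # Placeholder for an actual model call (local weights or a remote endpoint).
        return f"[{self.name} ({self.params_billion}B) handles: {query!r}]"


# The four released sizes, ordered from cheapest to most capable.
WEAVER_FAMILY = [
    WeaverModel("Weaver Mini", 1.8),
    WeaverModel("Weaver Base", 6),
    WeaverModel("Weaver Pro", 14),
    WeaverModel("Weaver Ultra", 34),
]


def estimate_complexity(query: str) -> float:
    """Toy complexity score in [0, 1] based on length and a few task keywords."""
    length_score = min(len(query.split()) / 200.0, 1.0)
    hard_keywords = ("novel", "chapter", "screenplay", "outline", "rewrite")
    keyword_score = 0.5 if any(k in query.lower() for k in hard_keywords) else 0.0
    return min(length_score + keyword_score, 1.0)


def route(query: str) -> WeaverModel:
    """Pick the smallest model whose assumed capability band covers the query."""
    score = estimate_complexity(query)
    thresholds = [0.25, 0.5, 0.75, 1.0]  # assumed bands, one per model size
    for model, threshold in zip(WEAVER_FAMILY, thresholds):
        if score <= threshold:
            return model
    return WEAVER_FAMILY[-1]


if __name__ == "__main__":
    for q in ("Fix the grammar in this sentence.",
              "Draft the opening chapter of a mystery novel set in 1920s Shanghai."):
        print(route(q).generate(q))
```

In a deployed system the complexity estimate would more plausibly come from a trained classifier or a lightweight LLM call, but the quality-versus-cost trade-off it encodes is the one the abstract describes.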
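
The abstract's native RAG and function-calling support can likewise be pictured with a short, hedged sketch. The retriever, the tool table, and the weaver_chat stub below are hypothetical placeholders assuming a generic chat-message interface; the paper does not specify Weaver's actual API surface.

```python
# Hypothetical sketch: grounding a writing request with retrieval and one tool call.
# retrieve_style_samples, TOOLS, and weaver_chat are illustrative stubs, not
# Weaver's real API.
import json
from typing import Callable


def retrieve_style_samples(user_id: str, k: int = 3) -> list[str]:
    """Placeholder retriever: fetch a user's past writing from a knowledge base."""
    return [f"(sample {i} of {user_id}'s previous writing)" for i in range(1, k + 1)]


TOOLS: dict[str, Callable[..., str]] = {
    # A tool the model may call, e.g. to check a fact before weaving it into prose.
    "lookup_fact": lambda topic: f"(encyclopedia entry about {topic})",
}


def weaver_chat(messages: list[dict]) -> dict:
    """Stubbed model call; here it always asks for the lookup_fact tool."""
    return {"tool_call": {"name": "lookup_fact",
                          "arguments": {"topic": "Kyoto tea ceremonies"}}}


def assist(user_id: str, request: str) -> list[dict]:
    # 1. Retrieval-augmented generation: ground the prompt in the user's own style.
    samples = retrieve_style_samples(user_id)
    messages = [
        {"role": "system",
         "content": "Match the style of these samples:\n" + "\n".join(samples)},
        {"role": "user", "content": request},
    ]
    # 2. Function calling: if the model requests a tool, run it and feed the result back.
    reply = weaver_chat(messages)
    if "tool_call" in reply:
        call = reply["tool_call"]
        messages.append({"role": "tool", "name": call["name"],
                         "content": TOOLS[call["name"]](**call["arguments"])})
    return messages


print(json.dumps(assist("author_42", "Write a scene set in a Kyoto teahouse."), indent=2))
```

The point of the sketch is the control flow rather than the stubs: retrieved context conditions the generation, and tool results are appended to the conversation before the final draft is produced.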
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Large Language Models as Narrative-Driven Recommenders [0.051205673783866146]
Large language models (LLMs) have been shown to excel in processing general natural language queries.
We compare the performance of 38 open- and closed-source LLMs of various sizes in a movie recommendation setting.
Our findings demonstrate the ability of LLMs to generate contextually relevant movie recommendations.
arXiv Detail & Related papers (2024-10-17T14:39:24Z)
- Panza: Design and Analysis of a Fully-Local Personalized Text Writing Assistant [28.752596543740225]
We present a new design and evaluation for such an automated assistant, which we call Panza.
Panza's personalization features are based on a combination of fine-tuning with a variant of the Reverse Instructions technique and Retrieval-Augmented Generation.
We demonstrate that this combination allows us to fine-tune an LLM to reflect a user's writing style using limited data, while executing on extremely limited resources.
arXiv Detail & Related papers (2024-06-24T12:09:34Z)
- Fine-Tuned 'Small' LLMs (Still) Significantly Outperform Zero-Shot Generative AI Models in Text Classification [0.0]
Generative AI offers a simple, prompt-based alternative to fine-tuning smaller BERT-style LLMs for text classification tasks.
We show that smaller, fine-tuned LLMs consistently and significantly outperform larger, zero-shot prompted models in text classification.
arXiv Detail & Related papers (2024-06-12T21:46:13Z)
- LiPost: Improved Content Understanding With Effective Use of Multi-task Contrastive Learning [2.611731148829789]
We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling tasks.
Our model outperforms the baseline on zero-shot learning and offers improved multilingual support.
This work provides a robust foundation for vertical teams across LinkedIn to customize and fine-tune the LLM to their specific applications.
arXiv Detail & Related papers (2024-05-18T17:28:29Z)
- Tuna: Instruction Tuning using Feedback from Large Language Models [74.04950416204551]
We propose fine-tuning an instruction-tuned large language model using our novel probabilistic ranking and contextual ranking approaches.
Probabilistic ranking enables the instruction-tuned model to inherit the relative rankings of high-quality and low-quality responses from the teacher LLM.
On the other hand, learning with contextual ranking allows the model to refine its own response distribution using the contextual understanding ability of stronger LLMs.
arXiv Detail & Related papers (2023-10-20T09:55:06Z)
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark featuring prominent Large Language Models (LLMs).
We propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score).
Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z)
- Open-Source LLMs for Text Annotation: A Practical Guide for Model Setting and Fine-Tuning [5.822010906632045]
This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research.
By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis.
arXiv Detail & Related papers (2023-07-05T10:15:07Z)
- RET-LLM: Towards a General Read-Write Memory for Large Language Models [53.288356721954514]
RET-LLM is a novel framework that equips large language models with a general write-read memory unit.
Inspired by Davidsonian semantics theory, we extract and save knowledge in the form of triplets.
Our framework exhibits robust performance in handling temporal-based question answering tasks.
arXiv Detail & Related papers (2023-05-23T17:53:38Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.