Towards an On-device Agent for Text Rewriting
- URL: http://arxiv.org/abs/2308.11807v1
- Date: Tue, 22 Aug 2023 22:18:38 GMT
- Title: Towards an On-device Agent for Text Rewriting
- Authors: Yun Zhu, Yinxiao Liu, Felix Stahlberg, Shankar Kumar, Yu-hui Chen,
Liangchen Luo, Lei Shu, Renjie Liu, Jindong Chen, Lei Meng
- Abstract summary: We introduce a new instruction tuning approach for building a mobile-centric text rewriting model.
Our strategies enable the generation of high quality training data without any human labeling.
We introduce MessageRewriteEval, a benchmark that focuses on text rewriting for messages through natural language instructions.
- Score: 22.05671256490942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities for
text rewriting. Nonetheless, the large sizes of these models make them
impractical for on-device inference, which would otherwise allow for enhanced
privacy and economical inference. Creating a smaller yet potent language model
for text rewriting presents a formidable challenge because it requires
balancing the need for a small size with the need to retain the emergent
capabilities of the LLM, which requires costly data collection. To address the
above challenge, we introduce a new instruction tuning approach for building a
mobile-centric text rewriting model. Our strategies enable the generation of
high quality training data without any human labeling. In addition, we propose
a heuristic reinforcement learning framework which substantially enhances
performance without requiring preference data. To further bridge the
performance gap with the larger server-side model, we propose an effective
approach that combines the mobile rewrite agent with the server model using a
cascade. To tailor the text rewriting tasks to mobile scenarios, we introduce
MessageRewriteEval, a benchmark that focuses on text rewriting for messages
through natural language instructions. Through empirical experiments, we
demonstrate that our on-device model surpasses the current state-of-the-art
LLMs in text rewriting while maintaining a significantly reduced model size.
Notably, we show that our proposed cascading approach improves model
performance.
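The abstract describes the cascade only at a high level. As a rough illustration, the sketch below routes a rewrite request to the server-side model only when the on-device model's confidence falls below a threshold; the function names, the average-log-probability confidence heuristic, and the threshold value are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of an on-device / server cascade for instruction-based text
# rewriting. Assumption: each backend returns its rewrite plus an average
# token log-probability usable as a confidence signal. All names and the
# threshold are hypothetical, not taken from the paper.
from typing import Callable, Tuple

# (instruction, text) -> (rewritten_text, avg_token_logprob)
RewriteFn = Callable[[str, str], Tuple[str, float]]


def cascade_rewrite(
    on_device: RewriteFn,
    server: RewriteFn,
    instruction: str,
    text: str,
    confidence_threshold: float = -0.5,  # hypothetical cutoff on avg log-prob
) -> str:
    """Try the small on-device model first; escalate to the server model
    only when on-device confidence is below the threshold."""
    rewrite, avg_logprob = on_device(instruction, text)
    if avg_logprob >= confidence_threshold:
        return rewrite  # keep the cheap, private on-device result
    # Low confidence: fall back to the larger server-side model.
    server_rewrite, _ = server(instruction, text)
    return server_rewrite


if __name__ == "__main__":
    # Toy stubs standing in for real models, just to show the control flow.
    def tiny_model(instruction: str, text: str) -> Tuple[str, float]:
        return text.capitalize(), -0.9  # pretend the small model is unsure

    def big_model(instruction: str, text: str) -> Tuple[str, float]:
        return "Could you please review the report by Friday?", -0.1

    print(cascade_rewrite(tiny_model, big_model,
                          "Make this more polite",
                          "review the report by friday"))
```

Under this kind of routing, most requests stay on device (preserving privacy and cost benefits), and only low-confidence cases incur a server call.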
Related papers
- Retrieval Backward Attention without Additional Training: Enhance Embeddings of Large Language Models via Repetition [4.249842620609683]
This paper focuses on improving the performance of pre-trained language models in zero-shot settings through a simple and easily implementable method.
We propose a novel backward attention mechanism to enhance contextual information encoding.
arXiv Detail & Related papers (2025-02-28T05:19:18Z)
- Enhancing Short-Text Topic Modeling with LLM-Driven Context Expansion and Prefix-Tuned VAEs [25.915607750636333]
We propose a novel approach that leverages large language models (LLMs) to extend short texts into more detailed sequences before applying topic modeling.
Our method significantly improves short-text topic modeling performance, as demonstrated by extensive experiments on real-world datasets with extreme data sparsity.
arXiv Detail & Related papers (2024-10-04T01:28:56Z)
- Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
- ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models [52.23899502520261]
We introduce a new framework named ARTIST to focus on the learning of text structures.
We finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model.
Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15% in various metrics.
arXiv Detail & Related papers (2024-06-17T19:31:24Z)
- DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models [12.393189634359064]
We present a novel ViDeoText Retrieval Paradigm with RElevance-based AugMentation, namely DREAM.
We first adopt a simple augmentation method, which generates self-similar data by randomly duplicating or dropping subwords and frames.
To further enrich video and text information, we propose a relevance-based augmentation method, where LLMs and VGMs generate and integrate new relevant information into the original data.
arXiv Detail & Related papers (2024-04-07T21:46:47Z)
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Aligning Large Language Models with Counterfactual DPO [1.8130068086063336]
This paper explores the utilization of counterfactual prompting to align the model's style without relying on human intervention.
We demonstrate that this method effectively instils desirable behaviour, mitigates undesirable ones, and encourages the model to disregard inappropriate instructions.
arXiv Detail & Related papers (2024-01-17T19:43:43Z)
- Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling [29.87929724277381]
In short texts, co-occurrence information is minimal, which results in feature sparsity in document representation.
Existing topic models (probabilistic or neural) mostly fail to mine patterns from them to generate coherent topics.
We extend short texts into longer sequences using existing pre-trained language models (PLMs).
arXiv Detail & Related papers (2023-10-24T00:23:30Z)
- Specializing Small Language Models towards Complex Style Transfer via Latent Attribute Pre-Training [29.143887057933327]
We introduce the concept of complex text style transfer tasks and construct complex text datasets based on two widely applicable scenarios.
Our dataset is the first large-scale data set of its kind, with 700 rephrased sentences and 1,000 sentences from the game Genshin Impact.
arXiv Detail & Related papers (2023-09-19T21:01:40Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
- LAFITE: Towards Language-Free Training for Text-to-Image Generation [83.2935513540494]
We propose the first work to train text-to-image generation models without any text data.
Our method leverages the well-aligned multi-modal semantic space of the powerful pre-trained CLIP model.
We obtain state-of-the-art results in the standard text-to-image generation tasks.
arXiv Detail & Related papers (2021-11-27T01:54:45Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.