G-SPEED: General SParse Efficient Editing MoDel
- URL: http://arxiv.org/abs/2310.10480v1
- Date: Mon, 16 Oct 2023 15:01:18 GMT
- Title: G-SPEED: General SParse Efficient Editing MoDel
- Authors: Haoke Zhang, Yue Wang, Juntao Li, Xiabing Zhou, Min Zhang
- Abstract summary: General SParse Efficient Editing MoDel (G-SPEED)
- Score: 25.48360227520061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models~(LLMs) have demonstrated incredible capabilities in
understanding, generating, and manipulating languages. Through human-model
interactions, LLMs can automatically understand human-issued instructions and
output the expected contents, which can significantly increase working
efficiency. In various types of real-world demands, editing-oriented tasks
account for a considerable proportion, which involves an interactive process
that entails the continuous refinement of existing texts to meet specific
criteria. Due to the need for multi-round human-model interaction and the
generation of complicated editing tasks, there is an emergent need for
efficient general editing models. In this paper, we propose
General SParse Efficient Editing MoDel (G-SPEED), which can fulfill diverse
editing requirements through a single model while maintaining low computational
costs. Specifically, we first propose a novel unsupervised text editing data
clustering algorithm to deal with the data scarcity problem. Subsequently, we
introduce a sparse editing model architecture to mitigate the inherently
limited learning capabilities of small language models. The experimental
outcomes indicate that G-SPEED, with its 508M parameters, can surpass LLMs
equipped with 175B parameters. Our code and model checkpoints are available at
https://github.com/Banner-Z/G-SPEED.
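The abstract names a sparse editing architecture but does not detail it; the snippet below is a minimal, hypothetical PyTorch sketch of one way an expert-routed (sparse) feed-forward layer for editing could look. The number of experts, the top-1 routing rule, and the editing-intent categories are illustrative assumptions, not G-SPEED's actual design.

```python
# Hypothetical sketch of a sparse, expert-routed feed-forward layer for text
# editing. Illustrative only: the routing scheme and expert roles are assumed,
# not taken from the G-SPEED paper.
import torch
import torch.nn as nn


class SparseEditingFFN(nn.Module):
    def __init__(self, d_model: int = 768, d_ff: int = 3072, n_experts: int = 4):
        super().__init__()
        # One small feed-forward "expert" per assumed editing intent
        # (e.g. fluency, clarity, simplification, neutralization).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # token-level routing scores

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        scores = self.router(hidden)          # (B, T, n_experts)
        top1 = scores.argmax(dim=-1)          # hard top-1 routing per token
        out = torch.zeros_like(hidden)
        for e, expert in enumerate(self.experts):
            mask = (top1 == e).unsqueeze(-1)  # (B, T, 1) boolean
            # Dense compute for simplicity; a real sparse layer would gather
            # only the tokens routed to this expert.
            out = out + expert(hidden) * mask
        return hidden + out                   # residual connection


if __name__ == "__main__":
    layer = SparseEditingFFN()
    x = torch.randn(2, 16, 768)
    print(layer(x).shape)  # torch.Size([2, 16, 768])
```

Hard top-1 routing keeps the per-token compute close to that of a single expert, which is one way a small model can cover several editing behaviors without growing its active parameter count.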
Related papers
- NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts [57.53692236201343]
We propose a Multi-Task Correction MoE, where we train the experts to become an "expert" of speech-to-text, language-to-text, and vision-to-text datasets.
NeKo performs competitively on grammar and post-OCR correction as a multi-task model.
arXiv Detail & Related papers (2024-11-08T20:11:24Z)
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- CELA: Cost-Efficient Language Model Alignment for CTR Prediction [71.85120354973073]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.
Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs)
We propose Cost-Efficient Language Model Alignment (CELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z)
- ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation [38.30649186517611]
This paper introduces an Auto-Constriction Turning mechanism for Multilingual Neural Machine Translation (ACT-MNMT).
arXiv Detail & Related papers (2024-03-11T14:10:57Z)
- ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained Language Models for Question Answering over Knowledge Graph [142.42275983201978]
We propose a subgraph-aware self-attention mechanism to imitate the GNN for performing structured reasoning.
We also adopt an adaptation tuning strategy to adapt the model parameters with 20,000 subgraphs with synthesized questions.
Experiments show that ReasoningLM surpasses state-of-the-art models by a large margin, even with fewer updated parameters and less training data.
arXiv Detail & Related papers (2023-12-30T07:18:54Z)
- Split and Rephrase with Large Language Models [2.499907423888049]
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z)
- Controlled Text Generation via Language Model Arithmetic [7.687678490751105]
We introduce model arithmetic, a novel inference framework for composing and biasing Large Language Models.
We show that model arithmetic allows fine-grained control of generated text while outperforming state-of-the-art on the task of toxicity reduction.
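The model-arithmetic entry above is about composing and biasing LLMs at inference time; the snippet below is a generic, hedged illustration of one such composition, a weighted combination of two models' next-token log-probabilities. The weighting formula and the example strengths are assumptions for illustration, not the paper's actual operators.

```python
# Generic illustration of composing two models' next-token distributions by
# weighted log-probability arithmetic (p ∝ p_main * p_bias^w). This is a
# hedged sketch, not the exact formulation of the model-arithmetic paper.
import torch
import torch.nn.functional as F


def composed_logprobs(logits_main: torch.Tensor,
                      logits_bias: torch.Tensor,
                      weight: float = 0.5) -> torch.Tensor:
    """Log-probs of the composed distribution over the vocabulary."""
    logp_main = F.log_softmax(logits_main, dim=-1)
    logp_bias = F.log_softmax(logits_bias, dim=-1)
    return F.log_softmax(logp_main + weight * logp_bias, dim=-1)  # renormalize


if __name__ == "__main__":
    vocab = 8
    a, b = torch.randn(vocab), torch.randn(vocab)
    print(composed_logprobs(a, b, weight=0.3).exp().sum())  # ≈ 1.0
```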
arXiv Detail & Related papers (2023-11-24T13:41:12Z)
- Massive Editing for Large Language Models via Meta Learning [27.972194696587813]
Large language models (LLMs) have enabled learning knowledge from the pre-training corpora, but the acquired knowledge may be fundamentally incorrect or outdated over time.
We propose the MAssive Language Model Editing Network (MALMEN), which formulates the parameter shift aggregation as the least square problem.
Our method is evaluated by editing up to thousands of facts on LMs with different architectures, i.e., BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B).
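The MALMEN summary states that parameter-shift aggregation is posed as a least-squares problem; the lines below are a generic sketch of what such a formulation can look like. The notation ($u_i$, $d_i$, $\Delta W$) and the normal-equation solution are illustrative assumptions, not the paper's own derivation.

```latex
% Generic least-squares aggregation of n per-edit shifts into one update \Delta W.
\Delta W^{\ast} = \arg\min_{\Delta W} \sum_{i=1}^{n} \lVert \Delta W\, u_i - d_i \rVert_2^{2},
\qquad
\Delta W^{\ast} = D U^{\top} \left( U U^{\top} \right)^{-1}
```

Here $U = [u_1, \dots, u_n]$ stacks the key activations of the $n$ edits, $D = [d_1, \dots, d_n]$ stacks the desired output shifts, and the second expression is the closed-form normal-equation solution (assuming $U U^{\top}$ is invertible).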
arXiv Detail & Related papers (2023-11-08T13:03:06Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture (a generic sketch of this big-little decoding pattern follows this list).
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
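The Big Little Decoder entry closing the list above pairs a small and a large model for faster generation; below is a simplified, generic sketch of that big-little pattern, assuming HuggingFace-style causal LMs that return .logits and batch size 1. The confidence threshold and the per-step fallback rule are illustrative assumptions, not BiLD's actual hand-off policy.

```python
# Generic big-little decoding sketch: a small model proposes each next token and
# the large model takes over whenever the small model is not confident enough.
# Threshold and hand-off rule are illustrative; batch size 1 is assumed.
import torch


@torch.no_grad()
def big_little_generate(small_lm, big_lm, input_ids, max_new_tokens=64, conf_threshold=0.7):
    ids = input_ids  # shape (1, seq_len)
    for _ in range(max_new_tokens):
        small_logits = small_lm(ids).logits[:, -1, :]
        conf, token = torch.softmax(small_logits, dim=-1).max(dim=-1)
        if conf.item() < conf_threshold:
            # Fall back to the large model for this step only.
            token = big_lm(ids).logits[:, -1, :].argmax(dim=-1)
        ids = torch.cat([ids, token.unsqueeze(-1)], dim=-1)
    return ids
```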
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.