Related papers: Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

URL: http://arxiv.org/abs/2407.03051v2
Date: Thu, 18 Jul 2024 06:21:23 GMT
Title: Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Authors: Janghwan Lee, Seongmin Park, Sukjin Hong, Minsoo Kim, Du-Seong Chang, Jungwook Choi,
Abstract summary: quantization-aware direct preference optimization (QDPO) improves conversational abilities of quantized large language models (LLMs) evaluated on two instruction-tuned LLMs in various languages, QDPO demonstrated superior performance in improving conversational abilities compared to established PTQ and knowledge-distillation fine-tuning techniques.
Score: 8.91053640932991
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved through techniques like post-training quantization (PTQ), presents challenges such as token-flipping that can impair chatbot performance. In response, we propose a novel preference alignment approach, quantization-aware direct preference optimization (QDPO), that aligns quantized LLMs with their full-precision counterparts, improving conversational abilities. Evaluated on two instruction-tuned LLMs in various languages, QDPO demonstrated superior performance in improving conversational abilities compared to established PTQ and knowledge-distillation fine-tuning techniques, marking a significant step forward in the development of efficient and effective conversational LLMs.

Related papers

Probing Large Language Models in Reasoning and Translating Complex Linguistic Puzzles [0.6144680854063939]
This paper investigates the utilization of Large Language Models (LLMs) for solving complex linguistic puzzles. Using datasets from the Puzzling Machine Competition and various Linguistics Olympiads, we employ a comprehensive set of metrics to assess the performance of GPT-4 0603.
arXiv Detail & Related papers (2025-02-02T14:53:14Z)
Lla-VAP: LSTM Ensemble of Llama and VAP for Turn-Taking Prediction [0.0]
This project expands on existing strategies for turn-taking prediction by employing a multi-modal ensemble approach. We aim to improve the accuracy and efficiency of identifying TRPs in both scripted and unscripted conversational scenarios.
arXiv Detail & Related papers (2024-12-24T00:20:38Z)
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks. We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model. Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers [7.7705926659081275]
VerifierQ is a novel approach that integrates Offline Q-learning into verifier models. We address three key challenges in applying Q-learning to LLMs. Our method enables parallel Q-value computation and improving training efficiency.
arXiv Detail & Related papers (2024-10-10T15:43:55Z)
Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs) We present a simple yet effective automatic process for creating speech-text pair data. Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z)
Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training. Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z)
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning. We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training [33.57497419019826]
Action-Based Contrastive Self-Training allows for sample-efficient dialogue policy learning in multi-turn conversation. ACT demonstrates substantial conversation modeling improvements over standard approaches to supervised fine-tuning and DPO.
arXiv Detail & Related papers (2024-05-31T22:44:48Z)
BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation [18.329192763760034]
We introduce BLSP-KD, a novel approach for Bootstrapping Language-Speech Pretraining via Knowledge Distillation. It optimize speech-text alignment by minimizing the divergence between the LLM's next-token prediction distributions for speech and text inputs. It also employs a continuous-integrate-andfire strategy to segment speech into tokens that correspond one-to-one with text tokens, enabling fine-grained alignment.
arXiv Detail & Related papers (2024-05-29T12:32:08Z)
Boosting Large Language Model for Speech Synthesis: An Empirical Study [86.89548753080432]
Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision. We conduct a comprehensive empirical exploration of boosting LLMs with the ability to generate speech, by combining pre-trained LLM LLaMA/OPT and text-to-speech synthesis model VALL-E. We compare three integration methods between LLMs and speech models, including directly fine-tuned LLMs, superposed layers of LLMs and VALL-E, and coupled LLMs and VALL-E using LLMs as a powerful text encoder
arXiv Detail & Related papers (2023-12-30T14:20:04Z)
Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue. By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights. This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
arXiv Detail & Related papers (2023-10-10T03:06:38Z)
Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting [32.70214938434769]
We explore the ability of large language models (LLMs) to act as speech recognition post-processors. We evaluate different prompting schemes, both zero- and few-shot in-context learning, and a novel task activation prompting method. We show that rescoring only by in-context learning with frozen LLMs achieves results that are competitive with rescoring by domain-tuned LMs.
arXiv Detail & Related papers (2023-09-27T13:36:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.