Hybrid-RACA: Hybrid Retrieval-Augmented Composition Assistance for Real-time Text Prediction
- URL: http://arxiv.org/abs/2308.04215v3
- Date: Sat, 12 Oct 2024 12:50:33 GMT
- Title: Hybrid-RACA: Hybrid Retrieval-Augmented Composition Assistance for Real-time Text Prediction
- Authors: Menglin Xia, Xuchao Zhang, Camille Couturier, Guoqing Zheng, Saravan Rajmohan, Victor Ruhle,
- Abstract summary: We propose Hybrid Retrieval-Augmented Composition Assistance (Hybrid-RACA) for real-time text prediction.
It efficiently combines a cloud-based large language model with a smaller client-side model through retrieval augmented memory.
Our experiments on five datasets demonstrate that Hybrid-RACA offers strong performance while maintaining low latency.
- Score: 17.94189417448127
- License:
- Abstract: Large language models (LLMs) enhanced with retrieval augmentation has shown great performance in many applications. However, the computational demands for these models pose a challenge when applying them to real-time tasks, such as composition assistance. To address this, we propose Hybrid Retrieval-Augmented Composition Assistance (Hybrid-RACA), a novel system for real-time text prediction that efficiently combines a cloud-based LLM with a smaller client-side model through retrieval augmented memory. This integration enables the client model to generate better responses, benefiting from the LLM's capabilities and cloud-based data. Meanwhile, via a novel asynchronous memory update mechanism, the client model can deliver real-time completions to user inputs without the need to wait for responses from the cloud. Our experiments on five datasets demonstrate that Hybrid-RACA offers strong performance while maintaining low latency.
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Hybrid Training Approaches for LLMs: Leveraging Real and Synthetic Data to Enhance Model Performance in Domain-Specific Applications [0.0]
This research explores a hybrid approach to fine-tuning large language models (LLMs)
By leveraging a dataset combining transcribed real interactions with high-quality synthetic sessions, we aimed to overcome the limitations of domain-specific real data.
The study evaluated three models: a base foundational model, a model fine-tuned with real data, and a hybrid fine-tuned model.
arXiv Detail & Related papers (2024-10-11T18:16:03Z) - Self-Boosting Large Language Models with Synthetic Preference Data [97.94185115047999]
We introduce SynPO, a self-boosting paradigm that leverages synthetic preference data for model alignment.
After four SynPO iterations, Llama3-8B and Mistral-7B show significant enhancements in instruction-following abilities.
SynPO improves the general performance of LLMs on various tasks, validated by a 3.2 to 5.0 average score increase on the well-recognized Open LLM leaderboard.
arXiv Detail & Related papers (2024-10-09T14:57:31Z) - Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models [66.24055500785657]
Traditional turn-based chat systems prevent users from verbally interacting with system while it is generating responses.
To overcome these limitations, we adapt existing LLMs to listen users while generating output and provide users with instant feedback.
We build a dataset consisting of alternating time slices of queries and responses as well as covering typical feedback types in instantaneous interactions.
arXiv Detail & Related papers (2024-06-22T03:20:10Z) - Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models [0.8399688944263842]
Large Language Models (LLMs) have the capability to understand and generate human-like text from input queries.
This study extends this concept to the integration of LLMs within Retrieval-Augmented Generation (RAG) pipelines.
We evaluate the impact of fine-tuning on the LLMs' capacity for data extraction and contextual understanding.
arXiv Detail & Related papers (2024-06-17T04:35:17Z) - AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimize an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z) - Towards Client Driven Federated Learning [7.528642177161784]
We introduce Client-Driven Federated Learning (CDFL), a novel FL framework that puts clients at the driving role.
In CDFL, each client independently and asynchronously updates its model by uploading the locally trained model to the server and receiving a customized model tailored to its local task.
arXiv Detail & Related papers (2024-05-24T10:17:49Z) - CELA: Cost-Efficient Language Model Alignment for CTR Prediction [71.85120354973073]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.
Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs)
We propose textbfCost-textbfEfficient textbfLanguage Model textbfAlignment (textbfCELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z) - Efficient Cloud-edge Collaborative Inference for Object
Re-identification [27.952445808987036]
We pioneer a cloud-edge collaborative inference framework for ReID systems.
We propose a distribution-aware correlation modeling network (DaCM) to make the desired image return to the cloud server.
DaCM embeds the spatial-temporal correlations implicitly included in the timestamps into a graph structure, and it can be applied in the cloud to regulate the size of the upload window.
arXiv Detail & Related papers (2024-01-04T02:56:50Z) - In Situ Framework for Coupling Simulation and Machine Learning with
Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z) - Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in
Public Cloud [9.149566952446058]
We proposeCocktail, a costeffective ensembling-based model serving framework.
A prototype implementation ofCocktailon the AWS EC2 platform and exhaustive evalua-tions using a variety of workloads demonstrate thatCocktailcan reduce deployment cost by 1.45x.
arXiv Detail & Related papers (2021-06-09T19:23:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.