Related papers: Efficient In-Domain Question Answering for Resource-Constrained Environments

Efficient In-Domain Question Answering for Resource-Constrained Environments

URL: http://arxiv.org/abs/2409.17648v3
Date: Thu, 17 Oct 2024 14:41:39 GMT
Title: Efficient In-Domain Question Answering for Resource-Constrained Environments
Authors: Isaac Chung, Phat Vo, Arman C. Kizilkale, Aaron Reite,
Abstract summary: Retrieval Augmented Generation (RAG) is a method for integrating external knowledge into pretrained Large Language Models (LLMs) Recent studies have shown success in using fine tuning to address these problems. In this work, we combine RAFT with LoRA to reduce fine tuning and storage requirements and gain faster inference times.
Score: 0.07499722271664146
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Retrieval Augmented Generation (RAG) is a common method for integrating external knowledge into pretrained Large Language Models (LLMs) to enhance accuracy and relevancy in question answering (QA) tasks. However, prompt engineering and resource efficiency remain significant bottlenecks in developing optimal and robust RAG solutions for real-world QA applications. Recent studies have shown success in using fine tuning to address these problems; in particular, Retrieval Augmented Fine Tuning (RAFT) applied to smaller 7B models has demonstrated superior performance compared to RAG setups with much larger models such as GPT-3.5. The combination of RAFT with parameter-efficient fine tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), promises an even more efficient solution, yet remains an unexplored area. In this work, we combine RAFT with LoRA to reduce fine tuning and storage requirements and gain faster inference times while maintaining comparable RAG performance. This results in a more compute-efficient RAFT, or CRAFT, which is particularly useful for knowledge-intensive QA tasks in resource-constrained environments where internet access may be restricted and hardware resources limited.

Related papers

CoLA: Collaborative Low-Rank Adaptation [3.421904493396495]
Fine-tuning a pre-trained model for specific tasks achieves strong performance; however, it is computationally expensive and inefficient.<n>LoRA, in particular, has proven effective, but its application to multi-task scenarios is limited by interference between tasks.<n>We propose CoLA, a more flexible LoRA architecture and three collaborative strategies to enhance performance by better utilizing the quantitative relationships between $A$ and $B$.
arXiv Detail & Related papers (2025-05-21T12:46:42Z)
Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning [45.10424242207931]
Retrieval-augmented generation (RAG) enhances the text generation capabilities of large language models (LLMs)<n>We introduce a novel method ReasonRAG that automatically constructs RAG-ProGuide, a high-quality dataset providing process-level rewards for query generation, evidence extraction, and answer generation.<n>With the process-level policy optimization, the proposed framework empowers LLMs to autonomously invoke search, generate queries, extract relevant evidence, and produce final answers.
arXiv Detail & Related papers (2025-05-20T08:21:00Z)
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning [54.99373314906667]
Self-supervised representation learning for point cloud has demonstrated effectiveness in improving pre-trained model performance across diverse tasks. As pre-trained models grow in complexity, fully fine-tuning them for downstream applications demands substantial computational and storage resources. We propose PointLoRA, a simple yet effective method that combines low-rank adaptation (LoRA) with multi-scale token selection to efficiently fine-tune point cloud models.
arXiv Detail & Related papers (2025-04-22T16:41:21Z)
SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models [9.090588805667263]
We propose SECURA: Sigmoid-Enhanced CUR Decomposition LoRA, a novel PEFT variant designed to mitigate catastrophic forgetting. Our method introduces a novel normalization technique, Sigmoid-based Magnitude Norm (S-MagNorm), which enhances parameter retention and fine-tuning efficiency.
arXiv Detail & Related papers (2025-02-25T13:00:05Z)
RoseRAG: Robust Retrieval-augmented Generation with Small-scale LLMs via Margin-aware Preference Optimization [53.63439735067081]
Large language models (LLMs) have achieved impressive performance but face high computational costs and latency. Retrieval-augmented generation (RAG) helps by integrating external knowledge, but imperfect retrieval can introduce distracting noise that misleads SLMs. We propose RoseRAG, a robust RAG framework for SLMs via Margin-aware Preference Optimization.
arXiv Detail & Related papers (2025-02-16T04:56:53Z)
Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks [11.053340674721005]
Retrieval-augmented generation (RAG) has gained traction as a powerful approach for enhancing language models by integrating external knowledge sources. This paper proposes an alternative paradigm, cache-augmented generation (CAG) that bypasses real-time retrieval.
arXiv Detail & Related papers (2024-12-20T06:58:32Z)
Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks. Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum. We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
GeoLoRA: Geometric integration for parameter efficient fine-tuning [6.701651480567394]
Low-Rank Adaptation (LoRA) has become a widely used method for parameter-efficient fine-tuning of pre-trained neural networks. We introduce GeoLoRA, a novel approach that addresses the limitations by leveraging dynamical low-rank approximation theory. We demonstrate the effectiveness of GeoLoRA on several state-of-the-art benchmarks, showing that it outperforms existing methods in both accuracy and computational efficiency.
arXiv Detail & Related papers (2024-10-24T13:26:10Z)
AT-RAG: An Adaptive RAG Model Enhancing Query Efficiency with Topic Filtering and Iterative Reasoning [0.0]
We propose AT-RAG, a novel multistep RAG incorporating topic modeling for efficient document retrieval and reasoning. Using BERTopic, our model dynamically assigns topics to queries, improving retrieval accuracy and efficiency. Results show significant improvements in correctness, completeness, and relevance compared to existing methods.
arXiv Detail & Related papers (2024-10-16T01:57:56Z)
Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association ins. We employ inverse RL (IRL) to automatically learn reward functions without manual tuning. We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z)
Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation [32.26885597587913]
We introduce a novel approach called FlexRAG (Flexible Context Adaptation for RAG) In this approach, the retrieved contexts are compressed into compact embeddings before being encoded by the Large Language Models (LLMs) A key feature of FlexRAG is its flexibility, which enables effective support for diverse compression ratios and selective preservation of important contexts.
arXiv Detail & Related papers (2024-09-24T03:25:36Z)
SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance. We introduce SFR-RAG, a small LLM that is instruction-textual with an emphasis on context-grounded generation and hallucination. We also present ConBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
arXiv Detail & Related papers (2024-09-16T01:08:18Z)
EASRec: Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems [82.76483989905961]
Current Sequential Recommender Systems (SRSs) suffer from computational and resource inefficiencies. We develop the Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems (EASRec) EASRec introduces data-aware gates that leverage historical information from input data batch to improve the performance of the recommendation network.
arXiv Detail & Related papers (2024-02-01T07:22:52Z)
Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR) CFSR inherits the advantages of both convolution-based and transformer-based approaches. Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
A Distributed Deep Reinforcement Learning Technique for Application Placement in Edge and Fog Computing Environments [31.326505188936746]
Several Deep Reinforcement Learning (DRL)-based placement techniques have been proposed in fog/edge computing environments. We propose an actor-critic-based distributed application placement technique, working based on the IMPortance weighted Actor-Learner Architectures (IMPALA)
arXiv Detail & Related papers (2021-10-24T11:25:03Z)
ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique technique (ALF) ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.