RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on
Agriculture
- URL: http://arxiv.org/abs/2401.08406v3
- Date: Tue, 30 Jan 2024 13:55:34 GMT
- Title: RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on
Agriculture
- Authors: Angels Balaguer, Vinamra Benara, Renato Luiz de Freitas Cunha, Roberto
de M. Estevão Filho, Todd Hendry, Daniel Holstein, Jennifer Marsman, Nick
Mecklenburg, Sara Malvar, Leonardo O. Nunes, Rafael Padilha, Morris Sharp,
Bruno Silva, Swati Sharma, Vijay Aski, Ranveer Chandra
- Abstract summary: We propose a pipeline for fine-tuning and RAG, and present the tradeoffs of both for popular Large Language Models.
Our results show the effectiveness of our dataset generation pipeline in capturing geographic-specific knowledge.
- Score: 2.4184993026516213
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There are two common ways in which developers are incorporating proprietary
and domain-specific data when building applications of Large Language Models
(LLMs): Retrieval-Augmented Generation (RAG) and fine-tuning. RAG augments the
prompt with external data, while fine-tuning incorporates the additional
knowledge into the model itself. However, the pros and cons of both approaches
are not well understood. In this paper, we propose a pipeline for fine-tuning
and RAG, and present the tradeoffs of both for multiple popular LLMs, including
Llama2-13B, GPT-3.5, and GPT-4. Our pipeline consists of multiple stages,
including extracting information from PDFs, generating questions and answers,
using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We
propose metrics to assess the performance of different stages of the RAG and
fine-tuning pipeline. We conduct an in-depth study on an agricultural dataset.
Agriculture as an industry has not seen much penetration of AI, and we study a
potentially disruptive application - what if we could provide location-specific
insights to a farmer? Our results show the effectiveness of our dataset
generation pipeline in capturing geographic-specific knowledge, and the
quantitative and qualitative benefits of RAG and fine-tuning. We see an
accuracy increase of over 6 p.p. when fine-tuning the model, and this gain is
cumulative with RAG, which increases accuracy by a further 5 p.p. In one
particular experiment, we also demonstrate that the fine-tuned model leverages
information from across geographies to answer specific questions, increasing
answer similarity from 47% to 72%. Overall, the results point to how systems
built using LLMs can be adapted to respond and incorporate knowledge across a
dimension that is critical for a specific industry, paving the way for further
applications of LLMs in other industrial domains.
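The multi-stage pipeline described in the abstract (extracting text from PDFs, generating question-answer pairs, fine-tuning on them, and evaluating with GPT-4) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the paper's implementation: the chunking scheme, prompt wording, JSONL schema, and the `call_llm` placeholder are all hypothetical stand-ins.

```python
import json

from pypdf import PdfReader  # pip install pypdf


def extract_pdf_text(path: str) -> str:
    """Concatenate the extracted text of every page in a PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def generate_qa_pairs(chunk: str, n: int = 3) -> list:
    """Ask the LLM for n question-answer pairs grounded in a text chunk."""
    prompt = (
        f"From the agricultural text below, write {n} question-answer pairs "
        'as a JSON list of {"question": ..., "answer": ...} objects.\n\n'
        + chunk
    )
    return json.loads(call_llm(prompt))


def build_finetuning_file(pdf_paths: list, out_path: str) -> None:
    """Turn PDFs into a JSONL file of chat-formatted fine-tuning examples."""
    with open(out_path, "w") as f:
        for path in pdf_paths:
            text = extract_pdf_text(path)
            # Naive fixed-size chunking; a real pipeline would split smarter.
            for i in range(0, len(text), 4000):
                for pair in generate_qa_pairs(text[i : i + 4000]):
                    record = {"messages": [
                        {"role": "user", "content": pair["question"]},
                        {"role": "assistant", "content": pair["answer"]},
                    ]}
                    f.write(json.dumps(record) + "\n")


def call_llm(prompt: str) -> str:
    # Placeholder for a real model client (the paper uses GPT-4 for both
    # generation and evaluation); returns dummy JSON so the sketch runs.
    return json.dumps([{"question": "Q?", "answer": "A."}])


# Hypothetical usage:
# build_finetuning_file(["agronomy_guide.pdf"], "qa_train.jsonl")
```

One chat-formatted record per Q&A pair keeps the output directly consumable by common fine-tuning APIs; the GPT-4 scoring of generated pairs and answers that the abstract mentions is omitted from this sketch.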
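At inference time, RAG as described in the abstract amounts to retrieving relevant external passages and prepending them to the prompt. Below is a minimal sketch assuming a TF-IDF retriever over a toy corpus and the same hypothetical `call_llm` stand-in; the paper's actual retriever is not specified here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy external corpus standing in for the agricultural documents.
CORPUS = [
    "Cotton in Texas is typically planted between March and June.",
    "Sandy loam soils drain quickly and may need more frequent irrigation.",
    "Winter wheat in Kansas is usually sown in September or October.",
]


def retrieve(question: str, k: int = 2) -> list:
    """Return the k corpus passages most similar to the question (TF-IDF)."""
    vecs = TfidfVectorizer().fit_transform(CORPUS + [question])
    sims = cosine_similarity(vecs[-1], vecs[:-1]).ravel()
    return [CORPUS[i] for i in sims.argsort()[::-1][:k]]


def answer_with_rag(question: str) -> str:
    """Augment the prompt with retrieved passages before calling the model."""
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return call_llm(prompt)


def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real client for Llama2-13B, GPT-3.5, or GPT-4.
    return f"<model answer based on a {len(prompt)}-character prompt>"


print(answer_with_rag("When should cotton be planted in Texas?"))
```

Because the retrieved context and the fine-tuned weights contribute independently, the two approaches can be stacked, which is consistent with the cumulative 6 p.p. plus 5 p.p. gains reported in the abstract.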
Related papers
- Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems [18.926129063000264]
RAG (Retrieval-Augmented Generation) systems have recently gained significant attention for their enhanced ability to integrate external knowledge sources.
We propose a fairness evaluation framework tailored to RAG methods, using scenario-based questions and analyzing disparities across demographic attributes.
arXiv Detail & Related papers (2024-09-29T22:04:26Z) - RAGProbe: An Automated Approach for Evaluating RAG Applications [1.38012307221604]
Retrieval Augmented Generation (RAG) is increasingly being used when building Generative AI applications.
We present a technique for generating variations in question-answer pairs to trigger failures in RAG pipelines.
arXiv Detail & Related papers (2024-09-24T23:33:07Z) - What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices [91.71951459594074]
Large language models (LLMs) with extended context windows have significantly improved tasks such as information extraction, question answering, and complex planning scenarios.
Existing methods typically utilize the Self-Instruct framework to generate instruction-tuning data that improves long-context capability.
We propose the Multi-agent Interactive Multi-hop Generation framework, incorporating a Quality Verification Agent, a Single-hop Question Generation Agent, a Multiple Question Sampling Strategy, and a Multi-hop Question Merger Agent.
Our findings show that our synthetic high-quality long-context instruction data significantly enhances model performance, even surpassing models trained on larger amounts of human-annotated data.
arXiv Detail & Related papers (2024-09-03T13:30:00Z) - RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs [60.38044044203333]
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG).
We propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG.
For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-sourced model with the state-of-the-art performance on RAG benchmarks.
arXiv Detail & Related papers (2024-07-02T17:59:17Z) - Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation [64.7982176398485]
Retrieval-augmented generation (RAG) has demonstrated effectiveness in mitigating the hallucination problem of large language models (LLMs).
We propose DPA-RAG, a universal framework designed to align diverse knowledge preferences within RAG systems.
arXiv Detail & Related papers (2024-06-26T18:26:53Z) - Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models [0.8399688944263842]
Large Language Models (LLMs) have the capability to understand and generate human-like text from input queries.
This study extends this concept to the integration of LLMs within Retrieval-Augmented Generation (RAG) pipelines.
We evaluate the impact of fine-tuning on the LLMs' capacity for data extraction and contextual understanding.
arXiv Detail & Related papers (2024-06-17T04:35:17Z) - Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases [9.478012553728538]
We propose an end-to-end system design that utilizes Retrieval-Augmented Generation (RAG) to improve the factual accuracy of Large Language Models (LLMs).
Our system integrates the RAG pipeline with upstream dataset processing and downstream performance evaluation.
Our experiments demonstrate the system's effectiveness in generating more accurate answers to domain-specific and time-sensitive inquiries.
arXiv Detail & Related papers (2024-03-15T16:30:14Z) - CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources.
This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z) - How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLMs) have shown impressive general intelligence and human-like capabilities.
We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z) - LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities [66.36633042421387]
We evaluate Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning.
We propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning.
arXiv Detail & Related papers (2023-05-22T15:56:44Z)