REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models
- URL: http://arxiv.org/abs/2410.12890v1
- Date: Wed, 16 Oct 2024 08:43:39 GMT
- Title: REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models
- Authors: Ambuje Gupta, Mrinal Rawat, Andreas Stolcke, Roberto Pieraccini
- Abstract summary: Retrieval augmented generation (RAG) pipelines are commonly used in tasks such as question-answering (QA).
We propose REFINE, a novel technique that generates synthetic data from available documents and then uses a model fusion approach to fine-tune embeddings.
- Score: 14.023953508288628
- License:
- Abstract: Retrieval augmented generation (RAG) pipelines are commonly used in tasks such as question-answering (QA), relying on retrieving relevant documents from a vector store computed using a pretrained embedding model. However, if the retrieved context is inaccurate, the answers generated by the large language model (LLM) may contain errors or hallucinations. Although pretrained embedding models have advanced, adapting them to new domains remains challenging. Fine-tuning is a potential solution, but industry settings often lack the necessary fine-tuning data. To address these challenges, we propose REFINE, a novel technique that generates synthetic data from available documents and then uses a model fusion approach to fine-tune embeddings for improved retrieval performance in new domains, while preserving out-of-domain capability. We conducted experiments on two public datasets, SQUAD and RAG-12000, and a proprietary TOURISM dataset. Results demonstrate that even standard fine-tuning with the proposed data augmentation technique outperforms the vanilla pretrained model. Furthermore, when combined with model fusion, the proposed approach achieves superior performance, with a 5.76% improvement in recall on the TOURISM dataset, and 6.58% and 0.32% improvements on SQUAD and RAG-12000, respectively.
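To make the pipeline concrete, the following is a minimal sketch of a REFINE-style workflow: synthetic query-passage pairs are generated from in-domain documents, an embedding model is fine-tuned on them, and the fine-tuned weights are fused with the pretrained weights to preserve out-of-domain capability. This is not the authors' implementation; the `llm_generate` callable, the base model name, and the simple linear interpolation used for fusion are illustrative assumptions.

```python
# Illustrative sketch only; the exact prompts, base model, and fusion rule
# used by REFINE are not specified in the abstract and are assumed here.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

def generate_synthetic_pairs(documents, llm_generate):
    """llm_generate is a hypothetical callable that asks an LLM to write a
    question answerable from the given document chunk."""
    return [
        InputExample(texts=[llm_generate(f"Write one question answered by:\n{doc}"), doc])
        for doc in documents
    ]

def refine_style_embeddings(documents, llm_generate, alpha=0.5,
                            base_name="sentence-transformers/all-MiniLM-L6-v2"):
    pretrained = SentenceTransformer(base_name)
    finetuned = SentenceTransformer(base_name)

    # Fine-tune on synthetic (query, passage) pairs with in-batch negatives.
    loader = DataLoader(generate_synthetic_pairs(documents, llm_generate),
                        shuffle=True, batch_size=16)
    loss = losses.MultipleNegativesRankingLoss(finetuned)
    finetuned.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)

    # "Model fusion" is assumed here to be a simple interpolation of weights,
    # keeping in-domain gains while staying close to the pretrained model.
    fused_state = {}
    finetuned_state = finetuned.state_dict()
    for name, weight in pretrained.state_dict().items():
        if weight.is_floating_point():
            fused_state[name] = (1 - alpha) * weight + alpha * finetuned_state[name]
        else:
            fused_state[name] = finetuned_state[name]
    finetuned.load_state_dict(fused_state)
    return finetuned
```

In this sketch, `alpha` trades in-domain adaptation against retention of the pretrained model's general-purpose behaviour: alpha=1.0 reduces to plain fine-tuning, and alpha=0.0 returns the original model.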
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation [13.120801609024147]
Retrieval augmented generation (RAG) has been shown to enhance the factuality of large language model (LLM) outputs.
RAG inputs are more complex than most datasets used for training NLI models.
We introduce Automatic Generative Domain Adaptation (Auto-GDA) to enable unsupervised domain adaptation.
arXiv Detail & Related papers (2024-10-04T14:21:27Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Crafting Efficient Fine-Tuning Strategies for Large Language Models [2.633490094119608]
Fine-tuning large language models (LLMs) with as few as 200 samples can improve model accuracy from 70% to 88% in a product attribute extraction task.
A Bayesian hyperparameter optimization method, which evaluates models at 20% of total training time, correlates strongly with final model performance.
This approach led to a 2% improvement in accuracy over baseline models when evaluated on an independent test set.
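As a hedged illustration of the early-evaluation idea summarized above (not the paper's code), a Bayesian search can score each trial after roughly 20% of the full training budget; `train_and_eval` and the searched hyperparameters are hypothetical stand-ins.

```python
# Sketch of Bayesian hyperparameter search with early trial evaluation.
import optuna

def tune_with_early_evaluation(train_and_eval, full_epochs=10,
                               early_fraction=0.2, n_trials=30):
    """train_and_eval(lr, batch_size, epochs) -> validation accuracy is a
    hypothetical user-supplied callable."""
    early_epochs = max(1, int(full_epochs * early_fraction))

    def objective(trial):
        lr = trial.suggest_float("lr", 1e-6, 1e-3, log=True)
        batch_size = trial.suggest_categorical("batch_size", [8, 16, 32])
        # Score each trial after only ~20% of the full training budget.
        return train_and_eval(lr=lr, batch_size=batch_size, epochs=early_epochs)

    study = optuna.create_study(direction="maximize")  # TPE (Bayesian) sampler by default
    study.optimize(objective, n_trials=n_trials)
    return study.best_params  # retrain with these settings for the full budget
```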
arXiv Detail & Related papers (2024-07-18T21:36:00Z) - Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models [0.8399688944263842]
Large Language Models (LLMs) have the capability to understand and generate human-like text from input queries.
This study extends this concept to the integration of LLMs within Retrieval-Augmented Generation (RAG) pipelines.
We evaluate the impact of fine-tuning on the LLMs' capacity for data extraction and contextual understanding.
arXiv Detail & Related papers (2024-06-17T04:35:17Z) - Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z) - Scaling Relationship on Learning Mathematical Reasoning with Large Language Models [75.29595679428105]
We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM.
We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K, significantly outperforming the supervised fine-tuning (SFT) accuracy of 35.9%.
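The rejection-sampling recipe behind that result can be sketched as follows; `samplers` and `extract_answer` are hypothetical stand-ins for per-model sampling calls and an answer parser, and the filtering and deduplication details are assumptions.

```python
# Sketch of rejection-sampling data collection for math reasoning:
# keep only sampled solutions whose final answer matches the reference,
# then use them as extra supervised fine-tuning data.
def rejection_sample_dataset(problems, samplers, extract_answer, k=4):
    """problems: iterable of (question, reference_answer) pairs.
    samplers: hypothetical callables, one per model, mapping a question
    to a sampled chain-of-thought solution string."""
    accepted = []
    for question, reference_answer in problems:
        kept = set()
        for sample_solution in samplers:      # pool samples from multiple models
            for _ in range(k):
                solution = sample_solution(question)
                if extract_answer(solution) == reference_answer and solution not in kept:
                    kept.add(solution)
                    accepted.append({"prompt": question, "completion": solution})
    return accepted
```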
arXiv Detail & Related papers (2023-08-03T15:34:01Z) - Comparison of Transfer Learning based Additive Manufacturing Models via A Case Study [3.759936323189418]
This paper defines a case study based on an open-source dataset about metal AM products.
Five TL methods are integrated with decision tree regression (DTR) and artificial neural network (ANN) to construct six TL-based models.
The comparisons are used to quantify the performance of applied TL methods and are discussed from the perspective of similarity, training data size, and data preprocessing.
arXiv Detail & Related papers (2023-05-17T00:29:25Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data [49.378860065474875]
We identify failure modes of SOTA relation extraction (RE) models trained on TACRED.
Adding some of the challenge data as training examples improves the model's performance.
arXiv Detail & Related papers (2020-10-07T21:17:25Z) - Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension [27.538957000237176]
Humans create questions adversarially, such that the model fails to answer them correctly.
We collect 36,000 samples with progressively stronger models in the annotation loop.
We find that training on adversarially collected samples leads to strong generalisation to non-adversarially collected datasets.
We find that stronger models can still learn from datasets collected with substantially weaker models-in-the-loop.
arXiv Detail & Related papers (2020-02-02T00:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.