Improving RAG for Personalization with Author Features and Contrastive Examples
- URL: http://arxiv.org/abs/2504.08745v1
- Date: Mon, 24 Mar 2025 01:41:22 GMT
- Title: Improving RAG for Personalization with Author Features and Contrastive Examples
- Authors: Mert Yazan, Suzan Verberne, Frederik Situmeang
- Abstract summary: Personalization with retrieval-augmented generation (RAG) often fails to capture fine-grained features of authors. We introduce Contrastive Examples: documents from other authors are retrieved to help the LLM identify what makes an author's style unique in comparison to others. Our results show the value of fine-grained features for better personalization, while opening a new research dimension: including contrastive examples as a complement to RAG.
- Score: 2.6968321526169503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Personalization with retrieval-augmented generation (RAG) often fails to capture fine-grained features of authors, making it hard to identify their unique traits. To enrich the RAG context, we propose providing Large Language Models (LLMs) with author-specific features, such as average sentiment polarity and frequently used words, in addition to past samples from the author's profile. We introduce a new feature called Contrastive Examples: documents from other authors are retrieved to help the LLM identify what makes an author's style unique in comparison to others. Our experiments show that adding a couple of sentences about the named entities, dependency patterns, and words a person uses frequently significantly improves personalized text generation. Combining features with contrastive examples boosts performance further, achieving a relative 15% improvement over baseline RAG while outperforming the benchmarks. Our results show the value of fine-grained features for better personalization, while opening a new research dimension: including contrastive examples as a complement to RAG. We release our code publicly.
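The pipeline described in the abstract is easy to sketch. Below is a minimal, illustrative Python sketch (not the authors' released code) of the core idea: compute a few author-level features, render them as short sentences, and place them in the prompt alongside the author's past samples and contrastive examples retrieved from other authors. The helper `sentiment_polarity` (any polarity scorer returning a float in [-1, 1]) and the upstream retrieval of `other_author_docs` are assumptions; the paper's named-entity and dependency-pattern features are omitted for brevity.

```python
# Illustrative sketch only: assembling a personalization prompt from
# author features, profile samples, and contrastive examples.
from collections import Counter
import re


def frequent_words(docs, k=10, min_len=4):
    """Most frequent longer words across an author's documents (a crude
    proxy for the paper's 'frequently used words' feature)."""
    counts = Counter(
        w for doc in docs for w in re.findall(r"[a-z']+", doc.lower())
        if len(w) >= min_len
    )
    return [w for w, _ in counts.most_common(k)]


def build_prompt(task, author_docs, other_author_docs, sentiment_polarity):
    # Feature sentences: a couple of short statements about the author.
    avg = sum(sentiment_polarity(d) for d in author_docs) / len(author_docs)
    features = (
        f"The author's average sentiment polarity is {avg:+.2f}. "
        f"Words the author uses frequently: {', '.join(frequent_words(author_docs))}."
    )
    # Past samples from the author's profile (standard RAG context).
    profile = "\n".join(f"- {d}" for d in author_docs[:3])
    # Contrastive Examples: retrieved documents from *other* authors, so the
    # LLM can see what makes this author's style unique by comparison.
    contrastive = "\n".join(f"- {d}" for d in other_author_docs[:3])
    return (
        f"Author features:\n{features}\n\n"
        f"Past writing by this author:\n{profile}\n\n"
        f"Writing by other authors (for contrast, do not imitate):\n{contrastive}\n\n"
        f"Task: {task}\nRespond in this author's style."
    )
```

The explicit "for contrast, do not imitate" framing is just one plausible way to make the model use other authors' documents contrastively; the paper's actual prompt wording may differ.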
Related papers
- LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification [63.07563443280147]
We propose a novel framework named LATex for AG-ReID. It adopts prompt-tuning strategies to fully leverage attribute-based text knowledge and improve AG-ReID performance.
arXiv Detail & Related papers (2025-03-31T04:47:05Z) - Personalized Graph-Based Retrieval for Large Language Models [51.7278897841697]
We propose a framework that leverages user-centric knowledge graphs to enrich personalization. By directly integrating structured user knowledge into the retrieval process and augmenting prompts with user-relevant context, PGraph enhances contextual understanding and output quality. We also introduce the Personalized Graph-based Benchmark for Text Generation, designed to evaluate personalized text generation tasks in real-world settings where user history is sparse or unavailable.
arXiv Detail & Related papers (2025-01-04T01:46:49Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - Unleashing the Potential of Text-attributed Graphs: Automatic Relation Decomposition via Large Language Models [31.443478448031886]
RoSE (Relation-oriented Semantic Edge-decomposition) is a novel framework that decomposes the graph structure by analyzing raw text attributes.
Our framework significantly enhances node classification performance across various datasets, with improvements of up to 16% on the Wisconsin dataset.
arXiv Detail & Related papers (2024-05-28T20:54:47Z) - Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z) - Unveiling the Multi-Annotation Process: Examining the Influence of Annotation Quantity and Instance Difficulty on Model Performance [1.7343894615131372]
We show how performance scores can vary when a dataset expands from a single annotation per instance to multiple annotations.
We propose a novel multi-annotator simulation process to generate datasets with varying annotation budgets.
arXiv Detail & Related papers (2023-10-23T05:12:41Z) - ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation [59.91139600152296]
ParaAMR is a large-scale syntactically diverse paraphrase dataset created by abstract meaning representation back-translation.
We show that ParaAMR can be used to improve on three NLP tasks: learning sentence embeddings, syntactically controlled paraphrase generation, and data augmentation for few-shot learning.
arXiv Detail & Related papers (2023-05-26T02:27:33Z) - IGA: An Intent-Guided Authoring Assistant [37.98368621931934]
We leverage advances in language modeling to build an interactive writing assistant that generates and rephrases text according to author specifications.
Users provide input to our Intent-Guided Assistant (IGA) in the form of text interspersed with tags that correspond to specific rhetorical directives.
We fine-tune a language model on a dataset heuristically labeled with author intent, which allows IGA to fill in these tags with generated text that users can subsequently edit to their liking.
arXiv Detail & Related papers (2021-04-14T17:32:21Z) - Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
arXiv Detail & Related papers (2020-09-17T23:08:01Z)
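As a companion to the GAR entry above, here is a compact sketch of the generation-augmented retrieval idea: generate several diverse contexts for a query, retrieve with each expanded query, and fuse the resulting rankings. This is an illustration under assumptions, not the GAR implementation: `generate` and `search` are hypothetical stand-ins for any generator and retriever, and reciprocal-rank fusion is used here as a generic fusion method (GAR's own fusion strategy may differ).

```python
def generation_augmented_retrieval(query, generate, search, n_contexts=3, k=10):
    """Expand a query with generated contexts and fuse per-context rankings."""
    # Generate diverse relevant contexts (e.g., a draft answer, a plausible
    # passage title, a related sentence).
    contexts = [generate(query, variant=i) for i in range(n_contexts)]
    # Retrieve once per expanded query; fuse with reciprocal-rank fusion.
    scores = {}
    for ctx in contexts:
        for rank, doc_id in enumerate(search(f"{query} {ctx}", k=k)):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (60 + rank)
    return sorted(scores, key=scores.get, reverse=True)[:k]  # best first
```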
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.