MagicWand: A Universal Agent for Generation and Evaluation Aligned with User Preference
- URL: http://arxiv.org/abs/2511.18352v1
- Date: Sun, 23 Nov 2025 08:59:47 GMT
- Authors: Zitong Xu, Dake Shen, Yaosong Du, Kexiang Hao, Jinghan Huang, Xiande Huang,
- Abstract summary: We present UniPrefer-100K, a dataset of images, videos, and associated text that describes the styles users tend to prefer. We then propose MagicWand, a universal generation and evaluation agent that enhances prompts based on user preferences. Experiments on UniPreferBench demonstrate that MagicWand consistently generates content and evaluations that are well aligned with user preferences.
- Score: 2.271682519456254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in AIGC (Artificial Intelligence Generated Content) models have enabled significant progress in image and video generation. However, users still struggle to obtain content that aligns with their preferences due to the difficulty of crafting detailed prompts and the lack of mechanisms to retain their preferences. To address these challenges, we construct UniPrefer-100K, a large-scale dataset comprising images, videos, and associated text that describes the styles users tend to prefer. Based on UniPrefer-100K, we propose MagicWand, a universal generation and evaluation agent that enhances prompts based on user preferences, leverages advanced generation models for high-quality content, and applies preference-aligned evaluation and refinement. In addition, we introduce UniPreferBench, the first large-scale benchmark with over 120K annotations for assessing user preference alignment across diverse AIGC tasks. Experiments on UniPreferBench demonstrate that MagicWand consistently generates content and evaluations that are well aligned with user preferences across a wide range of scenarios.
Related papers
- P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling [66.55381105691818]
We propose P-GenRM, the first Personalized Generative Reward Model with test-time user-based scaling. P-GenRM transforms preference signals into structured evaluation chains that derive adaptive personas and scoring rubrics. It further clusters users into User Prototypes and introduces a dual-granularity scaling mechanism.
Date: 2026-02-12T16:07:22Z
- Reasoning-Based Personalized Generation for Users with Sparse Data [120.94029850012045]
We introduce GraSPer, a novel framework for enhancing personalized text generation when user context is sparse. GraSPer first augments user context by predicting items that the user would likely interact with in the future. With reasoning alignment, it then generates texts for these interactions to enrich the augmented context. Finally, it generates personalized outputs conditioned on both the real and synthetic histories.
Date: 2026-01-31T01:54:23Z
- RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering [50.42577862494645]
We present RAG-IGBench, a benchmark designed to evaluate the task of Interleaved Generation based on Retrieval-Augmented Generation (RAG-IG) in open-domain question answering. RAG-IG integrates multimodal large language models (MLLMs) with retrieval mechanisms, enabling the models to access external image-text information for generating coherent multimodal content.
Date: 2025-10-11T03:06:39Z
- PREFINE: Personalized Story Generation via Simulated User Critics and User-Specific Rubric Generation [2.8324853634693614]
PREFINE is a novel framework that extends the Critique-and-Refine paradigm to personalization. PREFINE constructs a pseudo-user agent from a user's interaction history and generates user-specific rubrics. Our approach holds potential for enabling efficient personalization in broader applications, such as dialogue systems, education, and recommendation.
Date: 2025-09-16T16:39:40Z
- Multi-agents based User Values Mining for Recommendation [52.26100802380767]
We propose a zero-shot multi-LLM collaborative framework for effective and accurate user value extraction. We apply text summarization techniques to condense item content while preserving essential meaning. To mitigate hallucinations, we introduce two specialized agent roles: evaluators and supervisors.
Date: 2025-05-02T04:01:31Z
- LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification [78.73711446918814]
We propose a novel framework named LATex for AG-ReID, which adopts prompt-tuning strategies to leverage attribute-based text knowledge. Our framework can fully leverage attribute-based text knowledge to improve AG-ReID performance.
Date: 2025-03-31T04:47:05Z
- From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment [41.96246165999026]
Large language models (LLMs) have traditionally been aligned through one-size-fits-all approaches. This paper introduces a comprehensive framework for scalable personalized alignment of LLMs.
Date: 2025-03-19T17:41:46Z
- Panoramic Interests: Stylistic-Content Aware Personalized Headline Generation [37.86741955785968]
We propose a novel Stylistic-Content Aware Personalized Headline Generation (SCAPE) framework. SCAPE extracts both content and stylistic features from headlines with the aid of large language model (LLM) collaboration. It adaptively integrates users' long- and short-term interests through a contrastive learning-based hierarchical fusion network.
Date: 2025-01-21T05:30:20Z
- ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
Date: 2024-10-21T14:02:40Z
- EmbSum: Leveraging the Summarization Capabilities of Large Language Models for Content-Based Recommendations [38.44534579040017]
We introduce EmbSum, a framework that enables offline pre-computations of users and candidate items.
The model's ability to generate summaries of user interests serves as a valuable by-product, enhancing its usefulness for personalized content recommendations.
Date: 2024-05-19T04:31:54Z
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that large conditional generative models are hard to judge with simple metrics, since these models are often trained on very large datasets and have multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
Date: 2023-10-17T17:50:46Z
- The Stereotyping Problem in Collaboratively Filtered Recommender Systems [77.56225819389773]
We show that matrix factorization-based collaborative filtering algorithms induce a kind of stereotyping.
If preferences for a set of items are anti-correlated in the general user population, then those items may not be recommended together to a user.
We propose an alternative modelling fix, which is designed to capture the diverse multiple interests of each user.
Date: 2021-06-23T18:37:47Z
This list is automatically generated from the titles and abstracts of the papers in this site.