Related papers: Scaling Up Personalized Aesthetic Assessment via Task Vector Customization

URL: http://arxiv.org/abs/2407.07176v1
Date: Tue, 9 Jul 2024 18:42:41 GMT
Title: Scaling Up Personalized Aesthetic Assessment via Task Vector Customization
Authors: Jooyeol Yun, Jaegul Choo
Abstract summary: We present a unique approach that leverages readily available databases for general image aesthetic assessment and image quality assessment. By determining optimal combinations of task vectors, known to represent specific traits of each database, we successfully create personalized models for individuals.
Score: 37.66059382315255
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The task of personalized image aesthetic assessment seeks to tailor aesthetic score prediction models to match individual preferences with just a few user-provided inputs. However, the scalability and generalization capabilities of current approaches are considerably restricted by their reliance on an expensive curated database. To overcome this long-standing scalability challenge, we present a unique approach that leverages readily available databases for general image aesthetic assessment and image quality assessment. Specifically, we view each database as a distinct image score regression task that exhibits varying degrees of personalization potential. By determining optimal combinations of task vectors, known to represent specific traits of each database, we successfully create personalized models for individuals. This approach of integrating multiple models allows us to harness a substantial amount of data. Our extensive experiments demonstrate the effectiveness of our approach in generalizing to previously unseen domains (a challenge previous approaches have struggled to meet), making it highly applicable to real-world scenarios. Our novel approach significantly advances the field by offering scalable solutions for personalized aesthetic assessment and establishing high standards for future research. https://yeolj00.github.io/personal-projects/personalized-aesthetics/
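The abstract names task vectors and their combinations but gives no formulas. Under the common task-arithmetic convention (an assumption here, not stated in the summary), a task vector is the difference between weights fine-tuned on one database and the shared pre-trained weights, and a personalized model uses theta_0 + sum_k lambda_k * tau_k, with per-user coefficients lambda_k fit on a handful of user ratings. The minimal sketch below illustrates that idea with a toy linear scorer and plain gradient descent on squared error. The weight values, the two "databases", and the fitting procedure are all hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def task_vector(finetuned: Sequence[float], pretrained: Sequence[float]) -> Vector:
    """Task vector (assumed convention): fine-tuned weights minus pre-trained weights."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def combine(pretrained: Sequence[float], task_vectors: Sequence[Sequence[float]],
            coeffs: Sequence[float]) -> Vector:
    """Personalized weights: theta_0 + sum_k lambda_k * tau_k."""
    weights = list(pretrained)
    for lam, tau in zip(coeffs, task_vectors):
        for i, t in enumerate(tau):
            weights[i] += lam * t
    return weights


def fit_coefficients(pretrained: Sequence[float], task_vectors: Sequence[Sequence[float]],
                     user_images: Sequence[Sequence[float]], user_scores: Sequence[float],
                     lr: float = 0.1, steps: int = 1000) -> Vector:
    """Fit one coefficient per task vector on a few user-rated samples.

    Hypothetical procedure: gradient descent on the mean squared error of a
    linear scorer score(x) = weights . x, where only the coefficients are learned.
    """
    coeffs = [0.0] * len(task_vectors)
    n = len(user_images)
    for _ in range(steps):
        weights = combine(pretrained, task_vectors, coeffs)
        grads = [0.0] * len(coeffs)
        for x, y in zip(user_images, user_scores):
            err = dot(weights, x) - y
            for k, tau in enumerate(task_vectors):
                # For a linear scorer, d(score)/d(lambda_k) = tau_k . x
                grads[k] += 2.0 * err * dot(tau, x) / n
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs


# Toy example: hypothetical 3-dimensional "image features" and weights.
pretrained = [1.0, 0.0, 0.5]
taus = [
    task_vector([1.5, 1.0, 0.5], pretrained),  # e.g. tuned on an aesthetics database
    task_vector([1.0, 0.0, 1.5], pretrained),  # e.g. tuned on a quality database
]
user_images = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
user_scores = [1.35, 0.7, 0.7, 2.75]  # synthetic ratings from a 0.7 / 0.2 mix
coeffs = fit_coefficients(pretrained, taus, user_images, user_scores)
print([round(c, 3) for c in coeffs])  # -> [0.7, 0.2]
```

Only the per-task coefficients are learned, so a few ratings can suffice even when the underlying models are large. In a real network, the same combination would be applied to every parameter tensor, and the scorer would no longer be linear in the coefficients.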

Related papers

Low-Rank Head Avatar Personalization with Registers [36.7667914190956]
We introduce a novel method for low-rank personalization of a generic model for head avatar generation. Our approach faithfully captures unseen faces, outperforming existing methods quantitatively and qualitatively.
arXiv Detail & Related papers (2025-06-02T17:53:14Z)
Personalization Toolkit: Training Free Personalization of Large Vision Language Models [11.026377387506216]
Personalization of Large Vision-Language Models (LVLMs) involves customizing models to recognize specific users and object instances, and to generate contextually tailored responses. Existing approaches typically rely on time-consuming test-time training for each user or object, making them impractical for real-world deployment. We present a novel training-free approach to LVLM personalization and introduce a comprehensive real-world benchmark designed to rigorously evaluate various aspects of the personalization task.
arXiv Detail & Related papers (2025-02-04T16:19:20Z)
Personalized Representation from Personalized Generation [36.848215621708235]
We formalize the challenge of using personalized synthetic data to learn personalized representations. We show that our method improves personalized representation learning for diverse downstream tasks.
arXiv Detail & Related papers (2024-12-20T18:59:03Z)
How to Squeeze An Explanation Out of Your Model [13.154512864498912]
This paper proposes an approach for interpretability that is model-agnostic. By including an SE block prior to the classification layer of any model, we are able to retrieve the most influential features. Results show that this new SE-based interpretability can be applied to various models in image and video/multi-modal settings.
arXiv Detail & Related papers (2024-12-06T15:47:53Z)
VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation [45.085830389820956]
This work presents an evaluation of the generalization capabilities of multi-view multi-person pose estimators to unseen datasets. It also studies the improvements by additionally using depth information. Since the new approach cannot only generalize well to unseen datasets, but also to different keypoints, the first multi-view multi-person whole-body estimator is presented.
arXiv Detail & Related papers (2024-10-24T13:28:40Z)
PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization [9.594958534074074]
We introduce the PEFT-U Benchmark: a new dataset for building and evaluating NLP models for user personalization. We explore the challenge of efficiently personalizing LLMs to accommodate user-specific preferences in the context of diverse user-centered tasks.
arXiv Detail & Related papers (2024-07-25T14:36:18Z)
DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement [53.86523017756224]
We present DegustaBot, an algorithm for visual preference learning that solves household multi-object rearrangement tasks according to personal preference. We collect a large dataset of naturalistic personal preferences in a simulated table-setting task. We find that 50% of our model's predictions are likely to be found acceptable by at least 20% of people.
arXiv Detail & Related papers (2024-07-11T21:28:02Z)
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset. We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model. Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP). We develop a generic, personalization-oriented generative framework that can handle a wide range of personalized needs. Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z)
Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization [0.0]
Supervised deep learning models require a significant amount of labeled data to achieve an acceptable performance on a specific task. We establish image-image, dataset-dataset, and image-dataset distances to gain insights into the model's behavior.
arXiv Detail & Related papers (2023-08-07T13:35:53Z)
Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization. We learn an identity encoder which can extract an identity representation from a set of reference images of a subject. We show that our approach consistently outperforms existing fine-tuning-based approaches in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z)
Learning Customized Visual Models with Retrieval-Augmented Knowledge [104.05456849611895]
We propose REACT, a framework to acquire the relevant web knowledge to build customized visual models for target domains. We retrieve the most relevant image-text pairs from the web-scale database as external knowledge, and propose to customize the model by only training new modularized blocks while freezing all the original weights. The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero-, few-, and full-shot settings.
arXiv Detail & Related papers (2023-01-17T18:59:06Z)
Ambiguous Images With Human Judgments for Robust Visual Event Classification [34.62731821199598]
We create datasets of ambiguous images and use them to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos. All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments. We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models.
arXiv Detail & Related papers (2022-10-06T17:52:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.