Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization
- URL: http://arxiv.org/abs/2407.07176v2
- Date: Wed, 16 Oct 2024 05:11:30 GMT
- Title: Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization
- Authors: Jooyeol Yun, Jaegul Choo
- Abstract summary: We present a unique approach that leverages readily available databases for general image aesthetic assessment and image quality assessment.
By determining optimal combinations of task vectors, known to represent specific traits of each database, we successfully create personalized models for individuals.
- Score: 37.66059382315255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of personalized image aesthetic assessment seeks to tailor aesthetic score prediction models to match individual preferences with just a few user-provided inputs. However, the scalability and generalization capabilities of current approaches are considerably restricted by their reliance on an expensive curated database. To overcome this long-standing scalability challenge, we present a unique approach that leverages readily available databases for general image aesthetic assessment and image quality assessment. Specifically, we view each database as a distinct image score regression task that exhibits varying degrees of personalization potential. By determining optimal combinations of task vectors, known to represent specific traits of each database, we successfully create personalized models for individuals. This approach of integrating multiple models allows us to harness a substantial amount of data. Our extensive experiments demonstrate the effectiveness of our approach in generalizing to previously unseen domains, a challenge previous approaches have struggled to meet, making it highly applicable to real-world scenarios. Our novel approach significantly advances the field by offering scalable solutions for personalized aesthetic assessment and establishing high standards for future research. https://yeolj00.github.io/personal-projects/personalized-aesthetics/
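In task-vector terms, each source database yields a vector tau = theta_finetuned - theta_pretrained, and a personalized model is theta_pretrained plus a weighted sum of those vectors. Below is a minimal Python sketch of that arithmetic only; the state-dict layout, toy tensors, and coefficients are illustrative assumptions, and the paper's procedure for finding each user's optimal coefficients is not reproduced here.

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """tau = theta_finetuned - theta_pretrained, computed per parameter."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def combine(pretrained: dict, vectors: list, lambdas: list) -> dict:
    """theta_personal = theta_pretrained + sum_i lambda_i * tau_i."""
    merged = {k: v.clone() for k, v in pretrained.items()}
    for tau, lam in zip(vectors, lambdas):
        for k in merged:
            merged[k] += lam * tau[k]
    return merged

# Hypothetical toy weights: one task vector per source database,
# weighted per user; real models would be full state dicts.
pre  = {"w": torch.zeros(4)}
ft_a = {"w": torch.ones(4)}       # e.g. finetuned on a general-aesthetics database
ft_b = {"w": 2 * torch.ones(4)}   # e.g. finetuned on an image-quality database
personal = combine(pre,
                   [task_vector(pre, ft_a), task_vector(pre, ft_b)],
                   lambdas=[0.6, 0.3])
print(personal["w"])  # tensor([1.2000, 1.2000, 1.2000, 1.2000])
```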
Related papers
- VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation [45.085830389820956]
This work presents an evaluation of the generalization capabilities of multi-view multi-person pose estimators to unseen datasets.
It also studies the improvements gained by additionally using depth information.
Because the new approach generalizes well not only to unseen datasets but also to different keypoints, the first multi-view multi-person whole-body estimator is presented.
arXiv Detail & Related papers (2024-10-24T13:28:40Z) - PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization [9.594958534074074]
We introduce the PEFT-U Benchmark: a new dataset for building and evaluating NLP models for user personalization.
We explore the challenge of efficiently personalizing LLMs to accommodate user-specific preferences in the context of diverse user-centered tasks.
arXiv Detail & Related papers (2024-07-25T14:36:18Z) - DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement [53.86523017756224]
We present DegustaBot, an algorithm for visual preference learning that solves household multi-object rearrangement tasks according to personal preference.
We collect a large dataset of naturalistic personal preferences in a simulated table-setting task.
We find that 50% of our model's predictions are likely to be found acceptable by at least 20% of people.
arXiv Detail & Related papers (2024-07-11T21:28:02Z) - JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z) - Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP).
We develop a generic, personalized generative framework that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z) - Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization [0.0]
Supervised deep learning models require a significant amount of labeled data to achieve acceptable performance on a specific task.
We establish image-image, dataset-dataset, and image-dataset distances to gain insights into the model's behavior.
arXiv Detail & Related papers (2023-08-07T13:35:53Z) - Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning based approaches in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z) - Learning Customized Visual Models with Retrieval-Augmented Knowledge [104.05456849611895]
We propose REACT, a framework to acquire the relevant web knowledge to build customized visual models for target domains.
We retrieve the most relevant image-text pairs from the web-scale database as external knowledge, and propose to customize the model by training only new modularized blocks while freezing all the original weights (a minimal sketch of this pattern appears after this list).
The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection, and segmentation tasks, including zero-, few-, and full-shot settings.
arXiv Detail & Related papers (2023-01-17T18:59:06Z) - Ambiguous Images With Human Judgments for Robust Visual Event Classification [34.62731821199598]
We create datasets of ambiguous images and use them to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos.
All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments.
We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models.
arXiv Detail & Related papers (2022-10-06T17:52:20Z)
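For the REACT-style customization referenced above, the following is a minimal PyTorch sketch of the freeze-the-original-weights, train-only-new-blocks pattern; the backbone and the residual adapter here are illustrative stand-ins, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained backbone; its weights stay frozen.
backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
for p in backbone.parameters():
    p.requires_grad = False

class AdapterBlock(nn.Module):
    """New trainable block added alongside the frozen original weights."""
    def __init__(self, dim: int = 512, hidden: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form: frozen features pass through unchanged by default.
        return x + self.up(torch.relu(self.down(x)))

adapter = AdapterBlock()
# Only the new block's parameters reach the optimizer.
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

x = torch.randn(8, 512)
feats = adapter(backbone(x))  # frozen backbone features, trainable customization
```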
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.