Related papers: Personalized Representation from Personalized Generation

Personalized Representation from Personalized Generation

URL: http://arxiv.org/abs/2412.16156v1
Date: Fri, 20 Dec 2024 18:59:03 GMT
Title: Personalized Representation from Personalized Generation
Authors: Shobhita Sundaram, Julia Chae, Yonglong Tian, Sara Beery, Phillip Isola,
Abstract summary: We formalize the challenge of using personalized synthetic data to learn personalized representations.<n>We show that our method improves personalized representation learning for diverse downstream tasks.
Score: 36.848215621708235
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Modern vision models excel at general purpose downstream tasks. It is unclear, however, how they may be used for personalized vision tasks, which are both fine-grained and data-scarce. Recent works have successfully applied synthetic data to general-purpose representation learning, while advances in T2I diffusion models have enabled the generation of personalized images from just a few real examples. Here, we explore a potential connection between these ideas, and formalize the challenge of using personalized synthetic data to learn personalized representations, which encode knowledge about an object of interest and may be flexibly applied to any downstream task relating to the target object. We introduce an evaluation suite for this challenge, including reformulations of two existing datasets and a novel dataset explicitly constructed for this purpose, and propose a contrastive learning approach that makes creative use of image generators. We show that our method improves personalized representation learning for diverse downstream tasks, from recognition to segmentation, and analyze characteristics of image generation approaches that are key to this gain.

Related papers

GMAIL: Generative Modality Alignment for generated Image Learning [51.071351994330605]
We propose a novel framework for discriminative use of generated images, coined GMAIL, that explicitly treats generated images as a separate modality from real images.<n>Our framework can be easily incorporated with various vision-language models, and we demonstrate its efficacy throughout extensive experiments.
arXiv Detail & Related papers (2026-02-17T05:40:25Z)
PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards [86.1965460124838]
We propose a scalable multi-subject data generation pipeline.<n>We first enable single-subject personalization models to acquire knowledge of multi-image and multi-subject scenarios.<n>To enhance both subject consistency and text controllability, we design a set of Pairwise Subject-Consistency Rewards.
arXiv Detail & Related papers (2025-12-01T03:25:49Z)
A Dual-stage Prompt-driven Privacy-preserving Paradigm for Person Re-Identification [42.56589115173974]
We propose a Dual-stage Prompt-driven Privacy-preserving Paradigm (DPPP)<n>In the first stage, we generate rich prompts incorporating multi-dimensional attributes that drive the diffusion model to synthesize diverse data end-to-end.<n>In the second stage, we propose a Prompt-driven Disentanglement Mechanism (PDM) to learn domain-invariant generalization features.
arXiv Detail & Related papers (2025-11-07T09:17:48Z)
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning [68.98988753763666]
We propose VisualCloze, a universal image generation framework. VisualCloze supports a wide range of in-domain tasks, generalization to unseen ones, unseen unification of multiple tasks, and reverse generation. We introduce Graph200K, a graph-structured dataset that establishes various interrelated tasks, enhancing task density and transferable knowledge.
arXiv Detail & Related papers (2025-04-10T17:59:42Z)
Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models [0.65268245109828]
We introduce the notion of contextual diversity for active learning CDAL. We propose a data repair algorithm to curate contextually fair data to reduce model bias. We are working on developing image retrieval system for wildlife camera trap images and reliable warning system for poor quality rural roads.
arXiv Detail & Related papers (2024-11-04T09:43:33Z)
Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models [23.09033991200197]
New personalization techniques have been proposed to customize the pre-trained base models for crafting images with specific themes or styles. Such a lightweight solution poses a new concern regarding whether the personalized models are trained from unauthorized data. We introduce SIREN, a novel methodology to proactively trace unauthorized data usage in black-box personalized text-to-image diffusion models.
arXiv Detail & Related papers (2024-10-14T12:29:23Z)
Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization. We introduce a benchmark comprising eight different synthetic and real-world datasets. We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation [87.50120181861362]
VisionPrefer is a high-quality and fine-grained preference dataset that captures multiple preference aspects. We train a reward model VP-Score over VisionPrefer to guide the training of text-to-image generative models and the preference prediction accuracy of VP-Score is comparable to human annotators.
arXiv Detail & Related papers (2024-04-23T14:53:15Z)
Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability. We propose dataset style inversion strategy to improve the stylistic alignment between synthetic and real images. Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning [35.47078178526536]
Recent advancements in pre-trained large-scale language-image models have ushered in a new era of visual comprehension. This paper tackles two well-known issues within the realm of visual analytics: (1) the efficient exploration of large-scale image datasets and identification of potential data biases within them; (2) the evaluation of image captions and steering of their generation process.
arXiv Detail & Related papers (2023-11-02T06:21:35Z)
Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users. The method is based on a general framework that bypasses the lengthy optimization required by previous approaches. We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis [39.53519330457627]
We propose a novel task of joint few-shot recognition and novel-view synthesis. We aim to simultaneously learn an object classifier and generate images of that type of object from new viewpoints. We focus on the interaction and cooperation between a generative model and a discriminative model.
arXiv Detail & Related papers (2020-08-16T19:40:56Z)
Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning [21.89909688056478]
We propose a new two-level joint idea to augment the generative network with an inference network during training. This provides strong cross-modal interaction for effective transfer of knowledge between visual and semantic domains. We evaluate our approach on four benchmark datasets against several state-of-the-art methods, and show its performance.
arXiv Detail & Related papers (2020-07-15T15:34:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.