Spanning the Visual Analogy Space with a Weight Basis of LoRAs
- URL: http://arxiv.org/abs/2602.15727v1
- Date: Tue, 17 Feb 2026 17:02:38 GMT
- Title: Spanning the Visual Analogy Space with a Weight Basis of LoRAs
- Authors: Hila Manor, Rinon Gal, Haggai Maron, Tomer Michaeli, Gal Chechik
- Abstract summary: Visual analogy learning enables image manipulation through demonstration rather than textual description. LoRWeB specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives. We introduce two key components: (1) a learnable basis of LoRA modules, to span the space of different visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs.
- Score: 84.16188433935494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations difficult to articulate in words. Given a triplet $\{\mathbf{a}, \mathbf{a}', \mathbf{b}\}$, the goal is to generate $\mathbf{b}'$ such that $\mathbf{a} : \mathbf{a}' :: \mathbf{b} : \mathbf{b}'$. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed adaptation module constrains generalization capabilities. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, a novel approach that specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives; informally, it chooses a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRA modules, to span the space of different visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair. Comprehensive evaluations demonstrate our approach achieves state-of-the-art performance and significantly improves generalization to unseen visual transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation. Code and data are available at https://research.nvidia.com/labs/par/lorweb
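The two components described above (a learnable LoRA basis plus a per-task mixing mechanism) can be sketched in a few lines. This is a minimal numpy illustration under assumed shapes, with the lightweight encoder replaced by a stand-in coefficient vector; all class, parameter, and variable names here are illustrative, not the paper's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRABasisLinear:
    """Linear layer adapted by a learnable basis of LoRA modules (sketch).

    Per-task coefficients -- in the paper produced by a lightweight encoder
    from the analogy pair -- mix the basis LoRAs into one task-specific
    low-rank update. Shapes and initialization are illustrative assumptions.
    """

    def __init__(self, in_dim, out_dim, rank=4, num_basis=8):
        self.W = rng.standard_normal((out_dim, in_dim))   # frozen base weight
        # Basis of num_basis independent rank-`rank` LoRA factor pairs.
        self.A = rng.standard_normal((num_basis, rank, in_dim)) * 0.01
        self.B = np.zeros((num_basis, out_dim, rank))     # zero-init => no-op update

    def forward(self, x, coeffs):
        # coeffs: (num_basis,) mixing weights selected per analogy task.
        delta = np.einsum("k,kor,kri->oi", coeffs, self.B, self.A)
        return x @ (self.W + delta).T

layer = LoRABasisLinear(16, 16)
coeffs = np.ones(8) / 8                  # stand-in for the encoder's output
y = layer.forward(rng.standard_normal((2, 16)), coeffs)
print(y.shape)                           # (2, 16)
```

Because the update is a weighted sum of basis LoRAs, inference-time specialization only requires producing the coefficient vector; the basis itself stays fixed across tasks.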
Related papers
- DynaPURLS: Dynamic Refinement of Part-aware Representations for Skeleton-based Zero-Shot Action Recognition [51.80782323686666]
We introduce DynaPURLS, a unified framework that establishes robust, multi-scale visual-semantic correspondences. Our framework leverages a large language model to generate hierarchical textual descriptions that encompass both global movements and local body-part dynamics. Experiments on three large-scale benchmark datasets, including NTU RGB+D 60/120 and PKU-MMD, demonstrate that DynaPURLS significantly outperforms prior art.
arXiv Detail & Related papers (2025-12-12T10:39:10Z) - Seg-VAR: Image Segmentation with Visual Autoregressive Modeling [60.79579744943664]
We propose a novel framework that rethinks segmentation as a conditional autoregressive mask generation problem. This is achieved by replacing the discriminative learning with the latent learning process. Our method incorporates three core components: (1) an image encoder generating latent priors from input images, (2) a spatial-aware seglat (a latent expression of segmentation mask) encoder that maps segmentation masks into discrete latent tokens, and (3) a decoder reconstructing masks from these latents.
arXiv Detail & Related papers (2025-11-16T13:36:19Z) - StelLA: Subspace Learning in Low-rank Adaptation using Stiefel Manifold [51.93627542334909]
Low-rank adaptation (LoRA) has been widely adopted as a parameter-efficient technique for fine-tuning large-scale pre-trained models. We propose a geometry-aware extension of LoRA that uses a three-factor decomposition $U\!SV^\top$.
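The three-factor form replaces LoRA's $BA$ update with $U\!SV^\top$, where $U$ and $V$ have orthonormal columns (points on Stiefel manifolds). A minimal numpy illustration of the shapes involved, using QR factorization to obtain orthonormal factors and a diagonal core for simplicity; the manifold-aware optimization itself is omitted, and the diagonal choice for $S$ is an assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 32, 24, 4

# Orthonormal factors U (d_out x r) and V (d_in x r): points on Stiefel
# manifolds, produced here by QR for illustration only.
U, _ = np.linalg.qr(rng.standard_normal((d_out, r)))
V, _ = np.linalg.qr(rng.standard_normal((d_in, r)))
S = np.diag(rng.standard_normal(r))       # r x r core (taken diagonal here)

delta_W = U @ S @ V.T                     # rank-r weight update, U S V^T

print(delta_W.shape)                      # (32, 24)
print(np.allclose(U.T @ U, np.eye(r)))    # True: orthonormal columns
```

Keeping $U$ and $V$ orthonormal separates the subspace directions from the scaling carried by $S$, which is the geometric structure the decomposition exposes.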
arXiv Detail & Related papers (2025-10-02T11:59:13Z) - Beyond Softmax: A Natural Parameterization for Categorical Random Variables [61.709831225296305]
We introduce the $\textit{catnat}$ function, composed of a sequence of hierarchical binary splits. A rich set of experiments shows that the proposed function improves learning efficiency and yields models with consistently higher test performance.
arXiv Detail & Related papers (2025-09-29T12:55:50Z) - Score Change of Variables [0.0]
We show that for a smooth, invertible transformation $\mathbf{y} = \phi(\mathbf{x})$, the transformed score function $\nabla_{\mathbf{y}} \log q(\mathbf{y})$ can be expressed directly in terms of $\nabla_{\mathbf{x}} \log p(\mathbf{x})$. We also introduce generalized sliced score matching, extending traditional sliced score matching from linear projections to arbitrary smooth transformations.
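The relationship stated above follows from the standard change-of-variables formula for densities; the following short derivation is textbook material, not taken from the paper itself. With $\mathbf{y} = \phi(\mathbf{x})$ and Jacobian $J_\phi(\mathbf{x})$,

```latex
q(\mathbf{y}) \;=\; p(\mathbf{x})\,\bigl|\det J_\phi(\mathbf{x})\bigr|^{-1},
\qquad \mathbf{x} = \phi^{-1}(\mathbf{y}),
```

so taking logarithms and differentiating with respect to $\mathbf{y}$ (using $\partial\mathbf{x}/\partial\mathbf{y} = J_\phi(\mathbf{x})^{-1}$) gives the transformed score in terms of the original one:

```latex
\nabla_{\mathbf{y}} \log q(\mathbf{y})
\;=\;
J_\phi(\mathbf{x})^{-\top}
\Bigl(
\nabla_{\mathbf{x}} \log p(\mathbf{x})
\;-\;
\nabla_{\mathbf{x}} \log \bigl|\det J_\phi(\mathbf{x})\bigr|
\Bigr).
```

For a linear map $\phi(\mathbf{x}) = A\mathbf{x}$ the determinant term is constant and the identity reduces to $\nabla_{\mathbf{y}} \log q(\mathbf{y}) = A^{-\top} \nabla_{\mathbf{x}} \log p(\mathbf{x})$.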
arXiv Detail & Related papers (2024-12-10T20:27:15Z) - RefineStyle: Dynamic Convolution Refinement for StyleGAN [15.230430037135017]
In StyleGAN, convolution kernels are shaped by both static parameters shared across images and dynamic modulation factors specific to each image.
The $\mathcal{W}^+$ space is often used for image inversion and editing.
This paper proposes an efficient refining strategy for dynamic kernels.
arXiv Detail & Related papers (2024-10-08T15:01:30Z) - SBoRA: Low-Rank Adaptation with Regional Weight Updates [19.15481369459963]
This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models.
SBoRA halves the number of trainable parameters, or doubles the rank at a parameter count similar to LoRA's.
Our results demonstrate the superiority of SBoRA-FA over LoRA in various fine-tuning tasks, including commonsense reasoning and arithmetic reasoning.
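The "standard basis" idea can be illustrated concretely: if one LoRA factor is frozen to one-hot standard basis rows, the resulting update touches only selected columns of the weight matrix, which is what makes the updates regional. This numpy sketch is a plausible reading of that mechanism, not the paper's exact recipe; the index choice and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d_out, d_in, r = 8, 12, 3

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight

# Fix the down-projection A to standard basis rows (one-hot), so the
# low-rank product B @ A can only modify the selected columns of W.
idx = np.array([1, 4, 9])                     # columns chosen for adaptation
A = np.eye(d_in)[idx]                         # (r, d_in), frozen one-hot rows
B = rng.standard_normal((d_out, r)) * 0.1     # the only trainable factor

delta_W = B @ A                               # (d_out, d_in) regional update
untouched = np.delete(delta_W, idx, axis=1)
print(np.count_nonzero(untouched))            # 0: all other columns unchanged
```

Freezing one factor is also what shrinks the trainable parameter count: only `B` carries gradients, while the sparsity of `A` localizes the update.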
arXiv Detail & Related papers (2024-07-07T15:37:13Z) - Mixture-of-Subspaces in Low-Rank Adaptation [19.364393031148236]
We introduce a subspace-inspired Low-Rank Adaptation (LoRA) method, which is computationally efficient, easy to implement, and readily applicable to large language, multimodal, and diffusion models. To be more flexible, we jointly learn the mixer with the original LoRA weights, and term the method Mixture-of-Subspaces LoRA (MoSLoRA). MoSLoRA consistently outperforms LoRA on tasks in different modalities, including commonsense reasoning, visual instruction tuning, and subject-driven text-to-image generation.
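As described, the method inserts a learnable mixer between the two LoRA factors, so the rank-$r$ subspaces are fused rather than kept independent; vanilla LoRA corresponds to an identity mixer. A minimal numpy sketch with assumed shapes (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d_out, d_in, r = 16, 16, 4

A = rng.standard_normal((r, d_in)) * 0.01   # down-projection
B = rng.standard_normal((d_out, r)) * 0.01  # up-projection
M = rng.standard_normal((r, r))             # learnable mixer; vanilla LoRA: M = I

delta_plain = B @ A                         # vanilla LoRA update
delta_mixed = B @ M @ A                     # mixture-of-subspaces update

print(delta_mixed.shape)                    # (16, 16)
```

The mixer adds only $r \times r$ parameters, so the extra flexibility comes at negligible cost relative to the factors themselves.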
arXiv Detail & Related papers (2024-06-16T14:19:49Z) - Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers [0.0]
We show how Transformers can be interpreted as dense Expectation-Maximization algorithms performed on Bayesian Nets.
We propose a new model design paradigm, namely Vertical LoRA, which reduces the parameter count dramatically while preserving performance.
The results show that, with VLoRA, (1) the Transformer's parameter count can be reduced dramatically and (2) the original model's performance is preserved.
arXiv Detail & Related papers (2024-06-13T16:51:33Z) - Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models [10.827800772359844]
We study the computational limits of Low-Rank Adaptation (LoRA) for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup.
arXiv Detail & Related papers (2024-06-05T10:44:08Z) - SPAGHETTI: Editing Implicit Shapes Through Part Aware Generation [85.09014441196692]
We introduce a method for editing implicit shapes through part-aware generation.
Our architecture allows for manipulation of implicit shapes by means of transforming, interpolating and combining shape segments together.
arXiv Detail & Related papers (2022-01-31T12:31:41Z) - Region Similarity Representation Learning [94.88055458257081]
Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization and semantic image-level representations.
We show how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.
arXiv Detail & Related papers (2021-03-24T00:42:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.