Fugu-MT Paper Translation (Abstract): Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters

Paper summary: Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters

arxiv url: http://arxiv.org/abs/2509.18831v1
Date: Tue, 23 Sep 2025 09:17:18 GMT
Status: Translation complete
Last updated in system: 2025-09-24 20:41:27.793987
Title: Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters
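Title (reference translation): Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image and Video Synthesis via LoRA Adapters
Authors: Pin-Yen Chiu, I-Sheng Fang, Jun-Cheng Chen
Abstract summary: Text Slider is a lightweight, efficient, plug-and-play framework for continuous control of visual concepts. It identifies low-rank directions within a pre-trained text encoder and uses them to adjust visual concepts continuously. It also supports multi-concept composition, enabling fine-grained and flexible manipulation in both image and video synthesis.
Reference score (attention metric computed by this site): 13.392855357208811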
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in diffusion models have significantly improved image and video synthesis. In addition, several concept control methods have been proposed to enable fine-grained, continuous, and flexible control over free-form text prompts. However, these methods not only require intensive training time and GPU memory usage to learn the sliders or embeddings but also need to be retrained for different diffusion backbones, limiting their scalability and adaptability. To address these limitations, we introduce Text Slider, a lightweight, efficient and plug-and-play framework that identifies low-rank directions within a pre-trained text encoder, enabling continuous control of visual concepts while significantly reducing training time, GPU memory consumption, and the number of trainable parameters. Furthermore, Text Slider supports multi-concept composition and continuous control, enabling fine-grained and flexible manipulation in both image and video synthesis. We show that Text Slider enables smooth and continuous modulation of specific attributes while preserving the original spatial layout and structure of the input. Text Slider achieves significantly better efficiency: 5$\times$ faster training than Concept Slider and 47$\times$ faster than Attribute Control, while reducing GPU memory usage by nearly 2$\times$ and 4$\times$, respectively.
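Abstract (reference translation): Recent advances in diffusion models have greatly improved image and video synthesis. In addition, several concept control methods have been proposed to enable fine-grained, continuous, and flexible control over free-form text prompts. However, these methods not only require substantial training time and GPU memory to learn the sliders or embeddings, but must also be retrained for each diffusion backbone, which limits their scalability and adaptability. To address these limitations, we introduce Text Slider, a lightweight, efficient, plug-and-play framework that identifies low-rank directions within a pre-trained text encoder, enabling continuous control of visual concepts while greatly reducing training time, GPU memory usage, and the number of trainable parameters. Furthermore, Text Slider supports multi-concept composition and continuous control, enabling fine-grained and flexible manipulation in both image and video synthesis. We show that Text Slider enables smooth and continuous modulation of specific attributes while preserving the original spatial layout and structure of the input. Text Slider trains 5$\times$ faster than Concept Slider and 47$\times$ faster than Attribute Control, while reducing GPU memory usage by nearly 2$\times$ and 4$\times$, respectively.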
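The idea of a low-rank direction with a continuous strength can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not state which text-encoder layers are adapted, what rank is used, or how the factors are trained, so the single projection matrix `W`, the random factors from the hypothetical helper `make_slider`, and the slider names `age` and `smile` are all assumptions made for illustration only.

```python
import numpy as np


def make_slider(d_out, d_in, rank, seed):
    """Return random low-rank factors (B, A) standing in for a trained slider.

    Hypothetical helper: a real slider would be learned, not sampled.
    """
    rng = np.random.default_rng(seed)
    B = rng.normal(scale=0.1, size=(d_out, rank))
    A = rng.normal(scale=0.1, size=(rank, d_in))
    return B, A


def apply_sliders(W, x, sliders, scales):
    """Compute (W + sum_i s_i * B_i @ A_i) @ x without forming the full update."""
    y = W @ x
    for (B, A), s in zip(sliders, scales):
        # Each slider adds a rank-limited correction scaled by its strength s.
        y = y + s * (B @ (A @ x))
    return y


rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))   # frozen weight of one assumed encoder projection
x = rng.normal(size=d)        # one token embedding

age = make_slider(d, d, rank=2, seed=1)
smile = make_slider(d, d, rank=2, seed=2)

base = apply_sliders(W, x, [age, smile], [0.0, 0.0])   # sliders switched off
half = apply_sliders(W, x, [age], [0.5])               # half-strength edit
full = apply_sliders(W, x, [age], [1.0])               # full-strength edit
both = apply_sliders(W, x, [age, smile], [1.0, -1.0])  # two concepts composed
```

In this single linear layer the edit grows exactly in proportion to the scale, and two sliders compose by summing their updates. In a full text encoder with nonlinearities the response to the scale would generally not be exactly linear, so the sketch only conveys the mechanism of scaling and summing low-rank updates on frozen weights.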

Related papers

FreeSliders: Training-Free, Modality-Agnostic Concept Sliders for Fine-Grained Diffusion Control in Images, Audio, and Video [19.20143810117644]
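Concept Sliders (CS) offer a promising direction by discovering semantic directions through textual contrasts. FreeSliders is fully training-free and modality-agnostic, which it achieves by partially estimating the CS formulation during inference. The method enables plug-and-play, training-free concept control across modalities, improves over existing baselines, and establishes a new tool for principled generation.
Paper reference translation (metadata) (2025-10-30T17:59:58Z)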
Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA [84.89284738178932]
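We propose a zero-shot framework for dynamic concept personalization in text-to-video models. The method uses structured 2x2 video grids that spatially organize input and output pairs. A dedicated Grid Fill module completes partially observed layouts, producing temporally consistent and identity-preserving outputs.
Paper reference translation (metadata) (2025-07-23T22:09:38Z)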
STORM: Token-Efficient Long Video Understanding for Multimodal LLMs [116.4479155699528]
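STORM is a novel architecture that incorporates a dedicated temporal encoder between the image encoder and the video LLM. We show that STORM achieves state-of-the-art results on a range of long video understanding benchmarks.
Paper reference translation (metadata) (2025-03-06T06:17:38Z)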
Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models [53.385754347812835]
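Concept Sliders introduced a method for fine-grained image control and editing through learned concepts (attributes/objects). That approach adds parameters and increases inference time because the Low-Rank Adapters (LoRAs) used to learn concepts must be loaded and unloaded. We therefore propose a simple textual inversion method that learns concepts through text embeddings and generalizes across models that share the same text encoder.
Paper reference translation (metadata) (2024-09-25T01:02:30Z)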
LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation [52.16008431411513]
LASER is a tuning-free, LLM-driven attention control framework. We propose a text-conditioned image-to-animation benchmark to validate its effectiveness and efficacy.
Paper reference translation (metadata) (2024-04-21T07:13:56Z)
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models [52.894213114914805]
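We present a method for creating interpretable concept sliders that enable precise control over attributes in image generation from diffusion models. Sliders are created from a small set of prompts or sample images. The method helps address persistent quality issues in Stable Diffusion XL, such as repairing object deformations and fixing distorted hands.
Paper reference translation (metadata) (2023-11-20T18:59:01Z)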
ControlVideo: Training-free Controllable Text-to-Video Generation [117.06302461557044]
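ControlVideo is a framework that enables natural and efficient text-to-video generation. It generates both short and long videos within several minutes on an NVIDIA 2080Ti.
Paper reference translation (metadata) (2023-05-22T14:48:53Z)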

The related papers list is generated automatically from the titles and abstracts of papers on this site.
