Fugu-MT paper translation (summary): SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective

Paper summary: SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective

arxiv url: http://arxiv.org/abs/2303.09270v1
Date: Thu, 16 Mar 2023 12:53:07 GMT
Status: Translation complete
Last updated in system: 2023-03-17 15:41:43.449379
Title: SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective
Title (reference translation): SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective
Authors: Zipeng Xu, Songlong Xing, Enver Sangineto, Nicu Sebe
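Abstract summary: Contrastive Language-Image Pre-Training (CLIP) has refreshed the state of the art for a broad range of vision-language cross-modal tasks. Directly using CLIP to guide style transfer spreads undesirable artifacts over the image. This paper proposes SpectralCLIP, which implements a spectral filtering layer on top of the CLIP vision encoder.
Reference score (attention score computed by this site): 70.8715655507571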
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Contrastive Language-Image Pre-Training (CLIP) has refreshed the state of the art for a broad range of vision-language cross-modal tasks. Particularly, it has created an intriguing research line of text-guided image style transfer, dispensing with the need for style reference images as in traditional style transfer methods. However, directly using CLIP to guide the transfer of style leads to undesirable artifacts (mainly written words and unrelated visual entities) spread over the image, partly due to the entanglement of visual and written concepts inherent in CLIP. Inspired by the use of spectral analysis in filtering linguistic information at different granular levels, we analyse the patch embeddings from the last layer of the CLIP vision encoder from the perspective of spectral analysis and find that the presence of undesirable artifacts is highly correlated with certain frequency components. We propose SpectralCLIP, which implements a spectral filtering layer on top of the CLIP vision encoder, to alleviate the artifact issue. Experimental results show that SpectralCLIP prevents the generation of artifacts effectively in quantitative and qualitative terms, without impairing the stylisation quality. We further apply SpectralCLIP to text-conditioned image generation and show that it prevents written words from appearing in the generated images. Code is available at https://github.com/zipengxuc/SpectralCLIP.
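Abstract (reference translation): Contrastive Language-Image Pre-Training (CLIP) has updated the state of the art for a wide range of vision-language cross-modal tasks. In particular, it has opened an intriguing line of research on text-guided image style transfer, removing the need for the style reference images that conventional style transfer methods require. However, directly using CLIP to guide style transfer causes undesirable artifacts (mainly written words and unrelated visual entities) to spread across the image, partly because of the entanglement of visual and written concepts inherent in CLIP. Inspired by the use of spectral analysis to filter linguistic information at different granularities, we analyze the patch embeddings from the last layer of the CLIP vision encoder from a spectral-analysis perspective and find that the presence of undesirable artifacts is highly correlated with specific frequency components. We propose SpectralCLIP, which implements a spectral filtering layer on top of the CLIP vision encoder. Experimental results show that SpectralCLIP effectively prevents the generation of artifacts, both quantitatively and qualitatively, without impairing stylisation quality. We further apply SpectralCLIP to text-conditioned image generation and show that it prevents written words from appearing in the generated images. The code is available at https://github.com/zipengxuc/SpectralCLIP.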
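The spectral filtering step the abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository has that): it assumes the filter is a real FFT taken along the patch (sequence) axis of last-layer patch embeddings with shape (num_patches, dim), and that one contiguous band of frequency bins is kept while all other bins are zeroed. The function name `spectral_filter`, the `keep_low`/`keep_high` band parameters, and the 49 x 512 input shape are hypothetical. Which frequency components SpectralCLIP actually removes, and how the filtered patches feed into CLIP's final image embedding, are not stated in the abstract and are not modelled here.

```python
import numpy as np


def spectral_filter(patch_embeddings: np.ndarray, keep_low: int, keep_high: int) -> np.ndarray:
    """Band-pass filter patch embeddings along the patch (sequence) axis.

    Hypothetical sketch: patch_embeddings has shape (num_patches, dim), e.g.
    last-layer patch tokens of a ViT-style encoder (no CLIP model is loaded).
    Frequency bins in [keep_low, keep_high) are kept; all others are zeroed.
    """
    num_patches = patch_embeddings.shape[0]
    # Real FFT over the patch axis, independently for each embedding dimension.
    # Output has num_patches // 2 + 1 frequency bins along axis 0.
    spectrum = np.fft.rfft(patch_embeddings, axis=0)
    num_bins = spectrum.shape[0]
    if not 0 <= keep_low < keep_high <= num_bins:
        raise ValueError(f"band must satisfy 0 <= low < high <= {num_bins}")
    # Boolean mask over frequency bins: True means the bin is retained.
    mask = np.zeros(num_bins, dtype=bool)
    mask[keep_low:keep_high] = True
    spectrum[~mask] = 0
    # Inverse FFT back to the patch domain; n= restores the original length.
    return np.fft.irfft(spectrum, n=num_patches, axis=0)


# Usage on random stand-in data (hypothetical 7x7 patch grid, 512-dim tokens).
rng = np.random.default_rng(0)
patches = rng.standard_normal((49, 512))
low_band = spectral_filter(patches, 0, 3)    # keep only the lowest bins
high_band = spectral_filter(patches, 3, 25)  # keep the remaining bins
```

Because the filter is linear, complementary bands such as `low_band` and `high_band` sum back to the original patches, and keeping only bin 0 replaces every patch with the mean patch embedding.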

Related papers

Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP [44.90909692140324]
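We propose a gradient-based visual and textual explanation method for CLIP (Grad-ECLIP). It applies channel and spatial weights to token features to produce high-quality visual explanations. We also propose an application of Grad-ECLIP that enhances fine-grained alignment in CLIP fine-tuning.
Reference translation (metadata) (2025-02-26T04:50:20Z)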
Dissecting CLIP: Decomposition with a Schur Complement-based Approach [8.056359341994941]
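We extend the application of CLIP embeddings to quantify and interpret the intrinsic diversity of text-to-image models. We demonstrate the use of a Schur complement-based decomposition to nullify the influence of a given prompt in an image's CLIP embedding.
Reference translation (metadata) (2024-12-24T18:07:57Z)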
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives [65.82577305915643]
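Contrastive Language-Image Pretraining (CLIP) models maximize the mutual information between the text and visual modalities to learn representations. In this work, we show that generating "hard" negative captions via in-context learning, together with the corresponding negative images via text-to-image generators, offers a solution to CLIP's weak compositional reasoning. Our method, called TripletCLIP, enhances CLIP's compositional capabilities, improving results on the SugarCrepe benchmark by more than 9%.
Reference translation (metadata) (2024-11-04T19:24:59Z)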
Finetuning CLIP to Reason about Pairwise Differences [52.028073305958074]
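We propose a method for training vision-language models such as CLIP to reason about pairwise differences. We first demonstrate that it significantly improves the ability to rank images by a given attribute. We also show that the resulting embeddings obey geometric properties in the embedding space to a greater degree.
Reference translation (metadata) (2024-09-15T13:02:14Z)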
Interpreting CLIP's Image Representation via Text-Based Decomposition [73.54377859089801]
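We investigate the CLIP image encoder by analyzing how individual model components affect the final representation. We decompose the image representation as a sum across individual image patches, model layers, and attention heads. Using this understanding, we remove spurious features from CLIP and create a strong zero-shot image segmenter.
Reference translation (metadata) (2023-10-09T17:59:04Z)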
CLIP2GAN: Towards Bridging Text with the Latent Space of GANs [128.47600914674985]
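We propose CLIP2GAN, a novel framework that leverages the CLIP model and StyleGAN. The key idea of CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN.
Reference translation (metadata) (2022-11-28T04:07:17Z)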
CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation [4.078926358349661]
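Contrastive Language-Image Pre-Training (CLIP) bridges images and text by embedding them into a joint latent space. Because image and text embeddings differ within this joint space, using the text embedding as the optimization target often introduces undesirable artifacts into the resulting images. We introduce the CLIP Projection-Augmentation Embedding (PAE) as an optimization target to improve the performance of text-guided image manipulation.
Reference translation (metadata) (2022-10-08T05:12:25Z)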
No Token Left Behind: Explainability-Aided Image Classification and Generation [79.4957965474334]
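We present a novel explainability-based approach that adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input. The method improves recognition rates without any additional training or fine-tuning.
Reference translation (metadata) (2022-04-11T07:16:39Z)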

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

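The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).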