Fugu-MT Paper Translation (Summary): Structured State-Space Regularization for Compact and Generation-Friendly Image Tokenization

Paper summary: Structured State-Space Regularization for Compact and Generation-Friendly Image Tokenization

arxiv url: http://arxiv.org/abs/2604.11089v1
Date: Mon, 13 Apr 2026 07:10:17 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.383074
Title: Structured State-Space Regularization for Compact and Generation-Friendly Image Tokenization
Authors: Jinsung Lee, Jaemin Oh, Namhun Kim, Dongwon Kim, Byung-Jun Yoon, Suha Kwak
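Abstract summary: We introduce a novel regularizer that aligns tokenizer latent spaces with two objectives: compactness and generation-friendliness. The key idea is to guide the tokenizer to mimic the hidden state dynamics of state-space models (SSMs). The method improves generation quality in diffusion models while incurring only a minimal loss in reconstruction fidelity.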
Reference score (site-computed attention score): 41.67328909969333
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Image tokenizers are central to modern vision models because these models often operate in latent spaces. An ideal latent space must be simultaneously compact and generation-friendly: it should capture an image's essential content compactly while remaining easy to model with generative approaches. In this work, we introduce a novel regularizer to align latent spaces with these two objectives. The key idea is to guide tokenizers to mimic the hidden state dynamics of state-space models (SSMs), thereby transferring their critical property, frequency awareness, to latent features. Grounded in a theoretical analysis of SSMs, our regularizer enforces the encoding of fine spatial structures and frequency-domain cues into compact latent features, leading to more effective use of representation capacity and improved generative modelability. Experiments demonstrate that our method improves generation quality in diffusion models while incurring only minimal loss in reconstruction fidelity.
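The abstract does not specify the form of the regularizer. As a hedged illustration of what "hidden state dynamics of state-space models" can mean, the sketch below runs a diagonal SSM over a 1-D token sequence and defines a hypothetical loss that pulls linearly projected tokenizer latents toward the SSM states. The S4D-Lin-style initialization (A_n = -1/2 + i*pi*n, B = 1, zero-order-hold discretization) is an assumption borrowed from the diagonal-SSM literature; in it, each state channel n is a damped oscillator with angular frequency pi*n, which is one common intuition for why SSM states are frequency-aware. The function names, the projection `proj`, and the mean-squared-error form are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def s4d_lin_params(state_dim: int, dt: float):
    """Diagonal SSM parameters (S4D-Lin-style init, an assumption), discretized by zero-order hold."""
    n = np.arange(state_dim)
    a = -0.5 + 1j * np.pi * n               # continuous-time diagonal A: decay 0.5, frequency pi*n
    b = np.ones(state_dim, dtype=complex)   # input matrix B set to all ones
    a_bar = np.exp(dt * a)                  # ZOH: A_bar = exp(dt * A)
    b_bar = (a_bar - 1.0) / a * b           # ZOH: B_bar = A^{-1} (exp(dt * A) - I) B
    return a_bar, b_bar


def ssm_hidden_states(u, a_bar, b_bar):
    """Run the recurrence h_k = A_bar * h_{k-1} + B_bar * u_k; returns (L, N) complex states."""
    h = np.zeros_like(a_bar)
    states = []
    for u_k in u:
        h = a_bar * h + b_bar * u_k
        states.append(h)
    return np.stack(states)


def ssm_alignment_loss(latents, target_states, proj):
    """Hypothetical regularizer: MSE between projected latents and SSM states.

    latents: (L, D) real tokenizer features (hypothetical stand-in)
    target_states: (L, N) complex SSM states, split into real and imaginary parts
    proj: (D, 2N) real linear projection (hypothetical)
    """
    target = np.concatenate([target_states.real, target_states.imag], axis=-1)
    pred = latents @ proj
    return float(np.mean((pred - target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, D, N = 16, 8, 4
    signal = rng.standard_normal(L)          # hypothetical 1-D input, e.g. one row of pixel values
    a_bar, b_bar = s4d_lin_params(N, dt=0.1)
    states = ssm_hidden_states(signal, a_bar, b_bar)
    latents = rng.standard_normal((L, D))    # stand-in for tokenizer latent features
    proj = 0.1 * rng.standard_normal((D, 2 * N))
    print(states.shape, ssm_alignment_loss(latents, states, proj))
```

In a training setup of this kind, such a term would be added to the tokenizer's reconstruction loss with some weight; how the paper actually combines its regularizer with the tokenizer objective is not stated in the abstract.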


Related papers