Fugu-MT Paper Translation (Abstract): EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression

Paper overview: EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression

arxiv url: http://arxiv.org/abs/2509.12159v1
Date: Mon, 15 Sep 2025 17:23:46 GMT
Status: Translation complete
Last updated in system: 2025-09-16 17:26:23.42581
Title: EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression
Authors: Jingyu Xiao, Zhongyi Zhang, Yuxuan Wan, Yintong Huo, Yang Liu, Michael R. Lyu
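Abstract summary: Multimodal large language models have shown exceptional performance on UI2Code tasks. These tasks require large numbers of input image tokens and output code tokens, which substantially increases computational overhead. The authors propose EfficientUICoder, a compression framework for efficient UI code generation with three key components.
Reference score (attention measure computed independently by this site): 40.699996393407204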
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal Large Language Models have demonstrated exceptional performance in UI2Code tasks, significantly enhancing website development efficiency. However, these tasks incur substantially higher computational overhead than traditional code generation due to the large number of input image tokens and extensive output code tokens required. Our comprehensive study identifies significant redundancies in both image and code tokens that exacerbate computational complexity and hinder focus on key UI elements, resulting in excessively lengthy and often invalid HTML files. We propose EfficientUICoder, a compression framework for efficient UI code generation with three key components. First, Element and Layout-aware Token Compression preserves essential UI information by detecting element regions and constructing UI element trees. Second, Region-aware Token Refinement leverages attention scores to discard low-attention tokens from selected regions while integrating high-attention tokens from unselected regions. Third, Adaptive Duplicate Token Suppression dynamically reduces repetitive generation by tracking HTML/CSS structure frequencies and applying exponential penalties. Extensive experiments show EfficientUICoder achieves a 55%-60% compression ratio without compromising webpage quality and delivers superior efficiency improvements: reducing computational cost by 44.9%, generated tokens by 41.4%, prefill time by 46.6%, and inference time by 48.8% on 34B-level MLLMs. Code is available at https://github.com/WebPAI/EfficientUICoder.
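
The abstract names the second and third components without giving formulas, so the following is only a minimal Python sketch of how they might look. Every name, ratio, and the penalty form below (`refine_tokens`, `penalized_logit`, `drop_ratio`, `add_ratio`, `base`, `threshold`) is an illustrative assumption, not taken from the paper or its repository.

```python
from collections import Counter


def refine_tokens(attention, selected, drop_ratio=0.2, add_ratio=0.1):
    """Return the sorted indices of image tokens to keep.

    attention: one attention score per image token (hypothetical input).
    selected:  set of token indices inside detected UI element regions.
    drop_ratio / add_ratio: illustrative fractions, not values from the paper.
    """
    # Tokens in selected regions, lowest attention first.
    inside = sorted(selected, key=lambda i: attention[i])
    # Tokens outside selected regions, highest attention first.
    outside = sorted(
        (i for i in range(len(attention)) if i not in selected),
        key=lambda i: attention[i],
        reverse=True,
    )
    n_drop = int(len(inside) * drop_ratio)
    n_add = int(len(outside) * add_ratio)
    # Discard the lowest-attention selected tokens, add the
    # highest-attention unselected ones.
    kept = set(inside[n_drop:]) | set(outside[:n_add])
    return sorted(kept)


def penalized_logit(logit, structure, counts, base=1.5, threshold=2):
    """Lower a candidate's logit once its structure has repeated too often.

    counts: Counter of HTML/CSS structures generated so far.
    The penalty grows exponentially with repetitions beyond `threshold`
    (an assumed form; the abstract only says "exponential penalties").
    """
    excess = counts[structure] - threshold
    if excess <= 0:
        return logit
    return logit - (base ** excess - 1.0)
```

In this sketch, a decoder loop would increment `counts` each time an HTML/CSS structure is emitted and call `penalized_logit` on candidates that would repeat it, so that a structure seen only a few times is left untouched while a runaway repetition is suppressed increasingly hard.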
