Fugu-MT Paper Translation (Summary): Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions

arxiv url: http://arxiv.org/abs/2308.13178v1
Date: Fri, 25 Aug 2023 05:00:05 GMT
Status: Translation complete
Last updated in system: 2023-08-28 15:00:53.823038
Title: Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions
Authors: Yibo Wang, Yunhu Ye, Yuanpeng Mao, Yanwei Yu and Yuanping Song
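Abstract summary: This paper proposes a self-supervised scene text segmentation algorithm that decouples object-centric representations into layers to segment images into text and background. On several public scene text datasets, the method outperforms state-of-the-art unsupervised segmentation algorithms.
Reference score (attention score computed by this site): 22.090074821554754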
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text segmentation tasks have a very wide range of applications, such as image editing, style transfer, and watermark removal. However, existing public datasets have poor-quality pixel-level labels, and such labels are notoriously costly to acquire in terms of both money and time. At the same time, when pretraining is performed on synthetic datasets, the data distribution of the synthetic data is far from that of real scenes. Together, these issues pose a huge challenge to current pixel-level text segmentation algorithms. To alleviate them, we propose a self-supervised scene text segmentation algorithm that decouples representations into layers in an object-centric manner to segment images into text and background. Our method introduces two novel designs, a Region Query Module and Representation Consistency Constraints. They adapt to the unique properties of text, complement the Auto Encoder, and improve the network's sensitivity to text. For this design, we treat the polygon-level masks predicted by a text localization model as extra input information. We neither use any pixel-level mask annotations in the training stage nor pretrain on synthetic datasets. Extensive experiments show the effectiveness of the proposed method. On several public scene text datasets, our method outperforms state-of-the-art unsupervised segmentation algorithms.

Related papers

Joint Low-level and High-level Textual Representation Learning with Multiple Masking Strategies [3.7498611358320733]
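Because synthetic images cannot faithfully reproduce real-world scenarios, a performance gap arises when models handle complex real-world images. Recent self-supervised learning techniques, especially contrastive learning and masked image modeling (MIM), narrow this domain gap by exploiting unlabeled real text images. Our Multi-Masking Strategy (MMS) integrates random patch, blockwise, and span masking into the MIM framework to jointly learn low-level and high-level textual representations.
Paper reference translation (metadata) (2025-05-11T05:52:55Z)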
Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
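Scene text recognition (STR) pre-training methods have made remarkable progress while relying mainly on synthetic datasets. We propose a new method, Decoder Pre-training with only Text for STR (DPTR). DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
Paper reference translation (metadata) (2024-08-11T06:36:42Z)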
WAS: Dataset and Methods for Artistic Text Segmentation [57.61335995536524]
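This paper focuses on the task of artistic text segmentation and builds a dataset of real artistic text for segmentation. We propose a decoder with layer-wise momentum queries that prevents the model from ignoring stroke regions with special shapes. We also propose a skeleton-assisted head to focus on the global structure.
Paper reference translation (metadata) (2024-07-31T18:29:36Z)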
Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
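DiffText is a pipeline that seamlessly blends foreground text with the intrinsic features of the background. Even with fewer text instances, the generated text images consistently outperform other synthetic data in aiding text detection.
Paper reference translation (metadata) (2023-11-28T06:51:28Z)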
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation [29.274362919954218]
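We propose a new paradigm for automatically generating accurately labeled training data. The method decouples training-data generation into foreground object generation and contextually consistent background generation. We demonstrate the advantages of the approach on five object detection and segmentation datasets.
Paper reference translation (metadata) (2023-09-12T04:41:45Z)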
Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval [11.798006331912056]
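The goal of text-to-image person retrieval (TIPR) is to retrieve images of a specific person that match a given text description. This paper proposes a new TIPR framework that builds fine-grained interaction and alignment between person images and their corresponding texts.
Paper reference translation (metadata) (2023-07-18T08:23:46Z)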
SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
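SpaText is a new method for text-to-image generation with open-vocabulary scene control. In addition to a global text prompt describing the whole scene, the user provides a segmentation map. We show its effectiveness on state-of-the-art diffusion models, both pixel-based and latent-based.
Paper reference translation (metadata) (2022-11-25T18:59:10Z)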
SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
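We propose a new framework for conditional image synthesis from semantic layouts of any precision level. At the lowest level, with no shape information, the framework naturally reduces to text-to-image (T2I) generation. At the highest level it becomes segmentation-to-image (S2I) generation. We introduce several novel techniques to address the challenges of this new setup.
Paper reference translation (metadata) (2022-11-21T18:59:05Z)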
CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
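We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS). CRIS relies on vision-language decoding and contrastive learning to achieve text-to-pixel alignment. The proposed framework significantly outperforms the state of the art without any post-processing.
Paper reference translation (metadata) (2021-11-30T07:29:08Z)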

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

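The operator of this site does not guarantee the quality of this site (including all information and translations). The operator accepts no responsibility for any consequences arising from use of this site (including all information and translations).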