Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts
- URL: http://arxiv.org/abs/2412.19917v1
- Date: Fri, 27 Dec 2024 20:33:39 GMT
- Title: Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts
- Authors: Enze Xie, Jiahao Lyu, Daiqing Wu, Huawen Shen, Yu Zhou
- Abstract summary: Char-SAM is a pipeline that turns SAM into a low-cost segmentation annotator with a character-level visual prompt. Char-SAM generates high-quality scene text segmentation annotations automatically. Its training-free nature also enables the generation of high-quality scene text segmentation datasets from real-world datasets like COCO-Text and MLT17.
- Score: 12.444549174054988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent emergence of the Segment Anything Model (SAM) enables various domain-specific segmentation tasks to be tackled cost-effectively by using bounding boxes as prompts. However, in scene text segmentation, SAM cannot achieve desirable performance: word-level bounding boxes are too coarse as prompts for characters, while character-level bounding boxes suffer from over-segmentation and under-segmentation issues. In this paper, we propose an automatic annotation pipeline named Char-SAM that turns SAM into a low-cost segmentation annotator with a Character-level visual prompt. Specifically, leveraging existing text detection datasets with word-level bounding box annotations, we first generate finer-grained character-level bounding box prompts using the Character Bounding-box Refinement (CBR) module. Next, we employ glyph information corresponding to text character categories as a new prompt in the Character Glyph Refinement (CGR) module to guide SAM in producing more accurate segmentation masks, addressing over-segmentation and under-segmentation. These modules fully utilize the bbox-to-mask capability of SAM to generate high-quality text segmentation annotations automatically. Extensive experiments on TextSeg validate the effectiveness of Char-SAM. Its training-free nature also enables the generation of high-quality scene text segmentation datasets from real-world datasets like COCO-Text and MLT17.
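At its core, the pipeline relies on SAM's standard bbox-to-mask prediction driven by character-level box prompts. Below is a minimal sketch of that step using the official segment_anything package; the CBR and CGR modules are the paper's contribution and are not reproduced here, so `char_boxes` is a hypothetical placeholder for the character-level boxes that CBR would derive from word-level annotations.

```python
# Minimal sketch of the bbox-to-mask step that Char-SAM builds on, using the
# official segment_anything API. The CBR/CGR refinement modules from the
# paper are NOT implemented here; `char_boxes` stands in for the
# character-level boxes that CBR would produce.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("scene_text.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Hypothetical character-level boxes (x0, y0, x1, y1), one per character.
char_boxes = [(120, 40, 150, 90), (152, 40, 184, 90)]

text_mask = np.zeros(image.shape[:2], dtype=bool)
for box in char_boxes:
    masks, _, _ = predictor.predict(box=np.array(box), multimask_output=False)
    text_mask |= masks[0]  # union of per-character masks gives the word mask
```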
Related papers
- Customized SAM 2 for Referring Remote Sensing Image Segmentation [21.43947114468122]
We propose RS2-SAM 2, a novel framework that adapts SAM 2 to RRSIS by aligning the adapted RS features and textual features.
We also introduce a text-guided boundary loss to optimize segmentation boundaries by computing text-weighted gradient differences.
Experimental results on several RRSIS benchmarks demonstrate that RS2-SAM 2 achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-03-10T12:48:29Z)
- Region Prompt Tuning: Fine-grained Scene Text Detection Utilizing Region Text Prompt [10.17947324152468]
The region prompt tuning method decomposes the region text prompt into individual characters and splits the visual feature map into region visual tokens.
This allows each character to match the local features of a token, avoiding the omission of detailed features and fine-grained text.
Our proposed method combines a general score map from the image-text process with a region score map derived from character-token matching.
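A minimal sketch of what such character-token matching and score-map fusion could look like; the tensor shapes, cosine-similarity matching, and fusion weight alpha are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of character-token matching: cosine similarity between
# per-character embeddings and region visual tokens yields a region score
# map, which is then blended with a general image-text score map. Shapes and
# the fusion weight `alpha` are assumptions, not taken from the paper.
import torch
import torch.nn.functional as F

def fused_score_map(char_emb, region_tokens, general_scores, alpha=0.5):
    """char_emb: (C, D), one embedding per character in the text prompt.
    region_tokens: (H, W, D), visual tokens from the split feature map.
    general_scores: (H, W), score map from the global image-text process."""
    tokens = F.normalize(region_tokens.flatten(0, 1), dim=-1)  # (H*W, D)
    chars = F.normalize(char_emb, dim=-1)                      # (C, D)
    sim = tokens @ chars.T                                     # (H*W, C)
    # Score each location by its best-matching character.
    region_scores = sim.max(dim=-1).values.view(*general_scores.shape)
    return alpha * general_scores + (1 - alpha) * region_scores
```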
arXiv Detail & Related papers (2024-09-20T15:24:26Z)
- SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation [88.80792308991867]
The Segment Anything Model (SAM) has shown the ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges.
This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation.
Experiments show that SAM-CP achieves semantic, instance, and panoptic segmentation in both open and closed domains.
arXiv Detail & Related papers (2024-07-23T17:47:25Z)
- Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation [97.90960864892966]
This paper introduces Hi-SAM, a unified model leveraging SAM for hierarchical text segmentation.
Hi-SAM excels in segmentation across four hierarchies, including pixel-level text, word, text-line, and paragraph.
Compared to the previous specialist for joint hierarchical detection and layout analysis on HierText, Hi-SAM achieves significant improvements.
arXiv Detail & Related papers (2024-01-31T15:10:29Z) - Learning to Prompt Segment Anything Models [55.805816693815835]
Segment Anything Models (SAMs) have demonstrated great potential in learning to segment anything.
SAMs work with two types of prompts: spatial prompts (e.g., points) and semantic prompts (e.g., texts).
We propose spatial-semantic prompt learning (SSPrompt) that learns effective semantic and spatial prompts for better SAMs.
arXiv Detail & Related papers (2024-01-09T16:24:25Z) - Text Augmented Spatial-aware Zero-shot Referring Image Segmentation [60.84423786769453]
We introduce a Text Augmented Spatial-aware (TAS) zero-shot referring image segmentation framework.
TAS incorporates a mask proposal network for instance-level mask extraction, a text-augmented visual-text matching score for mining the image-text correlation, and a spatial rectifier for mask post-processing.
The proposed method clearly outperforms state-of-the-art zero-shot referring image segmentation methods.
arXiv Detail & Related papers (2023-10-27T10:52:50Z) - Scalable Mask Annotation for Video Text Spotting [86.72547285886183]
We propose a scalable mask annotation pipeline called SAMText for video text spotting.
Using SAMText, we have created a large-scale dataset, SAMText-9M, that contains over 2,400 video clips and over 9 million mask annotations.
arXiv Detail & Related papers (2023-05-02T14:18:45Z) - Segment Everything Everywhere All at Once [124.90835636901096]
We present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image.
We propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks.
We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks.
arXiv Detail & Related papers (2023-04-13T17:59:40Z)
- MANGO: A Mask Attention Guided One-Stage Scene Text Spotter [41.66707532607276]
We propose a novel Mask AttentioN Guided One-stage text spotting framework named MANGO.
The proposed method achieves competitive and even new state-of-the-art performance on both regular and irregular text spotting benchmarks.
arXiv Detail & Related papers (2020-12-08T10:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.