Related papers: TRACE: Your Diffusion Model is Secretly an Instance Edge Detector

URL: http://arxiv.org/abs/2503.07982v2
Date: Thu, 16 Oct 2025 02:11:04 GMT
Title: TRACE: Your Diffusion Model is Secretly an Instance Edge Detector
Authors: Sanghyun Jo, Ziseok Lee, Wooyeol Lee, Jonghyun Choi, Jaesik Park, Kyungsu Kim
Abstract summary: We present TRACE, showing that text-to-image diffusion models secretly function as instance edge annotators. TRACE identifies the Instance Emergence Point (IEP) where object boundaries first appear in self-attention maps, extracts boundaries through Attention Boundary Divergence (ABDiv) and distills them into a lightweight one-step edge decoder. On the COCO benchmark, TRACE improves unsupervised instance segmentation by +5.1 AP, and in tag-supervised panoptic segmentation it outperforms point-supervised baselines by +1.7 PQ without using any instance-level labels.
Score: 45.119480971518946
License: http://creativecommons.org/licenses/by/4.0/
Abstract: High-quality instance and panoptic segmentation has traditionally relied on dense instance-level annotations such as masks, boxes, or points, which are costly, inconsistent, and difficult to scale. Unsupervised and weakly-supervised approaches reduce this burden but remain limited by semantic-backbone constraints and human bias, often producing merged or fragmented outputs. We present TRACE (TRAnsforming diffusion Cues to instance Edges), showing that text-to-image diffusion models secretly function as instance edge annotators. TRACE identifies the Instance Emergence Point (IEP) where object boundaries first appear in self-attention maps, extracts boundaries through Attention Boundary Divergence (ABDiv), and distills them into a lightweight one-step edge decoder. This design removes the need for per-image diffusion inversion, achieving 81x faster inference while producing sharper and more connected boundaries. On the COCO benchmark, TRACE improves unsupervised instance segmentation by +5.1 AP, and in tag-supervised panoptic segmentation it outperforms point-supervised baselines by +1.7 PQ without using any instance-level labels. These results reveal that diffusion models encode hidden instance boundary priors, and that decoding these signals offers a practical and scalable alternative to costly manual annotation. Code is available at https://github.com/shjo-april/DiffEGG.
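The abstract names Attention Boundary Divergence (ABDiv) without defining it here. As rough intuition only, the sketch below assumes that a boundary pixel is one whose self-attention distribution differs sharply from a neighbor's, and scores each pixel by the largest symmetric KL divergence to its 4-connected neighbors. The function name `toy_attention_boundary_divergence`, the (H, W, N) attention layout, and the choice of symmetric KL are all hypothetical assumptions, not the paper's ABDiv formulation.

```python
import numpy as np


def toy_attention_boundary_divergence(attn: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical boundary score from self-attention maps (NOT TRACE's ABDiv).

    attn: (H, W, N) array; attn[i, j] is pixel (i, j)'s attention over N keys.
    Returns an (H, W) map holding, for each pixel, the largest symmetric KL
    divergence between its attention distribution and a 4-connected neighbor's.
    """
    p = np.clip(attn, eps, None)
    p = p / p.sum(axis=-1, keepdims=True)  # renormalize after clipping
    logp = np.log(p)
    H, W, _ = p.shape
    score = np.zeros((H, W))
    # (0, 1) compares each pixel with its right neighbor, (1, 0) with the one below.
    for dy, dx in ((0, 1), (1, 0)):
        a, la = p[: H - dy, : W - dx], logp[: H - dy, : W - dx]
        b, lb = p[dy:, dx:], logp[dy:, dx:]
        sym_kl = ((a - b) * (la - lb)).sum(axis=-1)  # KL(a||b) + KL(b||a)
        # Credit the divergence to both pixels of each neighboring pair.
        score[: H - dy, : W - dx] = np.maximum(score[: H - dy, : W - dx], sym_kl)
        score[dy:, dx:] = np.maximum(score[dy:, dx:], sym_kl)
    return score


if __name__ == "__main__":
    # Two toy "instances": the left half attends to key 0, the right half to key 1.
    attn = np.zeros((4, 6, 2))
    attn[:, :3] = [0.9, 0.1]
    attn[:, 3:] = [0.1, 0.9]
    print(np.round(toy_attention_boundary_divergence(attn), 2))
```

In the toy input, only the two columns that meet at the split receive a nonzero score, which is the kind of thin, connected boundary signal the abstract describes extracting from attention.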

Related papers

Clustering is back: Reaching state-of-the-art LiDAR instance segmentation without training [69.2787246878521]
We show that competitive panoptic segmentation can be achieved using only semantic labels. Our method is fully explainable, and requires no learning or parameter tuning.
arXiv (2025-03-17T14:12:08Z)
EAUWSeg: Eliminating annotation uncertainty in weakly-supervised medical image segmentation [4.334357692599945]
Weakly-supervised medical image segmentation is gaining traction as it requires only rough annotations rather than accurate pixel-to-pixel labels. We propose a novel weak annotation method coupled with its learning framework EAUWSeg to eliminate the annotation uncertainty. We show that EAUWSeg outperforms existing weakly-supervised segmentation methods.
arXiv (2025-01-03T06:21:02Z)
DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut [55.21950038225407]
Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks. In this paper, we use a diffusion UNet encoder as a foundation vision encoder and introduce DiffCut, an unsupervised zero-shot segmentation method. Our work highlights the remarkably accurate semantic knowledge embedded within diffusion UNet encoders that could then serve as foundation vision encoders for downstream tasks.
arXiv (2024-06-05T01:32:31Z)
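DiffCut's summary mentions a recursive normalized cut over diffusion-encoder features. The sketch below shows a generic recursive two-way normalized cut (in the style of Shi and Malik) on arbitrary feature vectors. The exponentiated cosine-similarity affinity, the stopping threshold `tau`, and the function names are illustrative assumptions, not DiffCut's released code.

```python
import numpy as np


def ncut_cost(W: np.ndarray, mask: np.ndarray) -> float:
    """Normalized-cut cost of splitting W's nodes into mask and ~mask."""
    cut = W[mask][:, ~mask].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()


def recursive_ncut(features: np.ndarray, tau: float = 0.5,
                   alpha: float = 10.0, min_size: int = 2) -> np.ndarray:
    """Recursively bipartition (n, d) features; returns (n,) integer segment labels.

    A segment is split only if its best two-way Ncut cost is <= tau.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Assumed affinity: exponentiated cosine similarity, with values in (0, 1].
    W_all = np.exp(alpha * (f @ f.T - 1.0))
    labels = np.zeros(len(f), dtype=int)
    next_label = 1
    stack = [np.arange(len(f))]
    while stack:
        idx = stack.pop()
        if len(idx) < 2 * min_size:
            continue
        W = W_all[np.ix_(idx, idx)]
        d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
        # Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}.
        L = np.eye(len(idx)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
        _, vecs = np.linalg.eigh(L)
        y = d_inv_sqrt * vecs[:, 1]  # relaxed Ncut indicator from the 2nd eigenvector
        best_mask, best_cost = None, np.inf
        for t in np.unique(y)[:-1]:  # search split thresholds along y
            mask = y > t
            if min(mask.sum(), (~mask).sum()) < min_size:
                continue
            cost = ncut_cost(W, mask)
            if cost < best_cost:
                best_mask, best_cost = mask, cost
        if best_mask is None or best_cost > tau:
            continue  # segment is coherent enough; keep it whole
        labels[idx[best_mask]] = next_label  # the other half keeps its current label
        next_label += 1
        stack += [idx[best_mask], idx[~best_mask]]
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = np.vstack([rng.normal([5.0, 0.0], 0.1, (5, 2)),
                       rng.normal([0.0, 5.0], 0.1, (5, 2))])
    print(recursive_ncut(feats))
```

The threshold `tau` plays the role of a stopping criterion: a segment whose best bipartition is still expensive is treated as one region, so no fixed number of segments has to be chosen in advance.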
BAISeg: Boundary Assisted Weakly Supervised Instance Segmentation [9.6046915661065]
How to extract instance-level masks without instance-level supervision is the main challenge of weakly supervised instance segmentation (WSIS). Popular WSIS methods estimate a displacement field (DF) via learning inter-pixel relations and perform clustering to identify instances. We propose Boundary-Assisted Instance Segmentation (BAISeg), which is a novel paradigm for WSIS that realizes instance segmentation with pixel-level annotations.
arXiv (2024-05-27T15:14:09Z)
The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models [24.53385855664792]
In object detection and instance segmentation, foundation models such as SAM and DINO struggle to achieve satisfactory performance. We propose Zip, which "zips up" CLIP and SAM in a novel classification-first-then-discovery pipeline. Zip significantly boosts SAM's mask AP on the COCO dataset by 12.5% and establishes state-of-the-art performance in various settings.
arXiv (2024-04-18T07:22:38Z)
Skeleton-Guided Instance Separation for Fine-Grained Segmentation in Microscopy [23.848474219551818]
One of the fundamental challenges in microscopy (MS) image analysis is instance segmentation (IS). We propose a novel one-stage framework named A2B-IS to address this challenge and enhance the accuracy of IS in MS images. Our method has been thoroughly validated on two large-scale MS datasets.
arXiv (2024-01-18T11:14:32Z)
All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning. We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv (2023-05-25T08:19:31Z)
Edge-aware Plug-and-play Scheme for Semantic Segmentation [4.297988192695948]
Experimental results indicate that the proposed method can be seamlessly integrated into any state-of-the-art (SOTA) model with zero modification.
arXiv (2023-03-18T02:17:37Z)
SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation [22.930296667684125]
We propose a new box-supervised instance segmentation approach by developing a Semantic-aware Instance Mask (SIM) generation paradigm. Considering that the semantic-aware prototypes cannot distinguish different instances of the same semantics, we propose a self-correction mechanism. Extensive experimental results demonstrate the superiority of our proposed SIM approach over other state-of-the-art methods.
arXiv (2023-03-14T05:59:25Z)
UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection [52.91782218300844]
We propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT. Due to the self-attention mechanism, the attention map among patch embeddings naturally represents the consistency relation, making the vision Transformer suitable for the consistency representation learning.
arXiv (2022-10-23T15:24:47Z)
Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision [63.429704654271475]
We propose a novel weakly supervised method RWSeg that only requires labeling one object with one point. With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information. Specifically, we propose a Cross-graph Competing Random Walks (CRW) algorithm that encourages competition among different instance graphs.
arXiv (2022-08-10T02:14:39Z)
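RWSeg's summary describes propagating single-point labels with competing random walks. As a much-simplified illustration on a single graph (rather than the paper's multiple instance graphs), the sketch below runs one random walk with restart per labeled point and assigns each node to the walk that visits it most. The function name, the restart probability, and the argmax competition rule are assumptions; this is not RWSeg's Cross-graph Competing Random Walks algorithm.

```python
import numpy as np


def competing_random_walks(W: np.ndarray, seeds: dict, restart: float = 0.2,
                           iters: int = 200) -> np.ndarray:
    """Label every node by the seed whose random walk with restart visits it most.

    W: (n, n) symmetric, non-negative affinity matrix with no isolated nodes.
    seeds: maps an instance label to the index of its single annotated node.
    """
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    names = sorted(seeds)
    R = np.zeros((len(names), len(W)))    # restart distribution for each seed
    for k, name in enumerate(names):
        R[k, seeds[name]] = 1.0
    prob = R.copy()
    for _ in range(iters):
        # Every walk takes one step, then jumps back to its own seed w.p. `restart`.
        prob = (1.0 - restart) * (prob @ P) + restart * R
    # The walks "compete": each node goes to the walk with the highest visiting probability.
    return np.array(names)[prob.argmax(axis=0)]


if __name__ == "__main__":
    # A 6-node chain 0-1-2-3-4-5 with a weak link between nodes 2 and 3.
    W = np.zeros((6, 6))
    for i, w in enumerate([1.0, 1.0, 0.01, 1.0, 1.0]):
        W[i, i + 1] = W[i + 1, i] = w
    print(competing_random_walks(W, {"a": 0, "b": 5}))
```

Because each walk restarts at its own seed, probability mass rarely crosses low-affinity edges, so the weak link in the toy chain acts as an instance boundary between the two labeled points.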
SIOD: Single Instance Annotated Per Category Per Image for Object Detection [67.64774488115299]
We propose the Single Instance annotated Object Detection (SIOD), requiring only one instance annotation for each existing category in an image. By reducing inter-task (WSOD) or inter-image (SSOD) discrepancies to an intra-image discrepancy, SIOD provides more reliable and richer prior knowledge for mining the remaining unlabeled instances. Under the SIOD setting, we propose a simple yet effective framework, termed Dual-Mining (DMiner), which consists of a Similarity-based Pseudo Label Generating module (SPLG) and a Pixel-level Group Contrastive Learning module (PGCL).
arXiv (2022-03-29T08:49:51Z)
SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining [76.95808270536318]
We propose an end-to-end system that learns to separate proposals into labeled and unlabeled regions using Pseudo-positive mining. While the labeled regions are processed as usual, self-supervised learning is used to process the unlabeled regions. We conduct exhaustive experiments on five splits on the PASCAL-VOC and COCO datasets achieving state-of-the-art performance.
arXiv (2022-01-12T18:57:04Z)
Learning to Detect Instance-level Salient Objects Using Complementary Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem. We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv (2021-11-19T10:15:22Z)
Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation. A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations. The adversarial examples generated by SSAE not only make widely-used models collapse but also achieve good visual quality.
arXiv (2021-07-20T01:55:21Z)
Learning to segment from misaligned and partial labels [0.0]
Many non-urban settings lack the ground truth needed for accurate segmentation. Open-source infrastructure annotations such as OpenStreetMap (OSM) are representative of this issue. We present a novel and generalizable two-stage framework that enables improved pixel-wise image segmentation given misaligned and missing annotations.
arXiv (2020-05-27T06:02:58Z)
Towards Bounding-Box Free Panoptic Segmentation [16.4548904544277]
We introduce a new Bounding-Box Free Network (BBFNet) for panoptic segmentation. BBFNet predicts coarse watershed levels and uses them to detect large instance candidates where boundaries are well defined. For smaller instances, whose boundaries are less reliable, BBFNet also predicts instance centers by means of Hough voting followed by mean-shift to reliably detect small objects.
arXiv (2020-02-18T16:34:01Z)
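BBFNet's summary describes detecting small instances by Hough voting for instance centers followed by mean-shift. Assuming a network has already predicted, for every foreground pixel, an offset to its instance center, the voting and clustering step can be sketched with a flat-kernel mean shift in NumPy. The function names, the bandwidth, and the mode-merging tolerance are hypothetical; this is not BBFNet's implementation.

```python
import numpy as np


def mean_shift_modes(points: np.ndarray, bandwidth: float = 1.0,
                     iters: int = 50) -> np.ndarray:
    """Flat-kernel mean shift; returns the distinct modes the points converge to."""
    x = points.astype(float).copy()
    for _ in range(iters):
        # Move each estimate to the mean of the original points within `bandwidth`.
        near = (np.linalg.norm(x[:, None] - points[None], axis=-1) <= bandwidth).astype(float)
        x = (near @ points) / near.sum(axis=1, keepdims=True)
    modes = []
    for p in x:  # merge estimates that converged to (nearly) the same place
        if all(np.linalg.norm(p - m) > bandwidth / 2 for m in modes):
            modes.append(p)
    return np.array(modes)


def hough_vote_instances(coords: np.ndarray, offsets: np.ndarray,
                         bandwidth: float = 1.0):
    """Each pixel votes for coords + offset; votes are clustered into instances."""
    votes = coords + offsets
    modes = mean_shift_modes(votes, bandwidth)
    dist = np.linalg.norm(votes[:, None] - modes[None], axis=-1)
    return dist.argmin(axis=1), modes  # per-pixel instance id, instance centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = np.array([[1, 1], [1, 3], [3, 1], [3, 3],           # pixels of instance A
                       [9, 9], [9, 11], [11, 9], [11, 11]], float)  # pixels of instance B
    centers = np.repeat([[2.0, 2.0], [10.0, 10.0]], 4, axis=0)
    offsets = centers - coords + rng.normal(0, 0.05, coords.shape)  # noisy predicted offsets
    labels, modes = hough_vote_instances(coords, offsets)
    print(labels, np.round(modes, 1))
```

Clustering the votes rather than the pixels is what makes this attractive for small objects: pixels that are spatially scattered still agree on a single center, so mean-shift finds one compact mode per instance without needing a box.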

This list is automatically generated from the titles and abstracts of the papers on this site.