Line Segment Detection Using Transformers without Edges
- URL: http://arxiv.org/abs/2101.01909v1
- Date: Wed, 6 Jan 2021 08:00:18 GMT
- Title: Line Segment Detection Using Transformers without Edges
- Authors: Yifan Xu, Weijian Xu, David Cheung and Zhuowen Tu
- Abstract summary: Our method, named LinE segment TRansformers (LETR), tackles the three main problems in this domain.
We show state-of-the-art results on Wireframe and YorkUrban benchmarks.
- Score: 22.834316796018705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a holistically end-to-end algorithm for line
segment detection with transformers that is free of post-processing and of
heuristics-guided intermediate processing (edge/junction/region detection).
Our method, named LinE segment TRansformers (LETR), tackles the three
main problems in this domain, namely edge element detection, perceptual
grouping, and holistic inference by three highlights in detection transformers
(DETR) including tokenized queries with integrated encoding and decoding,
self-attention, and joint queries respectively. The transformers learn to
progressively refine line segments through layers of self-attention,
skipping the heuristic designs of previous line segment detection algorithms. We
equip the transformers with a multi-scale encoder/decoder to perform fine-grained
line segment detection under a direct end-point distance loss that is
particularly suitable for entities such as line segments that are not
conveniently represented by bounding boxes. In the experiments, we show
state-of-the-art results on Wireframe and YorkUrban benchmarks. LETR points to
a promising direction for joint end-to-end detection of general entities beyond
the standard object bounding box representation.
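The direct end-point distance loss mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an L1 distance between endpoint pairs and takes the minimum over the two possible endpoint orderings, since a line segment has no canonical endpoint order (the paper's exact matching scheme may differ).

```python
def endpoint_distance_loss(pred, gt):
    """L1 end-point distance between a predicted and a ground-truth
    line segment, each given as a tuple (x1, y1, x2, y2).

    Because a segment's endpoints are unordered, the loss is the
    minimum over the two possible endpoint pairings.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Pairing 1: pred endpoint 1 <-> gt endpoint 1, 2 <-> 2
    direct = (abs(px1 - gx1) + abs(py1 - gy1) +
              abs(px2 - gx2) + abs(py2 - gy2))
    # Pairing 2: pred endpoint 1 <-> gt endpoint 2, 2 <-> 1
    flipped = (abs(px1 - gx2) + abs(py1 - gy2) +
               abs(px2 - gx1) + abs(py2 - gy1))
    return min(direct, flipped)
```

Unlike a bounding-box loss, this penalizes the predicted endpoints directly, which is why it suits entities (such as line segments) that a box represents poorly.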
Related papers
- SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding [56.079013202051094]
We present SegVG, a novel method that transfers box-level annotations into signals providing additional pixel-level supervision for Visual Grounding.
This approach allows us to iteratively exploit the annotation as signals for both box-level regression and pixel-level segmentation.
arXiv Detail & Related papers (2024-07-03T15:30:45Z) - SegT: A Novel Separated Edge-guidance Transformer Network for Polyp
Segmentation [10.144870911523622]
We propose a novel separated edge-guidance transformer (SegT) network that aims to build an effective polyp segmentation model.
We specifically apply a transformer encoder, which learns a more robust representation than existing CNN-based approaches.
To evaluate the effectiveness of SegT, we conducted experiments with five challenging public datasets.
arXiv Detail & Related papers (2023-06-19T08:32:05Z) - Cross-domain Detection Transformer based on Spatial-aware and
Semantic-aware Token Alignment [31.759205815348658]
We propose a new method called Spatial-aware and Semantic-aware Token Alignment (SSTA) for cross-domain detection transformers.
For spatial-aware token alignment, we extract information from the cross-attention map (CAM) to align the distribution of tokens according to their attention to object queries.
For semantic-aware token alignment, we inject the category information into the cross-attention map and construct domain embedding to guide the learning of a multi-class discriminator.
arXiv Detail & Related papers (2022-06-01T04:13:22Z) - EDTER: Edge Detection with Transformer [71.83960813880843]
We propose a novel transformer-based edge detector, Edge Detection TransformER (EDTER), to extract clear and crisp object boundaries and meaningful edges.
EDTER exploits the full image context information and detailed local cues simultaneously.
Experiments on BSDS500, NYUDv2, and Multicue demonstrate the superiority of EDTER over state-of-the-art methods.
arXiv Detail & Related papers (2022-03-16T11:55:55Z) - Temporal Perceiver: A General Architecture for Arbitrary Boundary
Detection [48.33132632418303]
Generic Boundary Detection (GBD) aims at locating general boundaries that divide videos into semantically coherent and taxonomy-free units.
Previous research handles these different-level generic boundaries separately, with specifically designed complicated deep networks ranging from simple CNNs to LSTMs.
We present Temporal Perceiver, a general architecture with Transformers, offering a unified solution to the detection of arbitrary generic boundaries.
arXiv Detail & Related papers (2022-03-01T09:31:30Z) - TraSeTR: Track-to-Segment Transformer with Contrastive Query for
Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z) - ELSD: Efficient Line Segment Detector and Descriptor [9.64386089593887]
We present the novel Efficient Line Segment Detector and Descriptor (ELSD) to simultaneously detect line segments and extract their descriptors in an image.
ELSD provides the essential line features to higher-level tasks such as SLAM and image matching in real time.
In the experiments, the proposed ELSD achieves the state-of-the-art performance on the Wireframe dataset and YorkUrban dataset.
arXiv Detail & Related papers (2021-04-29T08:53:03Z) - SOLD2: Self-supervised Occlusion-aware Line Description and Detection [95.8719432775724]
We introduce the first joint detection and description of line segments in a single deep network.
Our method does not require any annotated line labels and can therefore generalize to any dataset.
We evaluate our approach against previous line detection and description methods on several multi-view datasets.
arXiv Detail & Related papers (2021-04-07T19:27:17Z) - Deep Hough Transform for Semantic Line Detection [70.28969017874587]
We focus on a fundamental task of detecting meaningful line structures, a.k.a. semantic lines, in natural scenes.
Previous methods neglect the inherent characteristics of lines, leading to sub-optimal performance.
We propose a one-shot end-to-end learning framework for line detection.
arXiv Detail & Related papers (2020-03-10T13:08:42Z)
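The Deep Hough Transform entry above builds on the classical Hough transform for lines, which can be sketched as follows. This is an illustration of the underlying voting scheme, not the paper's learned, end-to-end version; the grid sizes and the `img_diag` normalization are assumptions for the sketch.

```python
import numpy as np

def hough_lines(points, img_diag, n_rho=180, n_theta=180):
    """Classical Hough voting for lines: each point (x, y) votes for
    every (rho, theta) with rho = x*cos(theta) + y*sin(theta), i.e.
    every line parameterization passing through it. Peaks in the
    accumulator correspond to lines supported by many points."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)
    for x, y in points:
        # Signed distance from origin to the line, for each theta.
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        # Map rho in [-img_diag, img_diag] onto accumulator bins.
        bins = np.round((rho + img_diag) / (2 * img_diag)
                        * (n_rho - 1)).astype(int)
        acc[bins, np.arange(n_theta)] += 1
    return acc, thetas

# Points on the line y = x all share the parameterization
# theta = 3*pi/4, rho = 0, so that accumulator cell peaks.
pts = [(i, i) for i in range(20)]
acc, thetas = hough_lines(pts, img_diag=30.0)
peak_rho, peak_theta = np.unravel_index(acc.argmax(), acc.shape)
```

Deep Hough Transform replaces this hand-crafted accumulator with a differentiable feature aggregation along line candidates, which is what lets the line detector train end-to-end.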
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.