RendNet: Unified 2D/3D Recognizer With Latent Space Rendering
- URL: http://arxiv.org/abs/2206.10066v1
- Date: Tue, 21 Jun 2022 01:23:11 GMT
- Title: RendNet: Unified 2D/3D Recognizer With Latent Space Rendering
- Authors: Ruoxi Shi, Xinyang Jiang, Caihua Shan, Yansen Wang, Dongsheng Li
- Abstract summary: We argue that the VG-to-RG rendering process is essential to effectively combine VG and RG information.
We propose RendNet, a unified architecture for recognition on both 2D and 3D scenarios.
- Score: 18.877203720641393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector graphics (VG) have been ubiquitous in our daily life with vast
applications in engineering, architecture, designs, etc. The VG recognition
process of most existing methods is to first render the VG into raster graphics
(RG) and then conduct recognition based on RG formats. However, this procedure
discards the structure of geometries and loses the high resolution of VG.
Recently, another category of algorithms is proposed to recognize directly from
the original VG format. But it is affected by the topological errors that can
be filtered out by RG rendering. Instead of looking at one format, it is a good
solution to utilize the formats of VG and RG together to avoid these
shortcomings. Besides, we argue that the VG-to-RG rendering process is
essential to effectively combine VG and RG information. By specifying the rules
on how to transfer VG primitives to RG pixels, the rendering process depicts
the interaction and correlation between VG and RG. As a result, we propose
RendNet, a unified architecture for recognition on both 2D and 3D scenarios,
which considers both VG/RG representations and exploits their interaction by
incorporating the VG-to-RG rasterization process. Experiments show that RendNet
can achieve state-of-the-art performance on 2D and 3D object recognition tasks
on various VG datasets.
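The abstract's point that rendering rules tie VG primitives to RG pixels can be illustrated with a toy rasterizer. This is a minimal sketch, not RendNet's method: the function name `rasterize_segments`, the fixed-sample line drawing, and the `coverage` map are all assumptions made for illustration. It only shows how rasterization yields both a pixel image and a primitive-to-pixel correspondence that a model could, in principle, use to relate the two representations.

```python
def rasterize_segments(segments, width, height, samples=64):
    """Rasterize line segments (hypothetical VG primitives) onto a pixel grid.

    segments: list of ((x0, y0), (x1, y1)) endpoints in pixel coordinates.
    Returns (image, coverage): image[y][x] is 0 or 1, and coverage maps each
    primitive index to the set of (x, y) pixels that primitive touches.
    """
    image = [[0] * width for _ in range(height)]
    coverage = {}
    for idx, ((x0, y0), (x1, y1)) in enumerate(segments):
        touched = set()
        # Sample points along the segment and snap each one to the nearest pixel.
        for s in range(samples + 1):
            t = s / samples
            x = int(round(x0 + t * (x1 - x0)))
            y = int(round(y0 + t * (y1 - y0)))
            if 0 <= x < width and 0 <= y < height:
                image[y][x] = 1
                touched.add((x, y))
        coverage[idx] = touched
    return image, coverage


# Two segments forming a corner; they share the pixel where they meet.
segments = [((0, 0), (3, 0)), ((3, 0), (3, 3))]
image, coverage = rasterize_segments(segments, width=4, height=4)
shared = coverage[0] & coverage[1]  # pixels touched by both primitives
```

The `coverage` map is the part that matters here: it records which pixels each primitive produced, which is the kind of VG/RG correspondence the abstract says the rendering process exposes.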
Related papers
- Memorize What Matters: Emergent Scene Decomposition from Multitraverse [54.487589469432706]
We introduce 3D Gaussian Mapping, a camera-only offline mapping framework grounded in 3D Gaussian Splatting.
3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation.
We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering.
arXiv Detail & Related papers (2024-05-27T14:11:17Z)
- DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines [67.44394651662738]
Fine-grained image retrieval (FGIR) is to learn visual representations that distinguish visually similar objects while maintaining generalization.
Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself.
This paper proposes practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models.
arXiv Detail & Related papers (2024-04-24T09:45:12Z)
- Leveraging Visibility Graphs for Enhanced Arrhythmia Classification with Graph Convolutional Networks [0.11184789007828977]
Arrhythmias, detectable via electrocardiograms (ECGs), pose significant health risks.
Recent advances in graph-based strategies are aimed at enhancing arrhythmia detection performance.
This study explores graph representations of ECG signals using Visibility Graph (VG) and Vector Visibility Graph (VVG).
arXiv Detail & Related papers (2024-04-19T13:24:09Z)
- Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization [51.33923845954759]
3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in various 3D applications.
We propose a unified framework, 3DGCTR, to jointly solve these two distinct but closely related tasks.
In terms of implementation, we integrate a Lightweight Caption Head into the existing 3DVG network with a Caption Text Prompt as a connection.
arXiv Detail & Related papers (2024-04-17T04:46:27Z)
- PVG: Progressive Vision Graph for Vision Recognition [25.752613030302534]
We propose a Progressive Vision Graph (PVG) architecture for vision recognition tasks.
PVG contains three main components: 1) Progressively Separated Graph Construction (PSGC), 2) Neighbor nodes information aggregation and update module, and 3) Graph error Linear Unit (GraphLU).
arXiv Detail & Related papers (2023-08-01T14:35:29Z)
- Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision [24.90534567531536]
We propose an Iterative Robust Visual Grounding (IR-VG) framework with Masked Reference based Centerpoint Supervision (MRCS).
The proposed framework is evaluated on five regular VG datasets and two newly constructed robust VG datasets.
arXiv Detail & Related papers (2023-07-23T17:55:24Z)
- CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates some high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
- Vision GNN: An Image is Worth Graph of Nodes [49.3335689216822]
We propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level feature for visual tasks.
Based on the graph representation of images, we build our ViG model to transform and exchange information among all the nodes.
Extensive experiments on image recognition and object detection tasks demonstrate the superiority of our ViG architecture.
arXiv Detail & Related papers (2022-06-01T07:01:04Z)
- UIGR: Unified Interactive Garment Retrieval [105.56179829647142]
Interactive garment retrieval (IGR) aims to retrieve a target garment image based on a reference garment image.
Two IGR tasks have been studied extensively: text-guided garment retrieval (TGR) and visually compatible garment retrieval (VCR).
We propose a Unified Interactive Garment Retrieval (UIGR) framework to unify TGR and VCR.
arXiv Detail & Related papers (2022-04-06T21:54:14Z)
- Adaptive Visibility Graph Neural Network and It's Application in Modulation Classification [2.3228726690478547]
We propose an Adaptive Visibility Graph (AVG) algorithm that can adaptively map time series into graphs.
We then adopt AVGNet for radio signal modulation classification which is an important task in the field of wireless communication.
arXiv Detail & Related papers (2021-06-16T06:00:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.