Task-specific Scene Structure Representations
- URL: http://arxiv.org/abs/2301.00555v1
- Date: Mon, 2 Jan 2023 08:25:47 GMT
- Title: Task-specific Scene Structure Representations
- Authors: Jisu Shin, Seunghyun Shin and Hae-Gon Jeon
- Abstract summary: We propose a single general neural network architecture for extracting task-specific structure guidance for scenes.
Our main contribution is to show that such a simple network can achieve state-of-the-art results for several low-level vision applications.
- Score: 13.775485887433815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the informative structures of scenes is essential for low-level
vision tasks. Unfortunately, it is difficult to obtain a concrete visual
definition of the informative structures because influences of visual features
are task-specific. In this paper, we propose a single general neural network
architecture for extracting task-specific structure guidance for scenes. To do
this, we first analyze traditional spectral clustering methods, which compute
a set of eigenvectors to model a segmented graph forming small compact
structures on image domains. We then unfold the traditional graph-partitioning
problem into a learnable network, named Scene Structure Guidance
Network (SSGNet), to represent the task-specific informative structures. The
SSGNet yields a set of coefficients of eigenvectors that produces explicit
feature representations of image structures. In addition, our SSGNet is
lightweight (~55K parameters), and can be used as a plug-and-play module
for off-the-shelf architectures. We optimize the SSGNet without any supervision
by proposing two novel training losses that enforce task-specific scene
structure generation during training. Our main contribution is to show that
such a simple network can achieve state-of-the-art results for several
low-level vision applications including joint upsampling and image denoising.
We also demonstrate that our SSGNet generalizes well on unseen datasets,
compared to existing methods which use structural embedding frameworks. Our
source codes are available at https://github.com/jsshin98/SSGNet.
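A minimal sketch of the classical spectral step the abstract refers to (the graph-partitioning problem SSGNet unfolds): build a pixel-affinity graph over a toy image, form its normalized Laplacian, and take the smallest-eigenvalue eigenvectors as soft structure maps. This illustrates the traditional method only, not the SSGNet architecture; the function name and parameters are assumptions for illustration.

```python
# Toy illustration of the spectral step SSGNet unfolds: eigenvectors of a
# normalized graph Laplacian built from pixel affinities (cf. normalized cuts).
# This is the classical method, not the learned SSGNet itself.
import numpy as np

def structure_eigenvectors(img, sigma=0.1, k=4):
    """Return the k smallest-eigenvalue eigenvectors of the normalized
    Laplacian of a 4-connected pixel graph, reshaped as HxW maps."""
    h, w = img.shape
    n = h * w
    W = np.zeros((n, n))
    for y in range(h):
        for x in range(w):
            i = y * w + x
            for dy, dx in ((0, 1), (1, 0)):  # 4-connectivity, each edge once
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    j = ny * w + nx
                    a = np.exp(-((img[y, x] - img[ny, nx]) ** 2) / sigma)
                    W[i, j] = W[j, i] = a
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt  # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)               # ascending eigenvalues
    return vecs[:, :k].T.reshape(k, h, w)

# A tiny image with two flat regions: the low-order eigenvectors separate them.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
eig_maps = structure_eigenvectors(img)
print(eig_maps.shape)  # (4, 8, 8)
```

The second eigenvector (the Fiedler vector) takes opposite signs on the two flat regions, which is the "explicit feature representation of image structure" that SSGNet learns to produce via predicted eigenvector coefficients instead of an eigendecomposition.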
Related papers
- Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting [50.181824673039436]
We propose a Graph Structure Self-Contrasting (GSSC) framework that learns graph structural information without message passing.
The proposed framework is based purely on Multi-Layer Perceptrons (MLPs), where the structural information is only implicitly incorporated as prior knowledge.
It first applies structural sparsification to remove potentially uninformative or noisy edges in the neighborhood, and then performs structural self-contrasting in the sparsified neighborhood to learn robust node representations.
arXiv Detail & Related papers (2024-09-09T12:56:02Z)
- Node Classification via Semantic-Structural Attention-Enhanced Graph Convolutional Networks [0.9463895540925061]
We introduce the semantic-structural attention-enhanced graph convolutional network (SSA-GCN).
It not only models the graph structure but also extracts generalized unsupervised features to enhance classification performance.
Our experiments on the Cora and CiteSeer datasets demonstrate the performance improvements achieved by our proposed method.
arXiv Detail & Related papers (2024-03-24T06:28:54Z)
- Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations [70.41385310930846]
We present an end-to-end framework Structure-CLIP to enhance multi-modal structured representations.
We use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations.
A Knowledge-Enhanced Encoder (KEE) is proposed to leverage scene graph knowledge as input to further enhance structured representations.
arXiv Detail & Related papers (2023-05-06T03:57:05Z)
- Image as Set of Points [60.30495338399321]
Context clusters (CoCs) view an image as a set of unorganized points and extract features via a simplified clustering algorithm.
Our CoCs are convolution- and attention-free, relying only on a clustering algorithm for spatial interaction.
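A hedged sketch of the "image as a set of points" idea: flatten pixels into feature points (normalized coordinates plus color) and group them with a few Lloyd iterations. Plain k-means stands in for the paper's simplified clustering, and the function names are hypothetical, not the CoC implementation.

```python
# Sketch: treat an image as an unorganized point set and cluster it.
# Plain k-means stands in for CoC's simplified clustering; names are hypothetical.
import numpy as np

def image_to_points(img):
    """Flatten an HxWxC image into N points of (x, y, features)."""
    h, w, c = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs, ys], axis=-1).reshape(-1, 2) / max(h, w)
    feats = img.reshape(-1, c)
    return np.concatenate([coords, feats], axis=1)  # (N, 2 + C)

def cluster_points(points, k=4, iters=10, seed=0):
    """A few Lloyd iterations; returns per-point cluster assignments."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centers[None], axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = points[assign == j].mean(axis=0)
    return assign

img = np.random.default_rng(1).random((16, 16, 3))
pts = image_to_points(img)
assign = cluster_points(pts)
print(pts.shape, assign.shape)  # (256, 5) (256,)
```

Including coordinates in each point is what makes the clusters spatially coherent; the cluster assignments then play the role that convolution windows or attention maps play in standard architectures.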
arXiv Detail & Related papers (2023-03-02T18:56:39Z)
- DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs, and Transformers.
We propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision transformer for images, GAT for graph, DGCNN for 3D point cloud, alongside LSTM for language, and demonstrate that, even with a
arXiv Detail & Related papers (2023-01-30T14:02:33Z)
- HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation [20.148175528691905]
This paper presents a novel structure-aware embedding-to-classifier (SEC) module to incorporate both local and global structural information of relationships into the output space.
We also propose a hierarchical semantic aggregation (HSA) module to reduce the number of subspaces by introducing higher-order structural information.
The proposed HOSE-Net achieves state-of-the-art performance on two popular benchmarks, Visual Genome and VRD.
arXiv Detail & Related papers (2020-08-12T07:58:13Z)
- Structured Convolutions for Efficient Neural Network Design [65.36569572213027]
We tackle model efficiency by exploiting redundancy in the implicit structure of the building blocks of convolutional neural networks.
We show how this decomposition can be applied to 2D and 3D kernels as well as the fully-connected layers.
arXiv Detail & Related papers (2020-08-06T04:38:38Z)
- Graph Structural-topic Neural Network [35.27112594356742]
Graph Convolutional Networks (GCNs) achieved tremendous success by effectively gathering local features for nodes.
In this paper, we propose Graph Structural-topic Neural Network, abbreviated GraphSTONE, a GCN model that utilizes topic models of graphs.
We design multi-view GCNs to unify node features and structural topic features and utilize structural topics to guide the aggregation.
arXiv Detail & Related papers (2020-06-25T09:47:21Z)
- Learning Physical Graph Representations from Visual Scenes [56.7938395379406]
Physical Scene Graphs (PSGs) represent scenes as hierarchical graphs with nodes corresponding intuitively to object parts at different scales, and edges to physical connections between parts.
PSGNet augments standard CNNs with recurrent feedback connections to combine low- and high-level image information, and with graph pooling and vectorization operations that convert spatially uniform feature maps into object-centric graph structures.
We show that PSGNet outperforms alternative self-supervised scene representation algorithms at scene segmentation tasks.
arXiv Detail & Related papers (2020-06-22T16:10:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.