HaarNet: Large-scale Linear-Morphological Hybrid Network for RGB-D
Semantic Segmentation
- URL: http://arxiv.org/abs/2310.07669v1
- Date: Wed, 11 Oct 2023 17:18:15 GMT
- Title: HaarNet: Large-scale Linear-Morphological Hybrid Network for RGB-D
Semantic Segmentation
- Authors: Rick Groenendijk, Leo Dorst, Theo Gevers
- Abstract summary: This is the first large-scale linear-morphological hybrid evaluated on a set of sizeable real-world datasets.
HaarNet is competitive with a state-of-the-art CNN, implying that morphological networks are a promising research direction for geometry-based learning tasks.
- Score: 12.89384111017003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Signals from different modalities each have their own combination algebra
which affects their sampling and processing. RGB is mostly linear; depth is a
geometric signal following the operations of mathematical morphology. If a
network obtaining RGB-D input has both kinds of operators available in its
layers, it should be able to give effective output with fewer parameters. In
this paper, morphological elements in conjunction with more familiar linear
modules are used to construct a mixed linear-morphological network called
HaarNet. This is the first large-scale linear-morphological hybrid, evaluated
on a set of sizeable real-world datasets. In the network, morphological Haar
sampling is applied to both feature channels in several layers, which splits
extreme values and high-frequency information such that both can be processed
to improve both modalities. Moreover, morphologically parameterised ReLU is
used, and morphologically-sound up-sampling is applied to obtain a
full-resolution output. Experiments show that HaarNet is competitive with a
state-of-the-art CNN, implying that morphological networks are a promising
research direction for geometry-based learning tasks.
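To make the operations named in the abstract concrete, below is a minimal, illustrative PyTorch sketch under my own assumptions: the 2x2 max-based "Haar" split, the max(x, b) activation, and the module names are simplifications, not the paper's implementation. The dilation branch keeps block-wise maxima (the extreme values), the detail branch keeps the residual high-frequency information, and up-sampling places each maximum back at the location it came from.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MorphHaarSplit(nn.Module):
    """Illustrative 2x2 morphological 'Haar-like' split (an assumption, not the
    paper's exact formulation): block-wise maxima plus a full-resolution residual."""
    def forward(self, x):
        # Dilation branch: block-wise maximum; keep argmax indices for unpooling.
        lo, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)
        # Detail branch: what the maxima discard, i.e. the high-frequency residual.
        hi = x - F.interpolate(lo, scale_factor=2, mode="nearest")
        return lo, hi, idx

class MorphReLU(nn.Module):
    """Morphologically parameterised ReLU: max(x, b) with a learnable per-channel
    threshold b; b = 0 recovers the ordinary ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.b = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        return torch.maximum(x, self.b)

def morph_upsample(lo, idx, size):
    """Morphology-respecting up-sampling: put each maximum back at its argmax
    location (max-unpooling); a detail branch would fill in the rest."""
    return F.max_unpool2d(lo, idx, kernel_size=2, output_size=size)

if __name__ == "__main__":
    x = torch.randn(1, 8, 64, 64)              # e.g. a depth-derived feature map
    lo, hi, idx = MorphHaarSplit()(x)          # (1,8,32,32), (1,8,64,64), indices
    up = morph_upsample(MorphReLU(8)(lo), idx, x.shape[-2:])
    print(lo.shape, hi.shape, up.shape)
```

Because max(x, b) is a lattice (supremum) operation like the max-pooling above, the activation stays within the same algebra as the sampling and up-sampling steps, which is in the spirit of the consistency the abstract argues for.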
Related papers
- Combinatorial Regularity for Relatively Perfect Discrete Morse Gradient Vector Fields of ReLU Neural Networks [0.0]
ReLU neural networks induce a piecewise linear decomposition of their input space called the canonical polyhedral complex.
It has previously been established that it is decidable whether a ReLU neural network is piecewise linear Morse.
arXiv Detail & Related papers (2024-12-23T21:58:51Z)
- Deep Learning as Ricci Flow [38.27936710747996]
Deep neural networks (DNNs) are powerful tools for approximating the distribution of complex data.
We show that the transformations performed by DNNs during classification tasks have parallels to those expected under Hamilton's Ricci flow.
Our findings motivate the use of tools from differential and discrete geometry to the problem of explainability in deep learning.
arXiv Detail & Related papers (2024-04-22T15:12:47Z)
- On Characterizing the Evolution of Embedding Space of Neural Networks using Algebraic Topology [9.537910170141467]
We study how the topology of feature embedding space changes as it passes through the layers of a well-trained deep neural network (DNN) through Betti numbers.
We demonstrate that as depth increases, a topologically complicated dataset is transformed into a simple one, resulting in Betti numbers attaining their lowest possible value.
arXiv Detail & Related papers (2023-11-08T10:45:12Z)
- ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z)
- TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection [13.126051625000605]
RGB-D SOD methods mainly rely on a symmetric two-stream CNN-based network to extract RGB and depth channel features separately.
We propose a Transformer-based asymmetric network (TANet) to tackle the issues mentioned above.
Our method achieves superior performance over 14 state-of-the-art RGB-D methods on six public datasets.
arXiv Detail & Related papers (2022-07-04T03:06:59Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- TSGCNet: Discriminative Geometric Feature Learning with Two-Stream Graph Convolutional Network for 3D Dental Model Segmentation [141.2690520327948]
We propose a two-stream graph convolutional network (TSGCNet) to learn multi-view information from different geometric attributes.
We evaluate our proposed TSGCNet on a real-patient dataset of dental models acquired by 3D intraoral scanners.
arXiv Detail & Related papers (2020-12-26T08:02:56Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively (a generic sketch of this kind of gated recalibration appears after this list).
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection [91.43066633305662]
The central question in RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal fusion information.
In this paper, we explore these issues from a new perspective.
We implement a more flexible and efficient multi-scale cross-modal feature processing scheme.
arXiv Detail & Related papers (2020-07-13T07:59:55Z)
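Several of the RGB-D works above, such as the Bi-directional Cross-Modality Feature Propagation entry, recalibrate one modality's features using the other before aggregating them. The following is a generic, hedged PyTorch sketch of that idea; the gating scheme, names, and aggregation by summation are illustrative assumptions and do not reproduce any specific module (such as the Separation-and-Aggregation Gate) from the cited papers.

```python
import torch
import torch.nn as nn

class CrossModalGate(nn.Module):
    """Generic gated cross-modal recalibration (illustrative sketch only):
    each modality is re-weighted by a channel gate computed from the other
    modality, then the two recalibrated streams are aggregated by summation."""
    def __init__(self, channels):
        super().__init__()
        # Gate derived from depth features, applied to RGB features.
        self.gate_from_depth = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Gate derived from RGB features, applied to depth features.
        self.gate_from_rgb = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        rgb_recal = rgb_feat * self.gate_from_depth(depth_feat)
        depth_recal = depth_feat * self.gate_from_rgb(rgb_feat)
        return rgb_recal + depth_recal

if __name__ == "__main__":
    rgb = torch.randn(1, 64, 32, 32)
    depth = torch.randn(1, 64, 32, 32)
    print(CrossModalGate(64)(rgb, depth).shape)  # torch.Size([1, 64, 32, 32])
```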