vHeat: Building Vision Models upon Heat Conduction
- URL: http://arxiv.org/abs/2405.16555v1
- Date: Sun, 26 May 2024 12:58:04 GMT
- Title: vHeat: Building Vision Models upon Heat Conduction
- Authors: Zhaozhi Wang, Yue Liu, Yunfan Liu, Hongtian Yu, Yaowei Wang, Qixiang Ye, Yunjie Tian,
- Abstract summary: vHeat is a novel vision backbone model that simultaneously achieves both high computational efficiency and global receptive field.
The essential idea is to conceptualize image patches as heat sources and model the calculation of their correlations as the diffusion of thermal energy.
- Score: 63.00030330898876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental problem in learning robust and expressive visual representations lies in efficiently estimating the spatial relationships of visual semantics throughout the entire image. In this study, we propose vHeat, a novel vision backbone model that simultaneously achieves both high computational efficiency and global receptive field. The essential idea, inspired by the physical principle of heat conduction, is to conceptualize image patches as heat sources and model the calculation of their correlations as the diffusion of thermal energy. This mechanism is incorporated into deep models through the newly proposed module, the Heat Conduction Operator (HCO), which is physically plausible and can be efficiently implemented using DCT and IDCT operations with a complexity of $\mathcal{O}(N^{1.5})$. Extensive experiments demonstrate that vHeat surpasses Vision Transformers (ViTs) across various vision tasks, while also providing higher inference speeds, reduced FLOPs, and lower GPU memory usage for high-resolution images. The code will be released at https://github.com/MzeroMiko/vHeat.
Related papers
- HcNet: Image Modeling with Heat Conduction Equation [6.582336726258388]
This paper aims to integrate the overall architectural design of the model into the heat conduction theory framework.
Our Heat Conduction Network (HcNet) still shows competitive performance.
arXiv Detail & Related papers (2024-08-12T02:48:00Z) - Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z) - Vision Mamba: Efficient Visual Representation Learning with
Bidirectional State Space Model [51.10876815815515]
We propose a new generic vision backbone with bidirectional Mamba blocks (Vim)
Vim marks the image sequences with position embeddings and compresses the visual representation with bidirectional state space models.
The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images.
arXiv Detail & Related papers (2024-01-17T18:56:18Z) - Multi-Context Dual Hyper-Prior Neural Image Compression [10.349258638494137]
We propose a Transformer-based nonlinear transform to efficiently capture both local and global information from the input image.
We also introduce a novel entropy model that incorporates two different hyperpriors to model cross-channel and spatial dependencies of the latent representation.
Our experiments show that our proposed framework performs better than the state-of-the-art methods in terms of rate-distortion performance.
arXiv Detail & Related papers (2023-09-19T17:44:44Z) - Deep convolutional surrogates and degrees of freedom in thermal design [0.0]
Convolutional Neural Networks (CNNs) are used to predict results of Computational Fluid Dynamics (CFD) directly from topologies saved as images.
We present surrogate models for heat transfer and pressure drop prediction of complex fin geometries generated using composite Bezier curves.
arXiv Detail & Related papers (2022-08-16T00:45:39Z) - Image-specific Convolutional Kernel Modulation for Single Image
Super-resolution [85.09413241502209]
In this issue, we propose a novel image-specific convolutional modulation kernel (IKM)
We exploit the global contextual information of image or feature to generate an attention weight for adaptively modulating the convolutional kernels.
Experiments on single image super-resolution show that the proposed methods achieve superior performances over state-of-the-art methods.
arXiv Detail & Related papers (2021-11-16T11:05:10Z) - Efficient Vision Transformers via Fine-Grained Manifold Distillation [96.50513363752836]
Vision transformer architectures have shown extraordinary performance on many computer vision tasks.
Although the network performance is boosted, transformers are often required more computational resources.
We propose to excavate useful information from the teacher transformer through the relationship between images and the divided patches.
arXiv Detail & Related papers (2021-07-03T08:28:34Z) - TCL: Transformer-based Dynamic Graph Modelling via Contrastive Learning [87.38675639186405]
We propose a novel graph neural network approach, called TCL, which deals with the dynamically-evolving graph in a continuous-time fashion.
To the best of our knowledge, this is the first attempt to apply contrastive learning to representation learning on dynamic graphs.
arXiv Detail & Related papers (2021-05-17T15:33:25Z) - Learning Accurate Entropy Model with Global Reference for Image
Compression [22.171750277528222]
We propose a novel Global Reference Model for image compression to leverage both the local and the global context information.
A by-product of this work is the innovation of a mean-shifting GDN module that further improves the performance.
arXiv Detail & Related papers (2020-10-16T11:27:46Z) - Efficient and Model-Based Infrared and Visible Image Fusion Via
Algorithm Unrolling [24.83209572888164]
Infrared and visible image fusion (IVIF) expects to obtain images that retain thermal radiation information from infrared images and texture details from visible images.
A model-based convolutional neural network (CNN) model is proposed to overcome the shortcomings of traditional CNN-based IVIF models.
arXiv Detail & Related papers (2020-05-12T16:15:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.