Related papers: Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation

URL: http://arxiv.org/abs/2403.18360v3
Date: Fri, 26 Apr 2024 15:46:05 GMT
Title: Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation
Authors: Ba Hung Ngo, Nhat-Tuong Do-Tran, Tuan-Ngoc Nguyen, Hae-Gon Jeon, Tae Jong Choi
Abstract summary: Most domain adaptation (DA) methods are based on either convolutional neural networks (CNNs) or vision transformers (ViTs). We design a hybrid method to fully take advantage of both ViT and CNN, called Explicitly Class-specific Boundaries (ECB). ECB learns CNN on ViT to combine their distinct strengths.
Score: 13.753795233064695
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Most domain adaptation (DA) methods are based on either convolutional neural networks (CNNs) or vision transformers (ViTs). They align the distribution differences between domains as encoders without considering their unique characteristics. For instance, ViT excels in accuracy due to its superior ability to capture global representations, while CNN has an advantage in capturing local representations. This fact has led us to design a hybrid method to fully take advantage of both ViT and CNN, called Explicitly Class-specific Boundaries (ECB). ECB learns CNN on ViT to combine their distinct strengths. In particular, we leverage ViT's properties to explicitly find class-specific decision boundaries by maximizing the discrepancy between the outputs of the two classifiers to detect target samples far from the source support. In contrast, the CNN encoder clusters target features based on the previously defined class-specific boundaries by minimizing the discrepancy between the probabilities of the two classifiers. Finally, ViT and CNN mutually exchange knowledge to improve the quality of pseudo labels and reduce the knowledge discrepancies of these models. Compared to conventional DA methods, our ECB achieves superior performance, which verifies its effectiveness in this hybrid model. The project website can be found at https://dotrannhattuong.github.io/ECB/website.
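
The abstract's "discrepancy between the outputs of the two classifiers" can be illustrated with a small sketch. The abstract does not say which distance is used, so the mean absolute difference between softmax probabilities below is an assumption, chosen because it is a common choice in classifier-discrepancy methods. The function names softmax, discrepancy, and batch_discrepancy are hypothetical and are not taken from the ECB code.

```python
import math


def softmax(logits):
    """Convert one sample's logits into class probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discrepancy(logits_a, logits_b):
    """Mean absolute difference between two classifiers' probabilities.

    Assumption: an L1-style distance. The abstract only says
    "discrepancy", so this stands in for whatever ECB actually uses.
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    return sum(abs(x - y) for x, y in zip(p, q)) / len(p)


def batch_discrepancy(batch_a, batch_b):
    """Average the per-sample discrepancy over a batch of target samples."""
    pairs = list(zip(batch_a, batch_b))
    return sum(discrepancy(a, b) for a, b in pairs) / len(pairs)


# Two classifiers that agree on a sample give zero discrepancy;
# two that pick different classes give a large one.
agree = discrepancy([2.0, 0.0, 0.0], [2.0, 0.0, 0.0])
disagree = discrepancy([5.0, 0.0], [0.0, 5.0])
```

Under this reading, the step described for ViT would update the two classifiers to increase this value on target samples, and the step described for the CNN encoder would update the encoder to decrease it. How those updates are scheduled is not stated in the abstract.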

Related papers

A Hybrid Fully Convolutional CNN-Transformer Model for Inherently Interpretable Medical Image Classification [5.904095466127043]
We introduce an interpretable-by-design hybrid fully convolutional CNN-Transformer architecture for medical image classification. Our model achieves state-of-the-art predictive performance compared to both black-box and interpretable models.
arXiv Detail & Related papers (2025-04-11T12:15:22Z)
CSHNet: A Novel Information Asymmetric Image Translation Method [57.22010952287759]
We propose the CNN-Swin Hybrid Network (CSHNet), which combines two key modules: Swin Embedded CNN (SEC) and CNN Embedded Swin (CES). CSHNet outperforms existing methods in both visual quality and performance metrics across scene-level and instance-level datasets.
arXiv Detail & Related papers (2025-01-17T13:44:54Z)
CNN2GNN: How to Bridge CNN with GNN [59.42117676779735]
We propose a novel CNN2GNN framework to unify CNN and GNN together via distillation. The performance of the distilled "boosted" two-layer GNN on Mini-ImageNet is much higher than that of CNNs containing dozens of layers, such as ResNet152.
arXiv Detail & Related papers (2024-04-23T08:19:08Z)
PICNN: A Pathway towards Interpretable Convolutional Neural Networks [12.31424771480963]
We introduce a novel pathway to alleviate the entanglement between filters and image classes. We use Bernoulli sampling to generate the filter-cluster assignment matrix from a learnable filter-class correspondence matrix. We evaluate the effectiveness of our method on ten widely used network architectures.
arXiv Detail & Related papers (2023-12-19T11:36:03Z)
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models [0.0]
Binarization can be used to help reduce the size of ViT models and their computational cost significantly. ViTs suffer a larger performance drop when directly applying convolutional neural network (CNN) binarization methods. We propose BinaryViT, in which, inspired by the CNN architecture, we include operations from the CNN architecture in a pure ViT architecture.
arXiv Detail & Related papers (2023-06-29T04:48:02Z)
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets [91.25055890980084]
There still remains an extreme performance gap between Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) when training from scratch on small datasets. We propose Dynamic Hybrid Vision Transformer (DHVT) as the solution to enhance the two inductive biases. Our DHVT achieves a series of state-of-the-art performance with a lightweight model, 85.68% on CIFAR-100 with 22.8M parameters, 82.3% on ImageNet-1K with 24.0M parameters.
arXiv Detail & Related papers (2022-10-12T06:54:39Z)
Attention Mechanism Meets with Hybrid Dense Network for Hyperspectral Image Classification [6.946336514955953]
Convolutional Neural Networks (CNNs) are more suitable, indeed. Fixed kernel sizes make traditional CNNs too specific, neither flexible nor conducive to feature learning, thus impacting the classification accuracy. The proposed solution aims at combining the core idea of 3D and 2D Inception net with the Attention mechanism to boost the HSIC CNN performance in a hybrid scenario. The resulting attention-fused hybrid network (AfNet) is based on three attention-fused parallel hybrid sub-nets with different kernels in each block repeatedly using high-level features to enhance the final ground-truth maps.
arXiv Detail & Related papers (2022-01-04T06:30:24Z)
The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer. Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN. We show that CNNs now maintain performance with dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
A Systematic Evaluation: Fine-Grained CNN vs. Traditional CNN Classifiers [54.996358399108566]
We investigate the performance of the landmark general CNN classifiers, which presented top-notch results on large scale classification datasets. We compare it against state-of-the-art fine-grained classifiers. We show an extensive evaluation on six datasets to determine whether the fine-grained classifier is able to elevate the baseline in their experiments.
arXiv Detail & Related papers (2020-03-24T23:49:14Z)
On the Texture Bias for Few-Shot CNN Segmentation [21.349705243254423]
Convolutional Neural Networks (CNNs) are driven by shapes to perform visual recognition tasks. Recent evidence suggests texture bias in CNNs provides higher performing models when learning on large labeled training datasets. We propose a novel architecture that integrates a set of Difference of Gaussians (DoG) to attenuate high-frequency local components in the feature space.
arXiv Detail & Related papers (2020-03-09T11:55:47Z)
Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes. We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.