Related papers: Cross-architecture universal feature coding via distribution alignment

URL: http://arxiv.org/abs/2506.12737v1
Date: Sun, 15 Jun 2025 06:14:02 GMT
Title: Cross-architecture universal feature coding via distribution alignment
Authors: Changsheng Gao, Shan Liu, Feng Wu, Weisi Lin
Abstract summary: We introduce a new research problem: cross-architecture universal feature coding (CAUFC). We propose a two-step distribution alignment method. First, we design the format alignment method that unifies CNN and Transformer features into a consistent 2D token format. Second, we propose the feature value alignment method that harmonizes statistical distributions via truncation and normalization. As a first attempt to study CAUFC, we evaluate our method on the image classification task. Experimental results demonstrate that our method achieves superior rate-accuracy trade-offs compared to the architecture-specific baseline.
Score: 88.73189953617594
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Feature coding has become increasingly important in scenarios where semantic representations rather than raw pixels are transmitted and stored. However, most existing methods are architecture-specific, targeting either CNNs or Transformers. This design limits their applicability in real-world scenarios where features from both architectures coexist. To address this gap, we introduce a new research problem: cross-architecture universal feature coding (CAUFC), which seeks to build a unified codec that can effectively compress features from heterogeneous architectures. To tackle this challenge, we propose a two-step distribution alignment method. First, we design the format alignment method that unifies CNN and Transformer features into a consistent 2D token format. Second, we propose the feature value alignment method that harmonizes statistical distributions via truncation and normalization. As a first attempt to study CAUFC, we evaluate our method on the image classification task. Experimental results demonstrate that our method achieves superior rate-accuracy trade-offs compared to the architecture-specific baseline. This work marks an initial step toward universal feature compression across heterogeneous model architectures.
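The two alignment steps named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the tensor layouts ((C, H, W) for CNN features, (N, D) for Transformer features), the percentile-based truncation bounds, the min-max scaling, and the function names `to_tokens` and `align_values` are all assumptions made for illustration.

```python
import numpy as np


def to_tokens(feat: np.ndarray) -> np.ndarray:
    """Format alignment (illustrative): return a 2D (tokens, channels) array.

    Assumption: CNN features arrive as (C, H, W) and Transformer features
    as (N, D). The paper's actual layout may differ.
    """
    if feat.ndim == 3:
        # CNN feature map: treat each spatial position as one token.
        c, h, w = feat.shape
        return feat.reshape(c, h * w).T  # (H*W, C)
    if feat.ndim == 2:
        # Transformer output: already a sequence of tokens.
        return feat
    raise ValueError(f"unsupported feature shape {feat.shape}")


def align_values(tokens: np.ndarray, lo_pct: float = 1.0, hi_pct: float = 99.0):
    """Value alignment (illustrative): truncate outliers, then rescale to [0, 1].

    Assumption: percentile clipping and min-max scaling stand in for the
    truncation and normalization mentioned in the abstract.
    """
    lo, hi = np.percentile(tokens, [lo_pct, hi_pct])
    clipped = np.clip(tokens, lo, hi)
    scale = hi - lo
    if scale == 0:
        # Constant input: nothing to rescale.
        return np.zeros_like(clipped), lo, hi
    return (clipped - lo) / scale, lo, hi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cnn_feat = rng.normal(size=(256, 14, 14))   # hypothetical CNN feature map
    vit_feat = rng.normal(size=(197, 768))      # hypothetical Transformer tokens
    for name, feat in [("cnn", cnn_feat), ("vit", vit_feat)]:
        aligned, lo, hi = align_values(to_tokens(feat))
        print(name, aligned.shape, float(aligned.min()), float(aligned.max()))
```

Returning the truncation bounds alongside the aligned tokens lets a decoder invert the scaling. Whether the paper transmits such side information is not stated in the abstract.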

Related papers

CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval [54.15776146365823]
Composed Image Retrieval (CIR) enables users to search for target images using both a reference image and manipulation text. We propose CSMCIR, a unified representation framework that achieves efficient query-target alignment through three synergistic components.
arXiv Detail & Related papers (2026-01-07T09:21:38Z)
IDAP++: Advancing Divergence-Based Pruning via Filter-Level and Layer-Level Optimization [38.16787019088681]
This paper presents a novel approach to neural network compression that addresses redundancy at both the filter and architectural levels. We build on the concept of tensor flow divergence, which quantifies how information is transformed across network layers.
arXiv Detail & Related papers (2025-11-25T10:02:21Z)
Divide, Conquer and Unite: Hierarchical Style-Recalibrated Prototype Alignment for Federated Medical Image Segmentation [66.82598255715696]
Federated learning enables multiple medical institutions to train a global model without sharing data. Current approaches primarily focus on final-layer features, overlooking critical multi-level cues. We propose FedBCS to bridge feature representation gaps via domain-invariant contextual prototypes alignment.
arXiv Detail & Related papers (2025-11-14T04:15:34Z)
Prototype-Driven Structure Synergy Network for Remote Sensing Images Segmentation [36.11399269900789]
This paper proposes a Prototype-Driven Structure Synergy Network (PDSSNet). The design of PDSSNet is based on a core concept: a complete ground object is jointly defined by its invariant class semantics and its variant spatial structure. Extensive experiments demonstrate that PDSSNet outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-08-06T02:29:40Z)
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks. AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation. We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z)
A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
A popular similarity-based feature upsampling pipeline has been proposed, which utilizes a high-resolution feature as guidance. We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives. We develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
arXiv Detail & Related papers (2024-07-02T14:12:21Z)
Multi-Level Aggregation and Recursive Alignment Architecture for Efficient Parallel Inference Segmentation Network [18.47001817385548]
We propose a parallel inference network customized for semantic segmentation tasks. We employ a shallow backbone to ensure real-time speed, and propose three core components to compensate for the reduced model capacity to improve accuracy. Our framework shows a better balance between speed and accuracy than state-of-the-art real-time methods on Cityscapes and CamVid datasets.
arXiv Detail & Related papers (2024-02-03T22:51:17Z)
Adaptive Spot-Guided Transformer for Consistent Local Feature Matching [64.30749838423922]
We propose Adaptive Spot-Guided Transformer (ASTR) for local feature matching. ASTR models the local consistency and scale variations in a unified coarse-to-fine architecture.
arXiv Detail & Related papers (2023-03-29T12:28:01Z)
Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection [0.0]
Reconstruction autoencoder-based methods deal with the problem by using input reconstruction error as a metric of novelty vs. normality. We introduce semantic reconstruction, data certainty decomposition and normalized L2 distance to substantially improve original methods. Our method works without any additional data, hard-to-implement structure, or time-consuming pipeline, and without harming the classification accuracy of known classes.
arXiv Detail & Related papers (2022-03-04T09:04:55Z)
CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning. The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery. The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones. We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains. Experiments on four benchmark datasets demonstrate HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv Detail & Related papers (2021-09-30T14:27:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.