One-Time Model Adaptation to Heterogeneous Clients: An Intra-Client and
Inter-Image Attention Design
- URL: http://arxiv.org/abs/2211.06276v1
- Date: Fri, 11 Nov 2022 15:33:21 GMT
- Title: One-Time Model Adaptation to Heterogeneous Clients: An Intra-Client and
Inter-Image Attention Design
- Authors: Yikai Yan, Chaoyue Niu, Fan Wu, Qinya Li, Shaojie Tang, Chengfei Lyu,
Guihai Chen
- Abstract summary: We propose a new intra-client and inter-image attention (ICIIA) module into existing backbone recognition models.
In particular, given a target image from a certain client, ICIIA introduces multi-head self-attention to retrieve relevant images from the client's historical unlabeled images.
We evaluate ICIIA using 3 different recognition tasks with 9 backbone models over 5 representative datasets.
- Score: 40.97593636235116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The mainstream workflow of image recognition applications is first training
one global model on the cloud for a wide range of classes and then serving
numerous clients, each with heterogeneous images from a small subset of classes
to be recognized. Because of these cloud-client discrepancies in the range of
image classes, the recognition model should be strongly adaptive, intuitively by
concentrating its focus on each individual client's local, dynamic class subset
while incurring negligible overhead. In this work, we
propose to plug a new intra-client and inter-image attention (ICIIA) module
into existing backbone recognition models, requiring only one-time cloud-based
training to be client-adaptive. In particular, given a target image from a
certain client, ICIIA introduces multi-head self-attention to retrieve relevant
images from the client's historical unlabeled images, thereby calibrating the
focus and the recognition result. Further considering that ICIIA's overhead is
dominated by linear projection, we propose partitioned linear projection with
feature shuffling for replacement and allow increasing the number of partitions
to dramatically improve efficiency without sacrificing too much accuracy. We
finally evaluate ICIIA using 3 different recognition tasks with 9 backbone
models over 5 representative datasets. Extensive evaluation results demonstrate
the effectiveness and efficiency of ICIIA. Specifically, for ImageNet-1K with
the backbone models of MobileNetV3-L and Swin-B, ICIIA can improve the testing
accuracy to 83.37% (+8.11%) and 88.86% (+5.28%), while adding only 1.62% and
0.02% of FLOPs, respectively.
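The paper's implementation is not reproduced here; the following is a minimal, single-head NumPy sketch of the two mechanisms the abstract describes: attending from a target image's feature over the client's historical unlabeled image features, with the linear projections replaced by partitioned projections plus feature shuffling. All function names, shapes, and the single-head simplification are illustrative assumptions (the actual ICIIA module uses multi-head attention inside a backbone model); note that a partitioned projection costs roughly d^2/P multiply-adds per vector versus d^2 for a full d x d projection.

```python
import numpy as np

def shuffle_features(x, num_partitions):
    """ShuffleNet-style interleaving so information mixes across partitions."""
    n, d = x.shape
    return x.reshape(n, num_partitions, d // num_partitions) \
            .transpose(0, 2, 1).reshape(n, d)

def partitioned_linear(x, weights):
    """Apply an independent (d/P x d/P) projection to each feature partition,
    then shuffle, instead of one full (d x d) projection."""
    parts = np.split(x, len(weights), axis=-1)
    out = np.concatenate([p @ w for p, w in zip(parts, weights)], axis=-1)
    return shuffle_features(out, len(weights))

def iciia_attention(target_feat, history_feats, wq, wk, wv):
    """Single-head sketch: the target image's feature attends over the
    client's historical image features to calibrate the recognition focus."""
    q = partitioned_linear(target_feat, wq)     # (1, d) query
    k = partitioned_linear(history_feats, wk)   # (m, d) keys
    v = partitioned_linear(history_feats, wv)   # (m, d) values
    scores = (q @ k.T) / np.sqrt(q.shape[-1])   # (1, m) scaled dot products
    scores -= scores.max()                      # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return attn @ v                             # (1, d) calibrated feature
```

In this sketch, increasing the number of partitions P shrinks each per-partition weight matrix quadratically, which is the efficiency lever the abstract refers to; the shuffle step keeps the partitions from becoming isolated feature groups.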
Related papers
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning, that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Scalable Federated Learning for Clients with Different Input Image Sizes and Numbers of Output Categories [34.22635158366194]
Federated learning is a privacy-preserving training method that learns from a plurality of clients without sharing their confidential data.
We propose an effective federated learning method named ScalableFL, where the depth and width of each client's local model are adjusted according to that client's input image size and number of output categories.
arXiv Detail & Related papers (2023-11-15T05:43:14Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- CFR-ICL: Cascade-Forward Refinement with Iterative Click Loss for Interactive Image Segmentation [2.482735440750151]
We propose a click-based and mask-guided interactive image segmentation framework containing three novel components.
The proposed framework offers a unified inference framework to generate segmentation results in a coarse-to-fine manner.
Our model reduces the number of clicks required to surpass an IoU of 0.95 by 33.2% and 15.5% relative to the previous state-of-the-art approach.
arXiv Detail & Related papers (2023-03-09T23:20:35Z)
- Learning Customized Visual Models with Retrieval-Augmented Knowledge [104.05456849611895]
We propose REACT, a framework to acquire the relevant web knowledge to build customized visual models for target domains.
We retrieve the most relevant image-text pairs from the web-scale database as external knowledge, and propose to customize the model by training only new modularized blocks while freezing all the original weights.
The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero, few, and full-shot settings.
arXiv Detail & Related papers (2023-01-17T18:59:06Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- Federated Multi-Target Domain Adaptation [99.93375364579484]
Federated learning methods enable us to train machine learning models on distributed user data while preserving user privacy.
We consider a more practical scenario where the distributed client data is unlabeled, and a centralized labeled dataset is available on the server.
We propose an effective DualAdapt method to address the new challenges.
arXiv Detail & Related papers (2021-08-17T17:53:05Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.