UFO: Unified Feature Optimization
- URL: http://arxiv.org/abs/2207.10341v1
- Date: Thu, 21 Jul 2022 07:34:06 GMT
- Title: UFO: Unified Feature Optimization
- Authors: Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, Gang Zhang, Xinyu Zhang,
Zhigang Wang, Jinwen Chen, Jian Wang, Lufei Liu, Haocheng Feng, Junyu Han,
Jingtuo Liu, Errui Ding and Jingdong Wang
- Abstract summary: This paper proposes a novel Unified Feature Optimization (UFO) paradigm for training and deploying deep models.
UFO aims to benefit each single task with a large-scale pretraining on all tasks.
UFO provides great convenience for flexible deployment, while maintaining the benefits of large-scale pretraining.
- Score: 67.77936811483664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel Unified Feature Optimization (UFO) paradigm for
training and deploying deep models under real-world and large-scale scenarios,
which requires a collection of multiple AI functions. UFO aims to benefit each
single task with a large-scale pretraining on all tasks. Compared with the well
known foundation model, UFO has two different points of emphasis, i.e.,
relatively smaller model size and NO adaptation cost: 1) UFO squeezes a wide
range of tasks into a moderate-sized unified model in a multi-task learning
manner and further trims the model size when transferred to down-stream tasks.
2) UFO does not emphasize transfer to novel tasks. Instead, it aims to make the
trimmed model dedicated for one or more already-seen task. With these two
characteristics, UFO provides great convenience for flexible deployment, while
maintaining the benefits of large-scale pretraining. A key merit of UFO is that
the trimming process not only reduces the model size and inference consumption,
but can even improve the accuracy on certain tasks. Specifically, UFO
considers multi-task training, which has a two-fold impact on the unified
model: some closely related tasks benefit each other, while other tasks
conflict with each other. UFO manages to reduce the conflicts and to
preserve the mutual benefits through a novel Neural Architecture Search (NAS)
method. Experiments on a wide range of deep representation learning tasks
(i.e., face recognition, person re-identification, vehicle re-identification
and product retrieval) show that the model trimmed from UFO achieves higher
accuracy than its single-task-trained counterpart and yet has smaller model
size, validating the concept of UFO. Moreover, UFO has supported the release
of a 17-billion-parameter computer vision (CV) foundation model, the largest
CV model in the industry.
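The train-unified, trim-for-deployment idea in the abstract can be sketched in a few lines: a shared backbone is trained jointly with lightweight per-task heads, and deployment keeps only the backbone plus the head of the one already-seen task of interest. All names and dimensions below are illustrative placeholders, not the paper's actual architecture, and the NAS-based conflict handling is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a shared backbone maps inputs to a unified
# feature, and each already-seen task gets its own lightweight head.
IN_DIM, FEAT_DIM = 8, 4
TASKS = ["face", "person_reid", "vehicle_reid", "product"]

backbone = rng.normal(size=(IN_DIM, FEAT_DIM))  # shared across all tasks
heads = {t: rng.normal(size=(FEAT_DIM, FEAT_DIM)) for t in TASKS}

def unified_forward(x):
    """Multi-task forward pass: one backbone pass, one output per task."""
    feat = x @ backbone
    return {t: feat @ W for t, W in heads.items()}

def trim(task):
    """Deployment-time trimming: keep only the shared backbone and the
    head of the single already-seen task we care about."""
    W = heads[task]
    return lambda x: (x @ backbone) @ W

x = rng.normal(size=(IN_DIM,))
full = unified_forward(x)   # training-time unified model: all task outputs
small = trim("face")        # trimmed model: backbone + one head
assert np.allclose(full["face"], small(x))  # same prediction, fewer parameters
```

The trimmed model drops every unused head, so its parameter count shrinks with the number of discarded tasks while its output for the retained task is unchanged.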
Related papers
- RingMo-Aerial: An Aerial Remote Sensing Foundation Model With Affine Transformation Contrastive Learning [12.442430013205131]
This paper proposes the RingMo-Aerial model, aiming to fill the gap in foundation model research in the field of aerial remote sensing (ARS) vision.
The model's detection capability for small targets is enhanced and optimized for the tilted viewing angles characteristic of ARS.
Experimental results demonstrate that RingMo-Aerial achieves SOTA performance on multiple downstream tasks.
arXiv Detail & Related papers (2024-09-20T10:03:14Z)
- SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients [0.8873228457453465]
Small object detection in aerial imagery presents significant challenges in computer vision.
Traditional methods using transformer-based models often face limitations stemming from the lack of specialized databases.
This paper introduces two innovative approaches that significantly enhance detection and segmentation capabilities for small aerial objects.
arXiv Detail & Related papers (2024-05-02T19:47:08Z)
- UFO: Unidentified Foreground Object Detection in 3D Point Cloud [7.286344230797102]
Existing 3D object detectors face hard challenges in both 3D localization and Out-of-Distribution detection.
We suggest a new UFO detection framework comprising three components: an evaluation protocol, a methodology, and a benchmark.
The proposed framework consistently enhances performance by a large margin across all four baseline detectors.
arXiv Detail & Related papers (2024-01-08T12:16:06Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, augmenting Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks [129.49630356651454]
We propose a novel FAshion-focused Multi-task Efficient learning method for Vision-and-Language tasks (FAME-ViL)
Our FAME-ViL can save 61.5% of parameters over alternatives, while significantly outperforming the conventional independently trained single-task models.
arXiv Detail & Related papers (2023-03-04T19:07:48Z)
- Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks [86.66733026149892]
We propose Uni-Perceiver v2, the first generalist model capable of handling major large-scale vision and vision-language tasks.
Specifically, images are encoded as general region proposals, while texts are encoded via a Transformer-based language model.
Uni-Perceiver v2 achieves competitive performance on a broad range of vision and vision-language tasks.
arXiv Detail & Related papers (2022-11-17T18:59:52Z)
- Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification [38.19907319079833]
We propose a multitask learning approach that employs a new convolution-free multiscale architecture, the Pyramid Vision Transformer (PVT), as the backbone for UAV-based object ReID.
By uncertainty modeling of intraclass variations, our proposed model can be jointly optimized using both uncertainty-aware object ID and camera ID information.
arXiv Detail & Related papers (2022-09-19T00:27:07Z)
- UFO-ViT: High Performance Linear Vision Transformer without Softmax [0.0]
We propose UFO-ViT (Unit Force Operated Vision Transformer), a novel method that reduces the computation of self-attention by eliminating some of its non-linearity.
The model outperforms most transformer-based models on image classification and dense prediction tasks across most capacity regimes.
arXiv Detail & Related papers (2021-09-29T12:32:49Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.