UFO: Unified Feature Optimization
- URL: http://arxiv.org/abs/2207.10341v1
- Date: Thu, 21 Jul 2022 07:34:06 GMT
- Title: UFO: Unified Feature Optimization
- Authors: Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, Gang Zhang, Xinyu Zhang,
Zhigang Wang, Jinwen Chen, Jian Wang, Lufei Liu, Haocheng Feng, Junyu Han,
Jingtuo Liu, Errui Ding and Jingdong Wang
- Abstract summary: This paper proposes a novel Unified Feature Optimization (UFO) paradigm for training and deploying deep models.
UFO aims to benefit each single task with a large-scale pretraining on all tasks.
UFO provides great convenience for flexible deployment, while maintaining the benefits of large-scale pretraining.
- Score: 67.77936811483664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel Unified Feature Optimization (UFO) paradigm for
training and deploying deep models under real-world and large-scale scenarios,
which requires a collection of multiple AI functions. UFO aims to benefit each
single task with a large-scale pretraining on all tasks. Compared with the well
known foundation model, UFO has two different points of emphasis, i.e.,
relatively smaller model size and NO adaptation cost: 1) UFO squeezes a wide
range of tasks into a moderate-sized unified model in a multi-task learning
manner and further trims the model size when transferred to down-stream tasks.
2) UFO does not emphasize transfer to novel tasks. Instead, it aims to make the
trimmed model dedicated to one or more already-seen tasks. With these two
characteristics, UFO provides great convenience for flexible deployment, while
maintaining the benefits of large-scale pretraining. A key merit of UFO is that
the trimming process not only reduces the model size and inference consumption,
but even improves the accuracy on certain tasks. Specifically, UFO
considers the multi-task training and brings two-fold impact on the unified
model: some closely related tasks have mutual benefits, while some tasks have
conflicts against each other. UFO manages to reduce the conflicts and to
preserve the mutual benefits through a novel Network Architecture Search (NAS)
method. Experiments on a wide range of deep representation learning tasks
(i.e., face recognition, person re-identification, vehicle re-identification
and product retrieval) show that the model trimmed from UFO achieves higher
accuracy than its single-task-trained counterpart and yet has smaller model
size, validating the concept of UFO. Besides, UFO has also supported the release
of a 17-billion-parameter computer vision (CV) foundation model, which is the
largest CV model in the industry.
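To make the paradigm concrete, below is a minimal, hypothetical sketch in PyTorch of the idea the abstract describes: a multi-branch supernet backbone trained jointly on several tasks, with each task assigned a path through the branches, and a trimming step that keeps only the branches on a chosen task's path for deployment. The class names, branch counts, and hard-coded paths are illustrative assumptions, not the authors' released implementation, and the NAS step that actually selects the paths is omitted.

```python
import torch
import torch.nn as nn


class MultiBranchBlock(nn.Module):
    """One supernet block with parallel branches; each task uses one branch."""

    def __init__(self, dim: int, num_branches: int = 2):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_branches)
        )

    def forward(self, x: torch.Tensor, branch: int) -> torch.Tensor:
        return self.branches[branch](x)


class UnifiedModel(nn.Module):
    """Shared multi-branch backbone plus one embedding head per task."""

    def __init__(self, dim: int, depth: int, paths: dict):
        super().__init__()
        self.blocks = nn.ModuleList(MultiBranchBlock(dim) for _ in range(depth))
        self.heads = nn.ModuleDict({task: nn.Linear(dim, dim) for task in paths})
        self.paths = paths  # task name -> branch index chosen for each block

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        for block, branch in zip(self.blocks, self.paths[task]):
            x = block(x, branch)
        return self.heads[task](x)


def trim(model: UnifiedModel, task: str) -> nn.Sequential:
    """Keep only the branches on this task's path; drop everything else."""
    kept = [model.blocks[i].branches[b] for i, b in enumerate(model.paths[task])]
    kept.append(model.heads[task])
    return nn.Sequential(*kept)


# Two tasks share branch 0 in the first block (mutual benefit) but take
# different branches in the second block (reducing conflict between them).
paths = {"face": [0, 0], "reid": [0, 1]}
model = UnifiedModel(dim=64, depth=2, paths=paths)
x = torch.randn(4, 64)
features = model(x, "face")        # joint-training forward pass
deployed = trim(model, "face")     # compact single-task model for deployment
assert deployed(x).shape == features.shape
```

In this toy setup, tasks that share a branch pool their training signal (mutual benefit), while tasks routed through different branches stop interfering with each other (conflict reduction); trimming then yields a smaller single-task model, matching the deployment story in the abstract.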
Related papers
- Efficient Oriented Object Detection with Enhanced Small Object Recognition in Aerial Images [2.9138705529771123]
We present a novel enhancement to the YOLOv8 model, tailored for oriented object detection tasks.
Our model features a wavelet transform-based C2f module for capturing associative features and an Adaptive Scale Feature Pyramid (ASFP) module that leverages P2 layer details.
Our approach provides a more efficient architectural design than DecoupleNet, which has 23.3M parameters, all while maintaining detection accuracy.
arXiv Detail & Related papers (2024-12-17T05:45:48Z)
- UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer [20.121885706650758]
We propose a non-invasive plug-in called Uniform Frame Organizer (UFO).
UFO is compatible with any diffusion-based video generation model.
The training for UFO is simple, efficient, requires minimal resources, and supports stylized training.
arXiv Detail & Related papers (2024-12-12T15:56:26Z)
- Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC [77.8851460746251]
We propose a simple, efficient, and general approach to fine-tune diffusion models.
ONE-PIC enhances the inherited generative ability in the pretrained diffusion models without introducing additional modules.
Our method is simple and efficient which streamlines the adaptation process and achieves excellent performance with lower costs.
arXiv Detail & Related papers (2024-12-07T11:19:32Z)
- UFO: Unidentified Foreground Object Detection in 3D Point Cloud [7.286344230797102]
Existing 3D object detectors face hard challenges in both 3D localization and out-of-distribution detection.
We suggest a new UFO detection framework including three tasks: evaluation protocol, methodology, and benchmark.
The proposed framework consistently enhances performance by a large margin across all four baseline detectors.
arXiv Detail & Related papers (2024-01-08T12:16:06Z)
- FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks [129.49630356651454]
We propose a novel FAshion-focused Multi-task Efficient learning method for Vision-and-Language tasks (FAME-ViL).
Our FAME-ViL can save 61.5% of parameters over alternatives, while significantly outperforming the conventional independently trained single-task models.
arXiv Detail & Related papers (2023-03-04T19:07:48Z)
- Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks [86.66733026149892]
We propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks.
Specifically, images are encoded as general region proposals, while texts are encoded via a Transformer-based language model.
Uni-Perceiver v2 achieves competitive performance on a broad range of vision and vision-language tasks.
arXiv Detail & Related papers (2022-11-17T18:59:52Z)
- Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification [38.19907319079833]
We propose a multitask learning approach, which employs a new multiscale architecture without convolution, Pyramid Vision Transformer (PVT) as the backbone for UAV-based object ReID.
By uncertainty modeling of intraclass variations, our proposed model can be jointly optimized using both uncertainty-aware object ID and camera ID information.
arXiv Detail & Related papers (2022-09-19T00:27:07Z)
- UFO-ViT: High Performance Linear Vision Transformer without Softmax [0.0]
We propose UFO-ViT (Unit Force Operated Vision Transformer), a novel method to reduce the computation of self-attention by eliminating some non-linearity.
The model outperforms most transformer-based models on image classification and dense prediction tasks across most capacity regimes.
arXiv Detail & Related papers (2021-09-29T12:32:49Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.