UFO: Unified Feature Optimization
- URL: http://arxiv.org/abs/2207.10341v1
- Date: Thu, 21 Jul 2022 07:34:06 GMT
- Title: UFO: Unified Feature Optimization
- Authors: Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, Gang Zhang, Xinyu Zhang,
Zhigang Wang, Jinwen Chen, Jian Wang, Lufei Liu, Haocheng Feng, Junyu Han,
Jingtuo Liu, Errui Ding and Jingdong Wang
- Abstract summary: This paper proposes a novel Unified Feature Optimization (UFO) paradigm for training and deploying deep models.
UFO aims to benefit each single task with a large-scale pretraining on all tasks.
UFO provides great convenience for flexible deployment, while maintaining the benefits of large-scale pretraining.
- Score: 67.77936811483664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel Unified Feature Optimization (UFO) paradigm for
training and deploying deep models under real-world and large-scale scenarios,
which requires a collection of multiple AI functions. UFO aims to benefit each
single task with a large-scale pretraining on all tasks. Compared with the well
known foundation model, UFO has two different points of emphasis, i.e.,
relatively smaller model size and NO adaptation cost: 1) UFO squeezes a wide
range of tasks into a moderate-sized unified model in a multi-task learning
manner and further trims the model size when transferred to down-stream tasks.
2) UFO does not emphasize transfer to novel tasks. Instead, it aims to make the
trimmed model dedicated to one or more already-seen tasks. With these two
characteristics, UFO provides great convenience for flexible deployment, while
maintaining the benefits of large-scale pretraining. A key merit of UFO is that
the trimming process not only reduces the model size and inference consumption,
but even improves the accuracy on certain tasks. Specifically, UFO
considers the multi-task training and brings two-fold impact on the unified
model: some closely related tasks have mutual benefits, while some tasks have
conflicts against each other. UFO manages to reduce the conflicts and to
preserve the mutual benefits through a novel Network Architecture Search (NAS)
method. Experiments on a wide range of deep representation learning tasks
(i.e., face recognition, person re-identification, vehicle re-identification
and product retrieval) show that the model trimmed from UFO achieves higher
accuracy than its single-task-trained counterpart and yet has smaller model
size, validating the concept of UFO. Besides, UFO has also supported the release
of a 17-billion-parameter computer vision (CV) foundation model, which is the
largest CV model in the industry.
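To make the paradigm concrete, below is a minimal, hypothetical sketch in PyTorch of the idea the abstract describes: a multi-branch supernet backbone trained jointly on several tasks, with each task assigned a path through the branches, and a trimming step that keeps only the branches on a chosen task's path for deployment. The class names, branch counts, and hard-coded paths are illustrative assumptions, not the authors' released implementation, and the NAS step that actually selects the paths is omitted.

```python
import torch
import torch.nn as nn


class MultiBranchBlock(nn.Module):
    """One supernet block with parallel branches; each task uses one branch."""

    def __init__(self, dim: int, num_branches: int = 2):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_branches)
        )

    def forward(self, x: torch.Tensor, branch: int) -> torch.Tensor:
        return self.branches[branch](x)


class UnifiedModel(nn.Module):
    """Shared multi-branch backbone plus one embedding head per task."""

    def __init__(self, dim: int, depth: int, paths: dict):
        super().__init__()
        self.blocks = nn.ModuleList(MultiBranchBlock(dim) for _ in range(depth))
        self.heads = nn.ModuleDict({task: nn.Linear(dim, dim) for task in paths})
        self.paths = paths  # task name -> branch index chosen for each block

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        for block, branch in zip(self.blocks, self.paths[task]):
            x = block(x, branch)
        return self.heads[task](x)


def trim(model: UnifiedModel, task: str) -> nn.Sequential:
    """Keep only the branches on this task's path; drop everything else."""
    kept = [model.blocks[i].branches[b] for i, b in enumerate(model.paths[task])]
    kept.append(model.heads[task])
    return nn.Sequential(*kept)


# Two tasks share branch 0 in the first block (mutual benefit) but take
# different branches in the second block (reducing conflict between them).
paths = {"face": [0, 0], "reid": [0, 1]}
model = UnifiedModel(dim=64, depth=2, paths=paths)
x = torch.randn(4, 64)
features = model(x, "face")        # joint-training forward pass
deployed = trim(model, "face")     # compact single-task model for deployment
assert deployed(x).shape == features.shape
```

In this toy setup, tasks that share a branch pool their training signal (mutual benefit), while tasks routed through different branches stop interfering with each other (conflict reduction); trimming then yields a smaller single-task model, matching the deployment story in the abstract.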
Related papers
- Efficient Oriented Object Detection with Enhanced Small Object Recognition in Aerial Images [2.9138705529771123]
We present a novel enhancement to the YOLOv8 model, tailored for oriented object detection tasks.
Our model features a wavelet transform-based C2f module for capturing associative features and an Adaptive Scale Feature Pyramid (ASFP) module that leverages P2 layer details.
Our approach provides a more efficient architectural design than DecoupleNet, which has 23.3M parameters, all while maintaining detection accuracy.
arXiv Detail & Related papers (2024-12-17T05:45:48Z)
- UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer [20.121885706650758]
We propose a non-invasive plug-in called Uniform Frame Organizer (UFO).
UFO is compatible with any diffusion-based video generation model.
The training for UFO is simple, efficient, requires minimal resources, and supports stylized training.
arXiv Detail & Related papers (2024-12-12T15:56:26Z)
- Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC [77.8851460746251]
We propose a simple, efficient, and general approach to fine-tune diffusion models.
ONE-PIC enhances the inherited generative ability in the pretrained diffusion models without introducing additional modules.
Our method is simple and efficient which streamlines the adaptation process and achieves excellent performance with lower costs.
arXiv Detail & Related papers (2024-12-07T11:19:32Z)
- UFO: Unidentified Foreground Object Detection in 3D Point Cloud [7.286344230797102]
Existing 3D object detectors face hard challenges in both 3D localization and out-of-distribution detection.
We suggest a new UFO detection framework including three tasks: evaluation protocol, methodology, and benchmark.
The proposed framework consistently enhances performance by a large margin across all four baseline detectors.
arXiv Detail & Related papers (2024-01-08T12:16:06Z)
- FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks [129.49630356651454]
We propose a novel FAshion-focused Multi-task Efficient learning method for Vision-and-Language tasks (FAME-ViL).
Our FAME-ViL can save 61.5% of parameters over alternatives, while significantly outperforming the conventional independently trained single-task models.
arXiv Detail & Related papers (2023-03-04T19:07:48Z)
- Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks [86.66733026149892]
We propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks.
Specifically, images are encoded as general region proposals, while texts are encoded via a Transformer-based language model.
Uni-Perceiver v2 achieves competitive performance on a broad range of vision and vision-language tasks.
arXiv Detail & Related papers (2022-11-17T18:59:52Z)
- Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification [38.19907319079833]
We propose a multitask learning approach, which employs a new multiscale architecture without convolution, Pyramid Vision Transformer (PVT) as the backbone for UAV-based object ReID.
By uncertainty modeling of intraclass variations, our proposed model can be jointly optimized using both uncertainty-aware object ID and camera ID information.
arXiv Detail & Related papers (2022-09-19T00:27:07Z)
- UFO-ViT: High Performance Linear Vision Transformer without Softmax [0.0]
We propose UFO-ViT (Unit Force Operated Vision Transformer), a novel method to reduce the computation of self-attention by eliminating some non-linearity.
The model outperforms most transformer-based models on image classification and dense prediction tasks across most capacity regimes.
arXiv Detail & Related papers (2021-09-29T12:32:49Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.