A Transformer Framework for Data Fusion and Multi-Task Learning in Smart
Cities
- URL: http://arxiv.org/abs/2211.10506v1
- Date: Fri, 18 Nov 2022 20:43:09 GMT
- Title: A Transformer Framework for Data Fusion and Multi-Task Learning in Smart
Cities
- Authors: Alexander C. DeRieux, Walid Saad, Wangda Zuo, Rachmawan Budiarto,
Mochamad Donny Koerniawan, and Dwi Novitasari
- Abstract summary: This paper proposes a Transformer-based AI system for emerging smart cities.
It supports virtually any input data and output task types present in S&CCs.
It is demonstrated through learning diverse task sets representative of S&CC environments.
- Score: 99.56635097352628
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rapid global urbanization is a double-edged sword, heralding promises of
economic prosperity and public health while also posing unique environmental
and humanitarian challenges. Smart and connected communities (S&CCs) apply
data-centric solutions to these problems by integrating artificial intelligence
(AI) and the Internet of Things (IoT). This coupling of intelligent
technologies also poses interesting system design challenges regarding
heterogeneous data fusion and task diversity. Transformers are of particular
interest to address these problems, given their success across diverse fields
of natural language processing (NLP), computer vision, time-series regression,
and multi-modal data fusion. This begs the question whether Transformers can be
further diversified to leverage fusions of IoT data sources for heterogeneous
multi-task learning in S&CC trade spaces. In this paper, a Transformer-based AI
system for emerging smart cities is proposed. Designed using a pure encoder
backbone, and further customized through interchangeable input embedding and
output task heads, the system supports virtually any input data and output task
types present in S&CCs. This generalizability is demonstrated through learning
diverse task sets representative of S&CC environments, including multivariate
time-series regression, visual plant disease classification, and
image-time-series fusion tasks using a combination of Beijing PM2.5 and Plant
Village datasets. Simulation results show that the proposed Transformer-based
system can handle various input data types via custom sequence embedding
techniques, and is naturally suited to learning a diverse set of tasks. The
results also show that multi-task learners increase both memory and
computational efficiency while maintaining comparable performance to both
single-task variants and non-Transformer baselines.
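To make the described design concrete, below is a minimal PyTorch sketch of a pure-encoder backbone with interchangeable input embeddings and output task heads. It is an illustrative reconstruction under assumptions, not the authors' released code: the module names, dimensions, mean-pooled readout, and concatenate-token-sequences fusion strategy are all assumptions, and positional encodings are omitted for brevity.

```python
# Minimal sketch: shared pure-encoder Transformer with swappable input
# embeddings and output task heads. All names/sizes are illustrative
# assumptions, not the paper's exact implementation.
import torch
import torch.nn as nn


class TimeSeriesEmbedding(nn.Module):
    """Projects multivariate time-series steps into token embeddings."""

    def __init__(self, n_features: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features) -> (batch, seq_len, d_model)
        return self.proj(x)


class PatchEmbedding(nn.Module):
    """ViT-style patch embedding: turns an image into a token sequence."""

    def __init__(self, in_ch: int, patch: int, d_model: int):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, d_model, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, C, H, W) -> (batch, n_patches, d_model)
        return self.proj(x).flatten(2).transpose(1, 2)


class MultiTaskTransformer(nn.Module):
    """One shared encoder; embeddings and task heads are swapped per task."""

    def __init__(self, d_model: int = 128, n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.embeddings = nn.ModuleDict()  # task name -> input embedding(s)
        self.heads = nn.ModuleDict()       # task name -> output task head

    def forward(self, task: str, *inputs: torch.Tensor) -> torch.Tensor:
        embed = self.embeddings[task]
        if isinstance(embed, nn.ModuleList):
            # Fusion task: embed each modality, concatenate token sequences.
            tokens = torch.cat([e(x) for e, x in zip(embed, inputs)], dim=1)
        else:
            tokens = embed(inputs[0])
        encoded = self.encoder(tokens)                 # shared backbone
        return self.heads[task](encoded.mean(dim=1))   # pooled -> task head


# Hypothetical wiring for tasks like those in the abstract (sizes made up):
model = MultiTaskTransformer()
model.embeddings["pm25"] = TimeSeriesEmbedding(n_features=7, d_model=128)
model.heads["pm25"] = nn.Linear(128, 1)               # time-series regression
model.embeddings["plant"] = PatchEmbedding(3, patch=16, d_model=128)
model.heads["plant"] = nn.Linear(128, 38)             # disease classification
model.embeddings["fusion"] = nn.ModuleList(
    [PatchEmbedding(3, 16, 128), TimeSeriesEmbedding(7, 128)]
)
model.heads["fusion"] = nn.Linear(128, 1)             # image-time-series fusion

y = model("pm25", torch.randn(2, 24, 7))              # -> shape (2, 1)
```

Because every task shares one encoder and differs only in light embedding and head modules, a multi-task learner of this shape stores a single backbone rather than one model per task, which is consistent with the memory and computational savings the abstract reports.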
Related papers
- LaVin-DiT: Large Vision Diffusion Transformer [99.98106406059333]
LaVin-DiT is a scalable and unified foundation model designed to tackle over 20 computer vision tasks in a generative framework.
We introduce key innovations to optimize generative performance for vision tasks.
The model is scaled from 0.1B to 3.4B parameters, demonstrating substantial scalability and state-of-the-art performance across diverse vision tasks.
arXiv Detail & Related papers (2024-11-18T12:05:27Z)
- Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z)
- UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation [59.77275587857252]
A holistic human dataset inevitably has insufficient and low-resolution information on local parts.
We propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model.
arXiv Detail & Related papers (2023-09-25T17:58:46Z)
- Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL).
We present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction.
arXiv Detail & Related papers (2023-08-10T17:37:49Z)
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel in handling long dependencies between input sequence elements and enable parallel processing (a minimal sketch of the underlying self-attention computation follows this list).
Our survey encompasses the identification of the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
- InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding [11.608682595506354]
Multi-task scene understanding aims to design models that can simultaneously predict several scene understanding tasks with one versatile model.
Previous studies typically process multi-task features in a more local way, and thus cannot effectively learn spatially global and cross-task interactions.
We propose an Inverted Pyramid multi-task Transformer, capable of modeling cross-task interaction among spatial features of different tasks in a global context.
arXiv Detail & Related papers (2023-06-08T00:28:22Z)
- RHFedMTL: Resource-Aware Hierarchical Federated Multi-Task Learning [11.329273673732217]
Federated learning is an effective way to enable AI over massive distributed nodes with security.
It is challenging to ensure privacy while maintaining coupled multi-task learning across multiple base stations (BSs) and terminals.
In this paper, inspired by the natural cloud-BS-terminal hierarchy of cellular networks, we provide a viable resource-aware hierarchical federated MTL (RHFedMTL) solution.
arXiv Detail & Related papers (2023-06-01T13:49:55Z)
- CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning [33.88636835443266]
We propose a framework to better scale up robot learning under the lens of multi-task, multi-scene robot manipulation in kitchen environments.
Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training.
In the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation as part of the augmentation stage.
arXiv Detail & Related papers (2022-12-12T05:30:08Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
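As background for the self-attention mechanism mentioned in the survey entry above, here is the textbook scaled dot-product formulation, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. It is a generic sketch with illustrative tensor names and sizes, not code drawn from any of the listed papers.

```python
# Textbook scaled dot-product self-attention; names and sizes are
# illustrative, not taken from any paper above.
import torch
import torch.nn.functional as F


def self_attention(x: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """x: (batch, seq_len, d_model). Every token attends to every other
    token in one matrix multiply, which is why long-range dependencies
    and parallel processing come naturally to Transformers."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # queries, keys, values
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5   # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)           # contextual relationships
    return weights @ v                            # weighted mix of positions


d_model = 64
x = torch.randn(2, 10, d_model)                   # batch of 10-token sequences
w = [torch.randn(d_model, d_model) / d_model**0.5 for _ in range(3)]
out = self_attention(x, *w)                       # -> shape (2, 10, 64)
```

The single (seq_len x seq_len) score matrix is what lets every position attend to every other in one parallel step, which is the source of both properties the survey blurb highlights.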