A Transformer Framework for Data Fusion and Multi-Task Learning in Smart
Cities
- URL: http://arxiv.org/abs/2211.10506v1
- Date: Fri, 18 Nov 2022 20:43:09 GMT
- Title: A Transformer Framework for Data Fusion and Multi-Task Learning in Smart
Cities
- Authors: Alexander C. DeRieux, Walid Saad, Wangda Zuo, Rachmawan Budiarto,
Mochamad Donny Koerniawan, and Dwi Novitasari
- Abstract summary: This paper proposes a Transformer-based AI system for emerging smart cities.
It supports virtually any input data and output task types present in S&CCs.
This generalizability is demonstrated through learning diverse task sets representative of S&CC environments.
- Score: 99.56635097352628
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rapid global urbanization is a double-edged sword, heralding promises of
economic prosperity and public health while also posing unique environmental
and humanitarian challenges. Smart and connected communities (S&CCs) apply
data-centric solutions to these problems by integrating artificial intelligence
(AI) and the Internet of Things (IoT). This coupling of intelligent
technologies also poses interesting system design challenges regarding
heterogeneous data fusion and task diversity. Transformers are of particular
interest to address these problems, given their success across diverse fields
of natural language processing (NLP), computer vision, time-series regression,
and multi-modal data fusion. This raises the question of whether Transformers can be
further diversified to leverage fusions of IoT data sources for heterogeneous
multi-task learning in S&CC trade spaces. In this paper, a Transformer-based AI
system for emerging smart cities is proposed. Designed using a pure encoder
backbone, and further customized through interchangeable input embedding and
output task heads, the system supports virtually any input data and output task
types present in S&CCs. This generalizability is demonstrated through learning
diverse task sets representative of S&CC environments, including multivariate
time-series regression, visual plant disease classification, and
image-time-series fusion tasks using a combination of Beijing PM2.5 and Plant
Village datasets. Simulation results show that the proposed Transformer-based
system can handle various input data types via custom sequence embedding
techniques, and is naturally suited to learning a diverse set of tasks. The
results also show that multi-task learners increase both memory and
computational efficiency while maintaining comparable performance to both
single-task variants and non-Transformer baselines.
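To make the proposed design concrete, the following is a minimal PyTorch sketch of the architecture the abstract describes: a shared pure-encoder backbone with interchangeable input-embedding and output-task heads. It is an illustrative reconstruction, not the authors' released code; the module names, dimensions, mean-pooling, and fusion by token-sequence concatenation are all assumptions, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

class TimeSeriesEmbedding(nn.Module):
    """Projects each multivariate time-series step to a token embedding."""
    def __init__(self, n_features: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)

    def forward(self, x):                          # x: (batch, steps, n_features)
        return self.proj(x)                        # -> (batch, steps, d_model)

class PatchEmbedding(nn.Module):
    """Splits an image into patches and projects each to a token (ViT-style)."""
    def __init__(self, in_ch: int, patch: int, d_model: int):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, d_model, kernel_size=patch, stride=patch)

    def forward(self, x):                          # x: (batch, ch, H, W)
        t = self.proj(x)                           # -> (batch, d_model, H/p, W/p)
        return t.flatten(2).transpose(1, 2)        # -> (batch, n_patches, d_model)

class MultiTaskTransformer(nn.Module):
    """One shared encoder; embedding and task heads are swappable by name."""
    def __init__(self, embedders, task_heads, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        self.embedders = nn.ModuleDict(embedders)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.task_heads = nn.ModuleDict(task_heads)

    def forward(self, inputs: dict, task: str):
        # Embed each modality, then fuse by concatenating token sequences.
        tokens = torch.cat([self.embedders[n](x) for n, x in inputs.items()], dim=1)
        encoded = self.encoder(tokens)             # (batch, total_tokens, d_model)
        pooled = encoded.mean(dim=1)               # simple pooling; a CLS token also works
        return self.task_heads[task](pooled)

# Toy usage mirroring the paper's task mix: time-series regression (PM2.5-like)
# and image classification (Plant-Village-like) served by one shared encoder.
model = MultiTaskTransformer(
    embedders={"series": TimeSeriesEmbedding(8, 128),
               "image": PatchEmbedding(3, 16, 128)},
    task_heads={"pm25": nn.Linear(128, 1),         # regression head
                "disease": nn.Linear(128, 38)},    # classification head
)
series = torch.randn(2, 24, 8)                     # (batch, hours, features)
image = torch.randn(2, 3, 64, 64)
print(model({"series": series}, task="pm25").shape)                      # (2, 1)
print(model({"series": series, "image": image}, task="disease").shape)  # (2, 38)
```

Because both tasks share the encoder's parameters, the sketch also shows where the reported memory and compute savings of the multi-task variant would come from: only the lightweight heads are task-specific.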
Related papers
- Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications across different data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z)
- UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation [59.77275587857252]
A holistic human dataset inevitably has insufficient and low-resolution information on local parts.
We propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model.
arXiv Detail & Related papers (2023-09-25T17:58:46Z)
- Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL).
We present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction.
arXiv Detail & Related papers (2023-08-10T17:37:49Z)
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
The Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel at handling long-range dependencies between input sequence elements and enable parallel processing.
Our survey identifies the top five application domains for Transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
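As a reference for the mechanism this survey centers on, below is a minimal single-head sketch of scaled dot-product self-attention in PyTorch (no masking, dropout, or multi-head splitting; the function and variable names are illustrative):

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """x: (batch, seq_len, d_model); w_*: (d_model, d_model) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every token attends to every other token in parallel, which is what
    # lets Transformers capture long-range dependencies in a single step.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)        # (batch, seq_len, seq_len)
    return weights @ v                             # (batch, seq_len, d_model)

d = 16
x = torch.randn(2, 10, d)
out = self_attention(x, torch.randn(d, d), torch.randn(d, d), torch.randn(d, d))
print(out.shape)  # torch.Size([2, 10, 16])
```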
- InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding [11.608682595506354]
Multi-task scene understanding aims to design models that can simultaneously predict several scene understanding tasks with one versatile model.
Previous studies typically process multi-task features in a local manner, and thus cannot effectively learn spatially global and cross-task interactions.
We propose an Inverted Pyramid multi-task Transformer, capable of modeling cross-task interaction among spatial features of different tasks in a global context.
arXiv Detail & Related papers (2023-06-08T00:28:22Z)
- RHFedMTL: Resource-Aware Hierarchical Federated Multi-Task Learning [11.329273673732217]
Federated learning is an effective way to enable AI over massive numbers of distributed nodes with security.
It is challenging to ensure privacy while maintaining coupled multi-task learning across multiple base stations (BSs) and terminals.
In this paper, inspired by the natural cloud-BS-terminal hierarchy of cellular networks, we provide a viable resource-aware hierarchical federated MTL (RHFedMTL) solution.
arXiv Detail & Related papers (2023-06-01T13:49:55Z)
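The cloud-BS-terminal hierarchy implies a two-level aggregation loop. Below is a hedged sketch assuming plain FedAvg at both levels; the function names and uniform averaging are illustrative assumptions, and the paper's actual resource-aware scheme is more involved:

```python
import torch

def fedavg(states):
    """Average a list of model state dicts parameter-by-parameter."""
    return {k: torch.stack([s[k] for s in states]).mean(dim=0)
            for k in states[0]}

def hierarchical_round(terminals_per_bs):
    # Level 1: each base station averages its own terminals' updates.
    bs_models = [fedavg(terminals) for terminals in terminals_per_bs]
    # Level 2: the cloud averages the base-station models.
    return fedavg(bs_models)

# Toy usage: 2 base stations, each with 3 terminals holding a one-layer model.
make = lambda: {"w": torch.randn(4, 4), "b": torch.randn(4)}
global_model = hierarchical_round([[make() for _ in range(3)] for _ in range(2)])
print(global_model["w"].shape)  # torch.Size([4, 4])
```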
- CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning [33.88636835443266]
We propose a framework to better scale up robot learning through the lens of multi-task, multi-scene robot manipulation in kitchen environments.
Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training.
In the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation as part of the augmentation stage.
arXiv Detail & Related papers (2022-12-12T05:30:08Z)
- Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
The proposed MKGformer achieves SOTA performance on four datasets covering multimodal link prediction, multimodal relation extraction (RE), and multimodal named entity recognition (NER).
arXiv Detail & Related papers (2022-05-04T23:40:04Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision Transformers built on self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
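The generic hybrid pattern this entry describes, CNN layers for local features feeding a Transformer encoder for long-range mixing, can be sketched as follows. This is a hedged toy example, not the paper's network; the layer sizes and the PixelShuffle upsampling head are assumptions:

```python
import torch
import torch.nn as nn

class HybridSRBlock(nn.Module):
    def __init__(self, ch=32, d_model=32, n_heads=4):
        super().__init__()
        self.cnn = nn.Sequential(                  # local feature extraction
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, d_model, 3, padding=1))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.upsample = nn.Sequential(             # 2x super-resolution head
            nn.Conv2d(d_model, 3 * 4, 3, padding=1), nn.PixelShuffle(2))

    def forward(self, x):                          # x: (batch, 3, H, W)
        feats = self.cnn(x)                        # local CNN features
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (batch, H*W, d_model)
        tokens = self.transformer(tokens)          # global long-range mixing
        feats = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.upsample(feats)                # (batch, 3, 2H, 2W)

lr = torch.randn(1, 3, 24, 24)
print(HybridSRBlock()(lr).shape)  # torch.Size([1, 3, 48, 48])
```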