Related papers: Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery

Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery

URL: http://arxiv.org/abs/2310.19109v2
Date: Tue, 7 Nov 2023 14:59:17 GMT
Title: Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery
Authors: Huseyin Fuat Alsan, Taner Arsan
Abstract summary: This paper explores post-disaster analytics using multimodal deep learning models trained with curriculum learning method. Curriculum learning emulates the progressive learning sequence in human education by training deep learning models on increasingly complex data.
Score: 0.5439020425819
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper explores post-disaster analytics using multimodal deep learning models trained with curriculum learning method. Studying post-disaster analytics is important as it plays a crucial role in mitigating the impact of disasters by providing timely and accurate insights into the extent of damage and the allocation of resources. We propose a curriculum learning strategy to enhance the performance of multimodal deep learning models. Curriculum learning emulates the progressive learning sequence in human education by training deep learning models on increasingly complex data. Our primary objective is to develop a curriculum-trained multimodal deep learning model, with a particular focus on visual question answering (VQA) capable of jointly processing image and text data, in conjunction with semantic segmentation for disaster analytics using the FloodNet\footnote{https://github.com/BinaLab/FloodNet-Challenge-EARTHVISION2021} dataset. To achieve this, U-Net model is used for semantic segmentation and image encoding. A custom built text classifier is used for visual question answering. Existing curriculum learning methods rely on manually defined difficulty functions. We introduce a novel curriculum learning approach termed Dynamic Task and Weight Prioritization (DATWEP), which leverages a gradient-based method to automatically decide task difficulty during curriculum learning training, thereby eliminating the need for explicit difficulty computation. The integration of DATWEP into our multimodal model shows improvement on VQA performance. Source code is available at https://github.com/fualsan/DATWEP.

Related papers

Is Visual in-Context Learning for Compositional Medical Tasks within Reach? [68.56630652862293]
In this paper, we explore the potential of visual in-context learning to enable a single model to handle multiple tasks.<n>We introduce a novel method for training in-context learners using a synthetic compositional task generation engine.
arXiv Detail & Related papers (2025-07-01T15:32:23Z)
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities.<n>We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details.<n>We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high-noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z)
Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
Image Classification with Deep Reinforcement Active Learning [28.924413229981827]
In many real-world scenarios, labeled data are scarce, and their hand-labeling is time, effort and cost demanding. Active learning is an alternative paradigm that mitigates the effort in hand-labeling data, and annotated by an expert. In this work, we devise an adaptive active learning method based on Markov Decision Process (MDP)
arXiv Detail & Related papers (2024-12-27T18:37:51Z)
AI Learning Algorithms: Deep Learning, Hybrid Models, and Large-Scale Model Integration [0.0]
We will review the main concepts of artificial intelligence (AI), machine learning (ML), deep learning (DL), and hybrid models.<n>This article provides brief overview of learning algorithms, exploring their current state, applications and future direction.
arXiv Detail & Related papers (2024-10-11T18:39:25Z)
Unlearnable Algorithms for In-context Learning [36.895152458323764]
In this paper, we focus on efficient unlearning methods for the task adaptation phase of a pretrained large language model. We observe that an LLM's ability to do in-context learning for task adaptation allows for efficient exact unlearning of task adaptation training data. We propose a new holistic measure of unlearning cost which accounts for varying inference costs.
arXiv Detail & Related papers (2024-02-01T16:43:04Z)
ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP) ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective. We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation [8.939008609565368]
In this paper, we demonstrate the efficacy of learning image keypoints via the Dense Correspondence pretext task for downstream policy learning. We evaluate our approach on diverse robot manipulation tasks, compare it to other visual representation learning approaches, and demonstrate its flexibility and effectiveness for sample-efficient policy learning.
arXiv Detail & Related papers (2022-05-17T13:15:07Z)
Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods. We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments. The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
Few-Cost Salient Object Detection with Adversarial-Paced Learning [95.0220555274653]
This paper proposes to learn the effective salient object detection model based on the manual annotation on a few training images only. We name this task as the few-cost salient object detection and propose an adversarial-paced learning (APL)-based framework to facilitate the few-cost learning scenario.
arXiv Detail & Related papers (2021-04-05T14:15:49Z)
Statistical Measures For Defining Curriculum Scoring Function [5.328970912536596]
We show improvements in performance with convolutional and fully-connected neural networks on real image datasets. Motivated by our insights from implicit curriculum ordering, we introduce a simple curriculum learning strategy. We also propose and study the performance of a dynamic curriculum learning algorithm.
arXiv Detail & Related papers (2021-02-27T07:25:49Z)
Curriculum Learning: A Survey [65.31516318260759]
Curriculum learning strategies have been successfully employed in all areas of machine learning. We construct a taxonomy of curriculum learning approaches by hand, considering various classification criteria. We build a hierarchical tree of curriculum learning methods using an agglomerative clustering algorithm.
arXiv Detail & Related papers (2021-01-25T20:08:32Z)
Video Understanding as Machine Translation [53.59298393079866]
We tackle a wide variety of downstream video understanding tasks by means of a single unified framework. We report performance gains over the state-of-the-art on several downstream tasks including video classification (EPIC-Kitchens), question answering (TVQA), captioning (TVC, YouCook2, and MSR-VTT)
arXiv Detail & Related papers (2020-06-12T14:07:04Z)
Reducing Overlearning through Disentangled Representations by Suppressing Unknown Tasks [8.517620051440005]
Existing deep learning approaches for learning visual features tend to overlearn and extract more information than what is required for the task at hand. From a privacy preservation perspective, the input visual information is not protected from the model. We propose a model-agnostic solution for reducing model overlearning by suppressing all the unknown tasks.
arXiv Detail & Related papers (2020-05-20T17:31:44Z)
Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimize model's ability to learn text representations for effective learning of downstream tasks. We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.