Ensemble Modeling for Multimodal Visual Action Recognition
- URL: http://arxiv.org/abs/2308.05430v2
- Date: Mon, 25 Sep 2023 08:34:07 GMT
- Title: Ensemble Modeling for Multimodal Visual Action Recognition
- Authors: Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah
- Abstract summary: We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
- Score: 50.38638300332429
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we propose an ensemble modeling approach for multimodal action
recognition. We independently train individual modality models using a variant
of focal loss tailored to handle the long-tailed distribution of the MECCANO
[21] dataset. Based on the underlying principle of focal loss, which captures
the relationship between tail (scarce) classes and their prediction
difficulties, we propose an exponentially decaying variant of focal loss for
our current task. It initially emphasizes learning from the hard misclassified
examples and gradually adapts to the entire range of examples in the dataset.
This annealing process encourages the model to strike a balance between
focusing on the sparse set of hard samples and still leveraging the
information provided by the easier ones. Additionally, we opt for the late
fusion strategy to combine the resultant probability distributions from RGB and
Depth modalities for final action prediction. Experimental evaluations on the
MECCANO dataset demonstrate the effectiveness of our approach.
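The abstract describes two components: a focal loss whose focusing strength decays exponentially over training, and late fusion of the RGB and Depth probability distributions. Below is a minimal PyTorch sketch of both ideas; the decay schedule, the initial gamma, the decay rate, and the equal-weight averaging are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: an exponentially decaying focal loss and
# late fusion of per-modality class probabilities. Hyperparameters
# (gamma0, decay, 0.5/0.5 fusion weights) are assumptions.
import math
import torch
import torch.nn.functional as F


def decaying_focal_loss(logits, targets, epoch, gamma0=2.0, decay=0.1):
    """Cross-entropy weighted by (1 - p_t)^gamma, where gamma is annealed
    toward 0 so the loss gradually approaches plain cross-entropy."""
    gamma = gamma0 * math.exp(-decay * epoch)  # exponentially decaying focus
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets, reduction="none")          # per-sample CE
    p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()  # prob of true class
    return ((1.0 - p_t) ** gamma * ce).mean()


def late_fusion(rgb_logits, depth_logits):
    """Average the RGB and Depth class distributions for the final prediction."""
    probs = 0.5 * (F.softmax(rgb_logits, dim=-1) + F.softmax(depth_logits, dim=-1))
    return probs.argmax(dim=-1)
```

Early in training the large gamma down-weights easy, well-classified samples so the tail classes dominate the gradient; as gamma decays toward zero, the loss reverts to ordinary cross-entropy over all examples.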
Related papers
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose MITA, a Meet-In-The-Middle approach that introduces energy-based optimization to encourage mutual adaptation of the model and the data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z) - On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning.
We conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning.
arXiv Detail & Related papers (2024-10-11T18:02:46Z) - Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition [4.192370959537781]
We present a semi-supervised fine-tuning approach designed to improve the performance of pre-trained foundation models on downstream tasks with limited labeled data.
We evaluate our approach on multiple datasets, including MNIST, its augmented variations, CIFAR-10, SVHN, and GalaxyMNIST.
arXiv Detail & Related papers (2024-10-02T22:36:12Z) - Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble [11.542472900306745]
Multi-Comprehension (MC) Ensemble is proposed as a strategy to augment the Out-of-Distribution (OOD) feature representation field.
Our experimental results demonstrate the superior performance of the MC Ensemble strategy in OOD detection.
This underscores the effectiveness of our proposed approach in enhancing the model's capability to detect instances outside its training distribution.
arXiv Detail & Related papers (2024-03-24T18:43:04Z) - Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Unlike the instantaneous input-output relationships of earlier settings, diffusion models operate over a sequence of timesteps.
We present Diffusion-TracIn that incorporates this temporal dynamics and observe that samples' loss gradient norms are highly dependent on timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z) - Aggregation Weighting of Federated Learning via Generalization Bound Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions.
We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
arXiv Detail & Related papers (2023-11-10T08:50:28Z) - Variational Density Propagation Continual Learning [0.0]
Deep Neural Networks (DNNs) deployed to the real world are regularly subject to out-of-distribution (OoD) data.
This paper proposes a framework for adapting to data distribution drift modeled by benchmark Continual Learning datasets.
arXiv Detail & Related papers (2023-08-22T21:51:39Z) - Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)