Related papers: Automatic Fused Multimodal Deep Learning for Plant Identification

Related papers

Resource-Limited Joint Multimodal Sentiment Reasoning and Classification via Chain-of-Thought Enhancement and Distillation [22.722731231389393]
Current approaches primarily leverage the knowledge and reasoning capabilities of parameter-heavy (Multimodal) Large Language Models (LLMs)<n>We propose a Multimodal Chain-of-Student Reasoning Distillation model, MulCoT-RD, to address deployment constraints in resource-limited environments.<n>Experiments on four datasets demonstrate that MulCoT-RD with only 3B parameters achieves strong performance on JMSRC, while exhibiting robust generalization and enhanced interpretability.
arXiv Detail & Related papers (2025-08-07T10:23:14Z)
G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation [0.7673339435080445]
We introduce Gradient-Guided Distillation (G$2$D), a knowledge distillation framework that optimize the multimodal model with a custom-built loss function.<n>We show that G$2$D amplifies the significance of weak modalities while training and outperforms state-of-the-art methods in classification and regression tasks.
arXiv Detail & Related papers (2025-06-26T17:37:36Z)
MSFNet-CPD: Multi-Scale Cross-Modal Fusion Network for Crop Pest Detection [3.5148549831413036]
Accurate identification of agricultural pests is essential for crop protection.<n>While deep learning has advanced pest detection, most existing approaches rely solely on low-level visual features.
arXiv Detail & Related papers (2025-05-05T08:10:22Z)
Continual Multimodal Contrastive Learning [70.60542106731813]
Multimodal contrastive learning (MCL) advances in aligning different modalities and generating multimodal representations in a joint space. However, a critical yet often overlooked challenge remains: multimodal data is rarely collected in a single process, and training from scratch is computationally expensive. In this paper, we formulate CMCL through two specialized principles of stability and plasticity. We theoretically derive a novel optimization-based method, which projects updated gradients from dual sides onto subspaces where any gradient is prevented from interfering with the previously learned knowledge.
arXiv Detail & Related papers (2025-03-19T07:57:08Z)
What are You Looking at? Modality Contribution in Multimodal Medical Deep Learning Methods [0.13194391758295113]
We present a method that measures the importance of each modality in a dataset for the model to fulfill its task. We found that some networks have modality preferences that tend to unimodal collapses, while some datasets are imbalanced from the ground up. With our method we make a crucial contribution to the field of interpretability in deep learning based multimodal research.
arXiv Detail & Related papers (2025-02-28T12:39:39Z)
Multiple Linked Tensor Factorization [0.0]
In biomedical research, it is common to generate high content data that are both multi-source and multi-way. Despite growing interest in multi-source and multi-way factorization, methods that can handle data that are both multi-source and multi-way are limited. We propose a Multiple Linkeds Factorization (MULTIFAC) method to simultaneously reduce the dimension of multiple multi-way arrays and approximate underlying signal.
arXiv Detail & Related papers (2025-02-27T17:12:57Z)
MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach. MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student. We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
arXiv Detail & Related papers (2025-02-03T08:50:00Z)
Enhancing Plant Disease Detection: A Novel CNN-Based Approach with Tensor Subspace Learning and HOWSVD-MD [3.285994579445155]
This paper introduces a cutting-edge technique for the detection and classification of tomato leaf diseases. We propose a sophisticated approach within the domain of subspace learning, known as Higher-Order Whitened Singular Value Decomposition. The efficacy of this innovative method was rigorously tested through comprehensive experiments on two distinct datasets.
arXiv Detail & Related papers (2024-05-30T13:46:56Z)
Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models [48.87160158792048]
We introduce a cascaded amortized latent diffusion model (CA-LDM) that can synthesis high-resolution OCT volumes in a memory-efficient way. Experiments on a public high-resolution OCT dataset show that our synthetic data have realistic high-resolution and global features, surpassing the capabilities of existing methods.
arXiv Detail & Related papers (2024-05-26T10:58:22Z)
U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantics. We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features. Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
Application of Multimodal Fusion Deep Learning Model in Disease Recognition [14.655086303102575]
This paper introduces an innovative multi-modal fusion deep learning approach to overcome the drawbacks of traditional single-modal recognition techniques. During the feature extraction stage, cutting-edge deep learning models are applied to distill advanced features from image-based, temporal, and structured data sources. The findings demonstrate significant advantages of the multimodal fusion model across multiple evaluation metrics.
arXiv Detail & Related papers (2024-05-22T23:09:49Z)
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications. Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders. We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
Generating Diverse Agricultural Data for Vision-Based Farming Applications [74.79409721178489]
This model is capable of simulating distinct growth stages of plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions. Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture.
arXiv Detail & Related papers (2024-03-27T08:42:47Z)
BonnBeetClouds3D: A Dataset Towards Point Cloud-based Organ-level Phenotyping of Sugar Beet Plants under Field Conditions [30.27773980916216]
Agricultural production is facing severe challenges in the next decades induced by climate change and the need for sustainability. Advancements in field management through non-chemical weeding by robots in combination with monitoring of crops by autonomous unmanned aerial vehicles (UAVs) are helpful to address these challenges. The analysis of plant traits, called phenotyping, is an essential activity in plant breeding, it however involves a great amount of manual labor.
arXiv Detail & Related papers (2023-12-22T14:06:44Z)
Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm. Despite its widespread use, theoretical justifications in this field are still notably lacking. This paper provides theoretical understandings to answer this question under a most popular multimodal fusion framework from the generalization perspective. A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (textscMTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis. For generative planning, we find textscMTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
Distilled Mid-Fusion Transformer Networks for Multi-Modal Human Activity Recognition [34.424960016807795]
Multi-modal Human Activity Recognition could utilize the complementary information to build models that can generalize well. Deep learning methods have shown promising results, their potential in extracting salient multi-modal spatial-temporal features has not been fully explored. A knowledge distillation-based Multi-modal Mid-Fusion approach, DMFT, is proposed to conduct informative feature extraction and fusion to resolve the Multi-modal Human Activity Recognition task efficiently.
arXiv Detail & Related papers (2023-05-05T19:26:06Z)
Generalizing Multimodal Variational Methods to Sets [35.69942798534849]
This paper presents a novel variational method on sets called the Set Multimodal VAE (SMVAE) for learning a multimodal latent space. By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensate for the drawbacks caused by factorization.
arXiv Detail & Related papers (2022-12-19T23:50:19Z)
Sparse Fusion for Multimodal Transformers [7.98117428941095]
We present Sparse Fusion Transformers (SFT), a novel multimodal fusion method for transformers. Key to our idea is a sparse-pooling block that reduces unimodal token sets prior to cross-modality modeling. State-of-the-art performance is obtained on multiple benchmarks under similar experiment conditions, while reporting up to six-fold reduction in computational cost and memory requirements.
arXiv Detail & Related papers (2021-11-23T16:43:49Z)
A Deep Learning Generative Model Approach for Image Synthesis of Plant Leaves [62.997667081978825]
We generate via advanced Deep Learning (DL) techniques artificial leaf images in an automatized way. We aim to dispose of a source of training samples for AI applications for modern crop management.
arXiv Detail & Related papers (2021-11-05T10:53:35Z)
Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations. Model takes two bimodal pairs as input due to known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors. We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method. Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
SWP-LeafNET: A novel multistage approach for plant leaf identification based on deep CNN [1.9981375888949475]
Leaf classification is a computer-vision task performed for the automated identification of plant species. Researchers have recently become more inclined toward deep learning-based methods. In this paper, a botanist's behavior is modeled in leaf identification by proposing a highly-efficient method of maximum behavioral resemblance.
arXiv Detail & Related papers (2020-09-10T20:28:57Z)
Two-View Fine-grained Classification of Plant Species [66.75915278733197]
We propose a novel method based on a two-view leaf image representation and a hierarchical classification strategy for fine-grained recognition of plant species. A deep metric based on Siamese convolutional neural networks is used to reduce the dependence on a large number of training samples and make the method scalable to new plant species.
arXiv Detail & Related papers (2020-05-18T21:57:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.