M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal
Aspect-based Sentiment Analysis
- URL: http://arxiv.org/abs/2310.14605v1
- Date: Mon, 23 Oct 2023 06:22:39 GMT
- Title: M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal
Aspect-based Sentiment Analysis
- Authors: Fei Zhao, Chunhui Li, Zhen Wu, Yawen Ouyang, Jianbing Zhang, Xinyu Dai
- Abstract summary: Multimodal Aspect-based Sentiment Analysis (MABSA) is a fine-grained Sentiment Analysis task.
We propose a Multi-grained Multi-curriculum Denoising Framework (M2DF), which achieves denoising by adjusting the order of the training data.
Our framework consistently outperforms state-of-the-art work on three sub-tasks of MABSA.
- Score: 32.9772577419091
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Aspect-based Sentiment Analysis (MABSA) is a fine-grained
Sentiment Analysis task which has attracted growing research interest recently.
Existing work mainly utilizes image information to improve the performance of
the MABSA task. However, most studies overestimate the importance of images,
since the datasets contain many noisy images unrelated to the text, which have
a negative impact on model learning. Although some work attempts to filter out
low-quality noisy images by setting thresholds, thresholding inevitably
discards a lot of useful image information. Therefore, in this work, we focus
on whether the negative impact of noisy images can be reduced without modifying
the data. To achieve this goal, we borrow the idea of Curriculum Learning and
propose a Multi-grained Multi-curriculum Denoising Framework (M2DF), which
achieves denoising by adjusting the order of the training data. Extensive
experimental results show that our framework consistently outperforms
state-of-the-art work on three sub-tasks of MABSA.
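The core idea of denoising by reordering can be illustrated with a minimal sketch. The ordering below assumes a per-sample image-text relevance score is available (e.g. from a pretrained similarity model); both the scoring function and the staging scheme are hypothetical simplifications, not the authors' actual M2DF implementation.

```python
def curriculum_order(samples, relevance, pace=0.25):
    """Sketch of curriculum-style denoising: order training data from
    clean (high image-text relevance) to noisy (low relevance).

    samples:   list of training examples
    relevance: per-sample image-text relevance scores (higher = less noisy);
               how these are computed is an assumption, not part of this sketch
    pace:      fraction of the data added to the curriculum at each stage
    """
    # Sort indices so high-relevance (low-noise) samples come first.
    order = sorted(range(len(samples)), key=lambda i: -relevance[i])
    # Build stages: each stage exposes the model to a larger, noisier subset.
    stage_size = max(1, int(len(samples) * pace))
    stages = [order[:k + stage_size] for k in range(0, len(order), stage_size)]
    return [[samples[i] for i in idx] for idx in stages]

# Toy usage: four samples, two curriculum stages (pace=0.5).
data = ["s1", "s2", "s3", "s4"]
scores = [0.9, 0.2, 0.7, 0.4]
stages = curriculum_order(data, scores, pace=0.5)
```

Training then proceeds stage by stage, so the model first fits the cleanest image-text pairs before ever seeing the noisiest ones, rather than filtering any sample out entirely.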
Related papers
- UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation [64.01742988773745]
Privacy concerns are growing over the training of large-scale image segmentation models on unauthorized private data.
We exploit the concept of unlearnable examples to make images unusable to model training by generating and adding unlearnable noise into the original images.
We empirically verify the effectiveness of UnSeg across 6 mainstream image segmentation tasks, 10 widely used datasets, and 7 different network architectures.
arXiv Detail & Related papers (2024-10-13T16:34:46Z)
- Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning [53.766434746801366]
Multimodal contrastive learning (MCL) has shown remarkable advances in zero-shot classification by learning from millions of image-caption pairs crawled from the Internet.
Hackers may exploit image-text data for model training without authorization, potentially including personal and privacy-sensitive information.
Recent works propose generating unlearnable examples by adding imperceptible perturbations to training images to build shortcuts for protection.
We propose Multi-step Error Minimization (MEM), a novel optimization process for generating multimodal unlearnable examples.
arXiv Detail & Related papers (2024-07-23T09:00:52Z)
- denoiSplit: a method for joint microscopy image splitting and unsupervised denoising [7.362569187959687]
denoiSplit is a method to tackle the challenge of joint semantic image splitting and unsupervised denoising.
Image splitting involves dissecting an image into its distinguishable semantic structures.
We show that the current state-of-the-art method for this task struggles in the presence of image noise.
arXiv Detail & Related papers (2024-03-18T15:03:56Z)
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [103.72844619581811]
We build performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices.
We demonstrate that large-scale multimodal pre-training benefits from a careful mix of image-caption, interleaved image-text, and text-only data.
arXiv Detail & Related papers (2024-03-14T17:51:32Z)
- ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation [36.43428388918294]
Web-scale training on paired text-image data is becoming increasingly central to multimodal learning.
Standard data filtering approaches fail to remove mismatched text-image pairs.
We propose a new metric, image caption concreteness, that evaluates caption text without an image reference to measure its concreteness.
arXiv Detail & Related papers (2024-03-02T20:36:10Z)
- Impact of Visual Context on Noisy Multimodal NMT: An Empirical Study for English to Indian Languages [29.416563233407892]
The study investigates the effectiveness of utilizing multimodal information in Neural Machine Translation (NMT).
Surprisingly, the study finds that images might be redundant in this context.
Experiments translate from English to Hindi, Bengali, and Malayalam, outperforming state-of-the-art benchmarks significantly.
arXiv Detail & Related papers (2023-08-30T14:52:14Z)
- Generalizable Denoising of Microscopy Images using Generative Adversarial Networks and Contrastive Learning [0.0]
We propose a novel framework for few-shot microscopy image denoising.
Our approach combines a generative adversarial network (GAN) trained via contrastive learning (CL) with two structure preserving loss terms.
We demonstrate the effectiveness of our method on three well-known microscopy imaging datasets.
arXiv Detail & Related papers (2023-03-27T13:55:07Z)
- Masked Image Training for Generalizable Deep Image Denoising [53.03126421917465]
We present a novel approach to enhance the generalization performance of denoising networks.
Our method involves masking random pixels of the input image and reconstructing the missing information during training.
Our approach exhibits better generalization ability than other deep learning models and is directly applicable to real-world scenarios.
arXiv Detail & Related papers (2023-03-23T09:33:44Z)
- Deep Semantic Statistics Matching (D2SM) Denoising Network [70.01091467628068]
We introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network.
It exploits semantic features of pretrained classification networks, then implicitly matches the probabilistic distribution of clear images in the semantic feature space.
By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks.
arXiv Detail & Related papers (2022-07-19T14:35:42Z)
- Deformed2Self: Self-Supervised Denoising for Dynamic Medical Imaging [0.0]
We propose Deformed2Self, an end-to-end self-supervised deep learning framework for dynamic imaging denoising.
It combines single-image and multi-image denoising to improve image quality and uses a spatial transformer network to model motion between different slices.
arXiv Detail & Related papers (2021-06-23T05:50:19Z)
- MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding [74.33171794972688]
We present algorithms to model phrase-object relevance by leveraging fine-grained visual representations and visually-aware language representations.
Experiments conducted on the widely-adopted Flickr30k dataset show a significant improvement over existing weakly-supervised methods.
arXiv Detail & Related papers (2020-10-12T00:43:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.