TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection
- URL: http://arxiv.org/abs/2303.09314v2
- Date: Mon, 24 Apr 2023 09:23:25 GMT
- Title: TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection
- Authors: Linhao Zhang, Li Jin, Xian Sun, Guangluan Xu, Zequn Zhang, Xiaoyu Li,
Nayu Liu, Qing Liu, Shiyao Yan
- Abstract summary: We propose TOT: a topology-aware optimal transport framework to decipher implicit harm in memes.
Specifically, we leverage an optimal transport kernel method to capture complementary information from multiple modalities.
The newly achieved state-of-the-art performance on two publicly available benchmark datasets, together with further visual analysis, demonstrates the superiority of TOT.
- Score: 18.015012133043093
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal hate detection, which aims to identify harmful content online such
as memes, is crucial for building a wholesome internet environment. Previous
work has made enlightening exploration in detecting explicit hate remarks.
However, most of their approaches neglect the analysis of implicit harm, which
is particularly challenging as explicit text markers and demographic visual
cues are often twisted or missing. The leveraged cross-modal attention
mechanisms also suffer from the distributional modality gap and lack logical
interpretability. To address these semantic gap issues, we propose TOT: a
topology-aware optimal transport framework to decipher implicit harm in
memes, which formulates the cross-modal alignment problem as solving for
optimal transport plans. Specifically, we leverage an optimal
transport kernel method to capture complementary information from multiple
modalities. The kernel embedding provides a non-linear transformation into a
reproducing kernel Hilbert space (RKHS), which is important for
eliminating the distributional modality gap. Moreover, we perceive the topology
information based on aligned representations to conduct bipartite graph path
reasoning. The newly achieved state-of-the-art performance on two publicly
available benchmark datasets, together with further visual analysis,
demonstrates the superiority of TOT in capturing implicit cross-modal alignment.
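To make the idea of casting cross-modal alignment as an optimal transport plan concrete, here is a minimal sketch using entropy-regularised optimal transport (Sinkhorn iterations) in NumPy. It is not the authors' TOT implementation: the random feature matrices, the cosine cost, the uniform token and region weights, and the helper names `cosine_cost` and `sinkhorn_plan` are all assumptions chosen for illustration, and the kernel embedding and graph reasoning steps from the abstract are not modelled.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x and rows of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn_plan(cost, a, b, reg=0.5, n_iters=200):
    """Entropy-regularised transport plan between histograms a and b."""
    K = np.exp(-cost / reg)  # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)      # rescale rows towards marginal a
        v = b / (K.T @ u)    # rescale columns towards marginal b
    return u[:, None] * K * v[None, :]


# Hypothetical stand-ins for encoder outputs: 5 text tokens, 7 image regions.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(5, 16))
image_feats = rng.normal(size=(7, 16))

# Assumption: every token and every region carries equal mass.
a = np.full(5, 1.0 / 5)
b = np.full(7, 1.0 / 7)

cost = cosine_cost(text_feats, image_feats)
plan = sinkhorn_plan(cost, a, b)

# Barycentric projection: each text token receives a weighted mix of regions.
aligned_image = (plan / plan.sum(axis=1, keepdims=True)) @ image_feats
```

Each entry `plan[i, j]` can be read as how much of text token `i` is matched to image region `j`, which gives a soft alignment that is easier to inspect than raw attention weights.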
Related papers
- Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
arXiv Detail & Related papers (2024-11-12T09:57:53Z)
- GASE: Graph Attention Sampling with Edges Fusion for Solving Vehicle Routing Problems [6.084414764415137]
We propose an adaptive Graph Attention Sampling with the Edges Fusion framework to solve vehicle routing problems.
Our proposed model outperforms the existing methods by 2.08%-6.23% and shows stronger generalization ability.
arXiv Detail & Related papers (2024-05-21T03:33:07Z)
- Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement [58.9768112704998]
Disentangled representation learning strives to extract the intrinsic factors within observed data.
We introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerful inductive bias.
This is the first work to reveal the potent disentanglement capability of diffusion models with cross-attention, requiring no complex designs.
arXiv Detail & Related papers (2024-02-15T05:07:54Z)
- Unified Domain Adaptive Semantic Segmentation [96.74199626935294]
Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS) aims to transfer the supervision from a labeled source domain to an unlabeled target domain.
We propose a Quad-directional Mixup (QuadMix) method, characterized by tackling distinct point attributes and feature inconsistencies.
Our method outperforms the state-of-the-art works by large margins on four challenging UDA-SS benchmarks.
arXiv Detail & Related papers (2023-11-22T09:18:49Z)
- Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z)
- Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
- Cross-Modal Translation and Alignment for Survival Analysis [7.657906359372181]
We present a framework to explore the intrinsic cross-modal correlations and transfer potential complementary information.
Our experiments on five public TCGA datasets demonstrate that our proposed framework outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T13:29:14Z)
- Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
- Multimodal Trajectory Prediction via Topological Invariance for Navigation at Uncontrolled Intersections [45.508973373913946]
We focus on decentralized navigation among multiple non-communicating rational agents at street intersections without traffic signs or signals.
Our key insight is that the geometric structure of the intersection and the incentive of agents to move efficiently and avoid collisions (rationality) reduces the space of likely behaviors.
We design Multiple Topologies Prediction (MTP), a data-driven trajectory-prediction mechanism that reconstructs trajectory representations of high-likelihood modes in multiagent intersection scenes.
arXiv Detail & Related papers (2020-11-08T02:56:42Z)
- Representation Learning via Adversarially-Contrastive Optimal Transport [40.52344027750609]
We set the problem within the context of contrastive representation learning.
We propose a framework connecting Wasserstein GANs with a novel classifier.
Our results demonstrate competitive performance against challenging baselines.
arXiv Detail & Related papers (2020-07-11T19:46:18Z)
- MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
arXiv Detail & Related papers (2020-05-07T15:13:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.