Acquired TASTE: Multimodal Stance Detection with Textual and Structural Embeddings
- URL: http://arxiv.org/abs/2412.03681v3
- Date: Wed, 11 Dec 2024 20:08:44 GMT
- Title: Acquired TASTE: Multimodal Stance Detection with Textual and Structural Embeddings
- Authors: Guy Barel, Oren Tsur, Dan Vilenchik
- Abstract summary: Stance detection plays a pivotal role in enabling an extensive range of downstream applications, from discourse parsing to tracing the spread of fake news and the denial of scientific facts.
We introduce TASTE -- a multimodal architecture for stance detection that harmoniously fuses Transformer-based content embedding with unsupervised structural embedding.
TASTE achieves state-of-the-art results on common benchmarks, significantly outperforming an array of strong baselines.
- Score: 5.229806149125529
- Abstract: Stance detection plays a pivotal role in enabling an extensive range of downstream applications, from discourse parsing to tracing the spread of fake news and the denial of scientific facts. While most stance classification models rely on textual representation of the utterance in question, prior work has demonstrated the importance of the conversational context in stance detection. In this work we introduce TASTE -- a multimodal architecture for stance detection that harmoniously fuses Transformer-based content embedding with unsupervised structural embedding. Through the fine-tuning of a pretrained transformer and the amalgamation with social embedding via a Gated Residual Network (GRN) layer, our model adeptly captures the complex interplay between content and conversational structure in determining stance. TASTE achieves state-of-the-art results on common benchmarks, significantly outperforming an array of strong baselines. Comparative evaluations underscore the benefits of social grounding -- emphasizing the criticality of concurrently harnessing both content and structure for enhanced stance detection.
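To make the fusion step concrete, here is a minimal PyTorch sketch of the kind of Gated Residual Network (GRN) layer described above, combining a transformer content embedding with an unsupervised structural (social) embedding before stance classification. The dimensions, the concatenate-then-project fusion, and all class names are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualNetwork(nn.Module):
    """Minimal GRN sketch: residual connection around a gated two-layer MLP."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.gate = nn.Linear(dim, 2 * dim)  # value and gate halves for the GLU
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.fc2(F.elu(self.fc1(x)))   # nonlinear transform
        h = F.glu(self.gate(h), dim=-1)    # gated linear unit filters unhelpful features
        return self.norm(x + h)            # residual connection + layer norm

class ContentStructureFusion(nn.Module):
    """Fuses content and structural embeddings, then classifies stance."""
    def __init__(self, text_dim: int = 768, struct_dim: int = 128, num_classes: int = 2):
        super().__init__()
        self.project = nn.Linear(text_dim + struct_dim, text_dim)
        self.grn = GatedResidualNetwork(text_dim, hidden=256)
        self.classifier = nn.Linear(text_dim, num_classes)

    def forward(self, content_emb: torch.Tensor, struct_emb: torch.Tensor) -> torch.Tensor:
        fused = self.project(torch.cat([content_emb, struct_emb], dim=-1))
        return self.classifier(self.grn(fused))

# Example: a batch of 4 utterances with BERT-sized content vectors and
# node-embedding-sized structural vectors (sizes are assumptions).
logits = ContentStructureFusion()(torch.randn(4, 768), torch.randn(4, 128))
```

The gating lets the model learn, per utterance, how much weight to give content versus conversational structure.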
Related papers
- CoSD: Collaborative Stance Detection with Contrastive Heterogeneous Topic Graph Learning [18.75039816544345]
We present a novel collaborative stance detection framework called CoSD.
CoSD learns topic-aware semantics and collaborative signals among texts, topics, and stance labels.
Experiments on two benchmark datasets demonstrate the state-of-the-art detection performance of CoSD.
arXiv Detail & Related papers (2024-04-26T02:04:05Z)
- Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition [56.968108142307976]
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
Most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models.
This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects.
arXiv Detail & Related papers (2024-02-24T13:00:54Z)
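As a generic illustration of the consistency-regularization idea in the paper above, the sketch below penalizes disagreement between a model's predictions on two views of the same word image. The actual method applies word-level consistency from both visual and semantic aspects, so this is a simplified, assumed form rather than the paper's loss.

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_view1: torch.Tensor,
                     logits_view2: torch.Tensor,
                     temperature: float = 1.0) -> torch.Tensor:
    """Generic consistency regularization: push the prediction on one view
    (e.g. a strong augmentation) toward the prediction on another view
    (e.g. a weak augmentation, treated as a fixed target)."""
    p = F.log_softmax(logits_view1 / temperature, dim=-1)
    q = F.softmax(logits_view2.detach() / temperature, dim=-1)  # stop-gradient target
    return F.kl_div(p, q, reduction="batchmean")

# Example: logits for a batch of 4 word images over a 37-symbol alphabet (assumed sizes).
loss = consistency_loss(torch.randn(4, 37), torch.randn(4, 37))
```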
- Augmenting Transformers with Recursively Composed Multi-grained Representations [42.87750629061462]
ReCAT is able to explicitly model hierarchical syntactic structures of raw texts without relying on gold trees during both learning and inference.
By stacking several CIO layers between the embedding layer and the attention layers of a Transformer, the ReCAT model can perform both deep intra-span and deep inter-span interactions.
arXiv Detail & Related papers (2023-09-28T10:24:39Z)
- Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task.
We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
- Knowledge-Enhanced Hierarchical Information Correlation Learning for Multi-Modal Rumor Detection [82.94413676131545]
We propose a novel knowledge-enhanced hierarchical information correlation learning approach (KhiCL) for multi-modal rumor detection.
KhiCL exploits a cross-modal joint dictionary to transfer heterogeneous unimodal features into a common feature space.
It extracts visual and textual entities from images and text, and designs a knowledge relevance reasoning strategy.
arXiv Detail & Related papers (2023-06-28T06:08:20Z)
- Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
- CATrans: Context and Affinity Transformer for Few-Shot Segmentation [36.802347383825705]
Few-shot segmentation (FSS) aims to segment novel categories given scarce annotated support images.
In this work, we effectively integrate the context and affinity information via the proposed novel Context and Affinity Transformer.
We conduct experiments to demonstrate the effectiveness of the proposed model, outperforming the state-of-the-art methods.
arXiv Detail & Related papers (2022-04-27T10:20:47Z)
- HiCLRE: A Hierarchical Contrastive Learning Framework for Distantly Supervised Relation Extraction [24.853265244512954]
We propose a Hierarchical Contrastive Learning framework for distantly supervised Relation Extraction (HiCLRE) to reduce noisy sentences.
Specifically, we propose a three-level hierarchical learning framework that interacts across levels, generating de-noised, context-aware representations.
Experiments demonstrate that HiCLRE significantly outperforms strong baselines in various mainstream DSRE datasets.
arXiv Detail & Related papers (2022-02-27T12:48:26Z)
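As a rough sketch of the contrastive ingredient in HiCLRE, the function below computes a standard InfoNCE-style loss between two views of the same representations. The paper's three-level hierarchy and cross-level interaction are more involved, so treat this as a generic building block under assumed shapes, not the authors' code.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """Generic InfoNCE loss: row i of `anchor` should match row i of
    `positive`; all other rows in the batch act as negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature    # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))   # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Example: representations of the same sentences from two levels/views (assumed 256-d).
loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
```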
- Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition [63.07844685982738]
This paper presents a new model, the Gated Bidirectional Alignment Network (GBAN), which consists of an attention-based bidirectional alignment network over LSTM hidden states.
We empirically show that the attention-aligned representations significantly outperform the last hidden states of the LSTM.
The proposed GBAN model outperforms existing state-of-the-art multimodal approaches on the IEMOCAP dataset.
arXiv Detail & Related papers (2022-01-17T09:46:59Z)
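A minimal sketch of the attention-based alignment plus gated fusion that GBAN builds on: one modality's LSTM states attend over another's, and a learned gate mixes the original and aligned states. The single-direction form and all dimensions are simplifying assumptions; GBAN itself is bidirectional and more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveAlignment(nn.Module):
    """Align a query modality (e.g. text LSTM states) to a context modality
    (e.g. audio LSTM states) via dot-product attention, then gate-fuse."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, query_states: torch.Tensor,
                context_states: torch.Tensor) -> torch.Tensor:
        # (batch, Tq, d) x (batch, d, Tc) -> (batch, Tq, Tc) attention weights
        attn = F.softmax(query_states @ context_states.transpose(1, 2)
                         / query_states.size(-1) ** 0.5, dim=-1)
        aligned = attn @ context_states                  # (batch, Tq, d)
        g = torch.sigmoid(self.gate(torch.cat([query_states, aligned], dim=-1)))
        return g * query_states + (1 - g) * aligned     # gated fusion

# Example: align 20 text steps against 50 audio steps, both 128-d (assumed sizes).
fused = AttentiveAlignment(128)(torch.randn(2, 20, 128), torch.randn(2, 50, 128))
```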
- Predicting Above-Sentence Discourse Structure using Distant Supervision from Topic Segmentation [8.688675709130289]
RST-style discourse parsing plays a vital role in many NLP tasks.
Despite its importance, one of the most prevalent limitations in modern-day discourse parsing is the lack of large-scale datasets.
arXiv Detail & Related papers (2021-12-12T10:16:45Z)
- Learning Relation Alignment for Calibrated Cross-modal Retrieval [52.760541762871505]
We propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify the relation consistency by measuring the semantic distance between linguistic and visual relations.
We present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method to optimize the ISD and calibrate intra-modal self-attentions mutually via inter-modal alignment.
arXiv Detail & Related papers (2021-05-28T14:25:49Z)
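A toy version of the ISD idea: once text and image tokens have been aligned to the same length, compare the two intra-modal self-attention maps; a smaller distance means more consistent relations across modalities. The alignment step, the L1 distance, and all shapes here are assumptions for illustration, not the paper's exact definition.

```python
import torch

def isd_like_distance(text_self_attn: torch.Tensor,
                      image_self_attn: torch.Tensor) -> torch.Tensor:
    """Toy ISD-style score: mean element-wise distance between two
    intra-modal self-attention maps over the same aligned tokens.
    Assumes both maps are (batch, n, n) and row-stochastic."""
    assert text_self_attn.shape == image_self_attn.shape
    return (text_self_attn - image_self_attn).abs().mean()

# Example with softmax-normalized random maps over 6 aligned tokens.
t = torch.softmax(torch.randn(2, 6, 6), dim=-1)
v = torch.softmax(torch.randn(2, 6, 6), dim=-1)
print(isd_like_distance(t, v))  # lower = more consistent intra-modal relations
```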
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.