Related papers: Utilizing Large Language Models for Event Deconstruction to Enhance Multimodal Aspect-Based Sentiment Analysis

Related papers

Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning [12.054910727620154]
Eye-tracking data reveals valuable insights into users' cognitive states but is difficult to analyze due to its structured, non-linguistic nature.<n>This paper presents a multimodal human-AI collaborative framework designed to enhance cognitive pattern extraction from eye-tracking signals.
arXiv Detail & Related papers (2025-07-24T09:49:53Z)
Unveiling Effective In-Context Configurations for Image Captioning: An External & Internal Analysis [28.52057785196361]
In Natural Language Processing (NLP), numerous studies have demonstrated the effectiveness of In-Context Learning (ICL)<n>Inspired by the success of Large Language Models (LLMs), researchers have developed Large Multimodal Models (LMMs) with ICL capabilities.<n>This paper conducts a comprehensive external and internal investigation of multimodal in-context learning on the image captioning task.
arXiv Detail & Related papers (2025-07-08T08:07:57Z)
REMoH: A Reflective Evolution of Multi-objective Heuristics approach via Large Language Models [39.85828629779943]
Multi-objective optimization is fundamental in complex decision-making tasks.<n>Recent advances in Large Language Models (LLMs) offer enhanced explainability, adaptability, and reasoning.<n>This work proposes Reflective Evolution of Multi-objective Heuristics (REMoH), a novel framework integrating NSGA-II with LLM-based generation.
arXiv Detail & Related papers (2025-06-09T13:38:28Z)
Multi-domain Multilingual Sentiment Analysis in Industry: Predicting Aspect-based Opinion Quadruples [1.5039745292757671]
This paper explores the design of an aspect-based sentiment analysis system using large language models (LLMs) for real-world use.<n>We focus on quadruple opinion extraction -- identifying aspect categories, sentiment polarity, targets, and opinion expressions from text data across different domains and languages.
arXiv Detail & Related papers (2025-05-15T15:11:48Z)
Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models [10.623996218106564]
We introduce a novel parameterization methodology that facilitates the mapping of specific experts into a shared latent space. All expert operations are systematically decomposed into two principal components: a shared projection into a lower-dimensional latent space, followed by expert-specific transformations. This factorized approach substantially diminishes parameter count and computational requirements.
arXiv Detail & Related papers (2025-03-29T14:35:34Z)
LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning [39.54891426369773]
Trade-offs between model size, architecture, and performance remain underexplored. In this paper, we introduce LLaVA-MORE, a new family of MLLMs that integrates recent language models with diverse visual backbones. To ensure fair comparisons, we employ a unified training protocol applied consistently across all architectures.
arXiv Detail & Related papers (2025-03-19T18:10:12Z)
A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models [74.48084001058672]
The rise of foundation models has transformed machine learning research. multimodal foundation models (MMFMs) pose unique interpretability challenges beyond unimodal frameworks. This survey explores two key aspects: (1) the adaptation of LLM interpretability methods to multimodal models and (2) understanding the mechanistic differences between unimodal language models and crossmodal systems.
arXiv Detail & Related papers (2025-02-22T20:55:26Z)
Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for large language models (MLLMs) We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs. We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z)
Exploring Large Language Models for Multimodal Sentiment Analysis: Challenges, Benchmarks, and Future Directions [0.0]
Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to extract aspect terms and their corresponding sentiment polarities from multimodal information, including text and images. Traditional supervised learning methods have shown effectiveness in this task, but the adaptability of large language models (LLMs) to MABSA remains uncertain. Recent advances in LLMs, such as Llama2, LLaVA, and ChatGPT, demonstrate strong capabilities in general tasks, yet their performance in complex and fine-grained scenarios like MABSA is underexplored.
arXiv Detail & Related papers (2024-11-23T02:17:10Z)
An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions [0.3495246564946556]
We present an initial analysis of layer-wise representations in multi-scale architectures, with a focus on Canonical Correlation Analysis (CCA) and Mutual Information (MI) We find that the improved performance on SUPERB tasks is primarily due to the auxiliary low-resolution loss rather than the downsampling itself. These findings challenge assumptions about the multi-scale nature of MR-HuBERT and motivate the importance of disentangling computational efficiency from learning better representations.
arXiv Detail & Related papers (2024-10-31T14:09:05Z)
Towards Robust Multimodal Sentiment Analysis with Incomplete Data [20.75292807497547]
We present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust Multimodal Sentiment Analysis (MSA) LNLN features a dominant modality correction (DMC) module and dominant modality based multimodal learning (DMML) module, which enhances the model's robustness across various noise scenarios.
arXiv Detail & Related papers (2024-09-30T07:14:31Z)
A Survey on Multimodal Benchmarks: In the Era of Large AI Models [13.299775710527962]
Multimodal Large Language Models (MLLMs) have brought substantial advancements in artificial intelligence. This survey systematically reviews 211 benchmarks that assess MLLMs across four core domains: understanding, reasoning, generation, and application.
arXiv Detail & Related papers (2024-09-21T15:22:26Z)
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML) This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature. The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
Explore In-Context Segmentation via Latent Diffusion Models [132.26274147026854]
latent diffusion model (LDM) is an effective minimalist for in-context segmentation. We build a new and fair in-context segmentation benchmark that includes both image and video datasets.
arXiv Detail & Related papers (2024-03-14T17:52:31Z)
Model Composition for Multimodal Large Language Models [71.5729418523411]
We propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model. Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters.
arXiv Detail & Related papers (2024-02-20T06:38:10Z)
A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal sentiment analysis. PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information. EPE module models the boundaries pairing of the analysis target from the perspective of an Energy-based Model.
arXiv Detail & Related papers (2023-12-13T12:00:46Z)
Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models [56.256069117502385]
Chain of Thought (CoT) approaches can be used to enhance the capability of Large Language Models (LLMs) on complex reasoning tasks. However, the selection of optimal CoT demonstration examples in multi-modal reasoning remains less explored. We introduce a novel approach that addresses this challenge by using retrieval mechanisms to automatically select demonstration examples.
arXiv Detail & Related papers (2023-12-04T08:07:21Z)
Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework [0.0]
This paper explores the steps involved in multidimensional data sources, various multiway analysis methods employed, and the benefits of these approaches. A small example of Blind Source Separation (BSS) is presented comparing 2-dimensional algorithms and a multiway algorithm in Python. Results indicate that multiway analysis is more expressive.
arXiv Detail & Related papers (2023-09-05T17:56:22Z)
Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image. We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.