Related papers: LLM-Driven Multimodal Opinion Expression Identification

LLM-Driven Multimodal Opinion Expression Identification

URL: http://arxiv.org/abs/2406.18088v2
Date: Sat, 29 Jun 2024 09:55:50 GMT
Title: LLM-Driven Multimodal Opinion Expression Identification
Authors: Bonian Jia, Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang,
Abstract summary: Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emotional subtleties beyond the capabilities of text. We introduce a novel multimodal OEI task, integrating text and speech to mirror real-world scenarios.
Score: 31.960884435487078
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emotional subtleties beyond the capabilities of text. We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world scenarios. Utilizing CMU MOSEI and IEMOCAP datasets, we construct the CI-MOEI dataset. Additionally, Text-to-Speech (TTS) technology is applied to the MPQA dataset to obtain the CIM-OEI dataset. We design a template for the OEI task to take full advantage of the generative power of large language models (LLMs). Advancing further, we propose an LLM-driven method STOEI, which combines speech and text modal to identify opinion expressions. Our experiments demonstrate that MOEI significantly improves the performance while our method outperforms existing methods by 9.20\% and obtains SOTA results.

Related papers

Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis [5.283520143851873]
We present a novel methodology that integrates a data-optimized framework with an advanced acoustic model to build high-quality TTS systems. We demonstrate the effectiveness of our approach using Thai as an illustrative case, where intricate phonetic rules and sparse resources are effectively addressed.
arXiv Detail & Related papers (2025-04-10T15:32:57Z)
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications. The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard. We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z)
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
An End-to-End Speech Summarization Using Large Language Model [7.562198375754054]
Speech Summarization (SSum) aims to generate human-like text summaries from spoken content. Research on large language models (LLMs) and multimodal information fusion has provided new insights. We propose an end-to-end SSum model that utilizes Q-Former as a connector for the audio-text modality.
arXiv Detail & Related papers (2024-07-02T07:22:57Z)
CUDRT: Benchmarking the Detection Models of Human vs. Large Language Models Generated Texts [9.682499180341273]
Large language models (LLMs) have greatly enhanced text generation across industries. Their human-like outputs make distinguishing between human and AI authorship challenging. Current benchmarks mainly rely on static datasets, limiting their effectiveness in assessing model-based detectors.
arXiv Detail & Related papers (2024-06-13T12:43:40Z)
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets [22.29915616018026]
Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks. Our research aims to evaluate the impact of various configurations of speech encoders, LLMs, and projector modules. We introduce a three-stage training approach, expressly developed to enhance the model's ability to align auditory and textual information.
arXiv Detail & Related papers (2024-05-03T14:35:58Z)
Meta-Task Prompting Elicits Embeddings from Large Language Models [54.757445048329735]
We introduce a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation. We generate high-quality sentence embeddings from Large Language Models without the need for model fine-tuning. Our findings suggest a new scaling law, offering a versatile and resource-efficient approach for embedding generation across diverse scenarios.
arXiv Detail & Related papers (2024-02-28T16:35:52Z)
LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition [67.96794382040547]
$LLM-DA$ is a novel data augmentation technique based on large language models (LLMs) for the few-shot NER task. Our approach involves employing 14 contextual rewriting strategies, designing entity replacements of the same type, and incorporating noise injection to enhance robustness.
arXiv Detail & Related papers (2024-02-22T14:19:56Z)
Rethinking and Improving Multi-task Learning for End-to-end Speech Translation [51.713683037303035]
We investigate the consistency between different tasks, considering different times and modules. We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations. We propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation.
arXiv Detail & Related papers (2023-11-07T08:48:46Z)
Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study [0.0]
This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems. Our primary focus is to investigate the potential of using an LLM's in-context learning capabilities to enhance the performance of ASR systems.
arXiv Detail & Related papers (2023-07-13T02:31:55Z)
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark [81.42376626294812]
We present Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark. Our aim is to establish LAMM as a growing ecosystem for training and evaluating MLLMs. We present a comprehensive dataset and benchmark, which cover a wide range of vision tasks for 2D and 3D vision.
arXiv Detail & Related papers (2023-06-11T14:01:17Z)
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning. We propose a simple but effective in-context learning framework called ICL-D3IE. Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.