Alignment Helps Make the Most of Multimodal Data
- URL: http://arxiv.org/abs/2405.08454v3
- Date: Mon, 23 Jun 2025 13:51:06 GMT
- Title: Alignment Helps Make the Most of Multimodal Data
- Authors: Christian Arnold, Andreas Küpfer
- Abstract summary: We show that political scientists typically do not align their multimodal data. Introducing a decision tree that guides alignment choices, our framework highlights alignment's untapped potential. We illustrate alignment's analytical value through two applications: predicting tonality in U.S. presidential campaign ads and cross-modal querying of German parliamentary speeches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Political scientists increasingly analyze multimodal data. However, the effective analysis of such data requires aligning information across different modalities. In our paper, we demonstrate the significance of such alignment. Informed by a systematic review of 2,703 papers, we find that political scientists typically do not align their multimodal data. Introducing a decision tree that guides alignment choices, our framework highlights alignment's untapped potential and provides concrete advice in research design and modeling decisions. We illustrate alignment's analytical value through two applications: predicting tonality in U.S. presidential campaign ads and cross-modal querying of German parliamentary speeches to examine responses to the far-right AfD.
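For a concrete sense of what alignment means in practice, here is a minimal Python sketch of temporal alignment: pairing ad-transcript utterances with the video frames shown while they are spoken. The data structures are invented for illustration and are not the paper's pipeline.

```python
# Hypothetical illustration: align a speech transcript with video frames via
# timestamps, so each utterance can be paired with the visuals on screen
# while it is spoken.

from dataclasses import dataclass

@dataclass
class Utterance:
    text: str
    start: float  # seconds
    end: float

@dataclass
class Frame:
    timestamp: float  # seconds
    features: list    # e.g., an image embedding

def align(utterances, frames):
    """Pair each utterance with the frames sampled inside its time span."""
    aligned = []
    for u in utterances:
        window = [f for f in frames if u.start <= f.timestamp < u.end]
        aligned.append((u, window))
    return aligned

utterances = [Utterance("My opponent raised taxes.", 0.0, 2.5),
              Utterance("I will cut them.", 2.5, 4.0)]
frames = [Frame(t / 2, []) for t in range(9)]  # one frame every 0.5 s
for utt, window in align(utterances, frames):
    print(utt.text, "->", len(window), "frames")
```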
Related papers
- KOKKAI DOC: An LLM-driven framework for scaling parliamentary representatives
This paper introduces an LLM-driven framework designed to accurately scale the political issue stances of parliamentary representatives. By leveraging advanced natural language processing techniques and large language models, the proposed methodology refines and enhances previous approaches. The framework incorporates three major innovations: (1) de-noising parliamentary speeches via summarization to produce cleaner, more consistent opinion embeddings; (2) automatic extraction of axes of political controversy from legislators' speech summaries; and (3) a diachronic analysis that tracks the evolution of party positions over time.
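A rough sketch of the summarize-then-embed step described above, using a placeholder summarizer and an off-the-shelf sentence encoder; the stub and the model choice are assumptions, not the paper's exact setup.

```python
# Sketch: de-noise each speech with a summary, embed the summaries, and
# average per legislator to obtain an opinion embedding.

import numpy as np
from sentence_transformers import SentenceTransformer

def summarize(speech: str) -> str:
    # Placeholder: in practice an LLM call that condenses the speech into
    # a short statement of the expressed position.
    return speech[:500]

def legislator_embedding(speeches, encoder):
    summaries = [summarize(s) for s in speeches]
    vectors = encoder.encode(summaries)  # one vector per summary
    return np.mean(vectors, axis=0)      # aggregate opinion embedding

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
emb = legislator_embedding(["Long parliamentary speech ..."], encoder)
```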
arXiv Detail & Related papers (2025-05-11T21:03:53Z)
- GridMind: A Multi-Agent NLP Framework for Unified, Cross-Modal NFL Data Insights
This paper introduces GridMind, a framework that unifies structured, semi-structured, and unstructured data through Retrieval-Augmented Generation (RAG) and large language models (LLMs).
This approach aligns with the evolving field of multimodal representation learning, where unified models are increasingly essential for real-time, cross-modal interactions.
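As a hedged illustration of retrieval over mixed data, the sketch below serializes a structured stat line into text so it can share one vector index with free-text notes; embed() is a stand-in for a real embedding model, and the data are invented.

```python
# Minimal retrieval sketch in the spirit of RAG over mixed data.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: a real system would call an embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

# A serialized table row and an unstructured note share one index as text.
corpus = [
    "player=J. Smith, week=7, passing_yards=312, touchdowns=3",
    "Smith shredded the zone defense with quick throws over the middle.",
]
index = np.stack([embed(doc) for doc in corpus])

def retrieve(query: str, k: int = 1):
    q = embed(query)
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(-scores)[:k]]

context = retrieve("How did Smith perform in week 7?")
# The retrieved snippets would then be placed into an LLM prompt.
```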
arXiv Detail & Related papers (2025-03-24T18:33:36Z)
- Aligning Multimodal LLM with Human Preference: A Survey
Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs) have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed.
arXiv Detail & Related papers (2025-03-18T17:59:56Z)
- mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
Multimodal embedding models have gained significant attention for their ability to map data from different modalities, such as text and images, into a unified representation space.
However, the limited labeled multimodal data often hinders embedding performance.
Recent approaches have leveraged data synthesis to address this problem, yet the quality of synthetic data remains a critical bottleneck.
arXiv Detail & Related papers (2025-02-12T15:03:33Z)
- Political-LLM: Large Language Models in Political Science
Large language models (LLMs) have been widely adopted in political science tasks. Political-LLM aims to advance the comprehensive understanding of integrating LLMs into computational political science.
arXiv Detail & Related papers (2024-12-09T08:47:50Z)
- An Information Criterion for Controlled Disentanglement of Multimodal Data
Multimodal representation learning seeks to relate and decompose information inherent in multiple modalities.
Disentangled Self-Supervised Learning (DisentangledSSL) is a novel self-supervised approach for learning disentangled representations.
arXiv Detail & Related papers (2024-10-31T14:57:31Z)
- Representation Bias in Political Sample Simulations with Large Language Models
This study seeks to identify and quantify biases in simulating political samples with Large Language Models.
Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
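One simple way to quantify such representation bias (not necessarily the study's metric) is to compare the response distribution of LLM-simulated respondents against the survey benchmark, e.g. with total variation distance; the shares below are made up for the example.

```python
# Compare simulated vs. survey vote-choice distributions.

import numpy as np

def total_variation(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.abs(p / p.sum() - q / q.sum()).sum()

survey    = [0.46, 0.48, 0.06]   # hypothetical benchmark shares (e.g., ANES)
simulated = [0.58, 0.38, 0.04]   # shares among LLM-simulated respondents

bias = total_variation(survey, simulated)
print(f"total variation distance: {bias:.3f}")  # 0 = perfect match
```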
arXiv Detail & Related papers (2024-07-16T05:52:26Z)
- Multilingual estimation of political-party positioning: From label aggregation to long-input Transformers
We implement and compare two approaches to automatic scaling analysis of political-party manifestos.
We find that the task can be efficiently solved by state-of-the-art models, with label aggregation producing the best results.
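A toy version of the label-aggregation approach: score each manifesto sentence and average the sentence-level labels into one party position. predict_position() stands in for a trained sentence classifier.

```python
# Aggregate sentence-level left-right scores into a party score.

import numpy as np

def predict_position(sentence: str) -> float:
    # Placeholder classifier output in [-1 (left), +1 (right)].
    return -0.4 if "welfare" in sentence else 0.6

manifesto = [
    "We will expand welfare programs for families.",
    "Taxes on businesses must be reduced.",
    "Welfare access should be simplified.",
]
party_score = np.mean([predict_position(s) for s in manifesto])
print(f"aggregated party position: {party_score:+.2f}")
```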
arXiv Detail & Related papers (2023-10-19T08:34:48Z)
- Multimodal Graph Learning for Generative Tasks
Multimodal learning combines multiple data modalities, broadening the types and complexity of data our models can utilize.
We propose Multimodal Graph Learning (MMGL), a framework for capturing information from multiple multimodal neighbors with relational structures among them.
arXiv Detail & Related papers (2023-10-11T13:25:03Z)
- Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.
We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality.
We also analyze MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, as well as with auditory speaker identification.
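The mapping step can be pictured as follows: run unimodal models on each question and record which single modalities already recover the gold answer. Predictions here are hard-coded stand-ins for real model outputs.

```python
# Mark which modalities suffice to answer each question on their own.

unimodal_predictions = {
    "q1": {"text": "Paris", "video": "London", "audio": "Paris"},
    "q2": {"text": "blue",  "video": "blue",   "audio": "red"},
}
gold = {"q1": "Paris", "q2": "blue"}

required = {
    q: [m for m, pred in preds.items() if pred == gold[q]]
    for q, preds in unimodal_predictions.items()
}
print(required)  # modalities sufficient to answer each question
```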
arXiv Detail & Related papers (2023-07-06T08:02:45Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds on the amount of multimodal interaction.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
We explore three methods to tackle the problem of interpreting multimodal inputs from conversational and situational contexts.
Our best method, scene-dialogue alignment, improves the performance by 20% F1-score compared to the SIMMC 2.1 baselines.
arXiv Detail & Related papers (2023-02-28T15:45:20Z)
- Examining Political Rhetoric with Epistemic Stance Detection
We develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling.
We demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books.
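A minimal scaffold for such a RoBERTa-based multi-class stance classifier via Hugging Face transformers; the label set and example are illustrative, not the paper's exact scheme.

```python
# Multi-class stance prediction with a RoBERTa sequence classifier.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

labels = ["firm belief", "non-belief", "neutral"]  # example stance classes
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(labels)
)

batch = tokenizer(
    ["The senator claims the policy will fail."],
    return_tensors="pt", padding=True, truncation=True
)
with torch.no_grad():
    logits = model(**batch).logits
print(labels[logits.argmax(dim=-1).item()])  # untrained head: arbitrary class
```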
arXiv Detail & Related papers (2022-12-29T23:47:14Z)
- Inference of Media Bias and Content Quality Using Natural-Language Processing
We present a framework to infer both political bias and content quality of media outlets from text.
We apply a bidirectional long short-term memory (LSTM) neural network to a data set of more than 1 million tweets.
Our results illustrate the importance of incorporating word order into machine-learning methods for text analysis.
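For readers who want the shape of such a model, here is a compact bidirectional LSTM classifier in PyTorch; the vocabulary size, dimensions, and two output classes are placeholders.

```python
# A bidirectional LSTM text classifier: word order enters through the
# sequential reads of the recurrent layer.

import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden=64, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, classes)  # e.g., bias label

    def forward(self, token_ids):
        x = self.embed(token_ids)
        out, _ = self.lstm(x)         # reads each tweet forward and backward
        return self.head(out[:, -1])  # classify from the last position

model = BiLSTMClassifier()
tweets = torch.randint(0, 30000, (4, 20))  # batch of 4 tweets, 20 tokens each
print(model(tweets).shape)  # torch.Size([4, 2])
```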
arXiv Detail & Related papers (2022-12-01T03:04:55Z)
- Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation
We study video-grounded dialogue generation, where a response is generated based on the dialogue context and the associated video.
A primary challenge of this task is the difficulty of integrating video data into pre-trained language models (PLMs).
We propose a multi-agent reinforcement learning method to collaboratively perform reasoning on different modalities.
arXiv Detail & Related papers (2022-10-22T14:45:29Z)
- DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
- Multimodal Image Synthesis and Editing: The Generative AI Era
Multimodal image synthesis and editing has become a hot research topic in recent years.
We comprehensively contextualize recent advances in multimodal image synthesis and editing.
We describe benchmark datasets and evaluation metrics as well as corresponding experimental results.
arXiv Detail & Related papers (2021-12-27T10:00:16Z)
- MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation
In this work, we propose MMGCN, a new model based on a multimodal fused graph convolutional network.
MMGCN can not only make use of multimodal dependencies effectively, but also leverage speaker information to model inter-speaker and intra-speaker dependency.
We evaluate our proposed model on two public benchmark datasets, IEMOCAP and MELD, and the results prove the effectiveness of MMGCN.
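The core operation behind graph-based fusion models like MMGCN is graph convolution over utterance nodes; below is a generic single GCN layer (not the paper's exact architecture) applied to a three-utterance conversation graph.

```python
# One graph-convolution step over utterance nodes.

import numpy as np

def gcn_layer(A, H, W):
    """H' = ReLU(D^-1/2 (A+I) D^-1/2 H W), with self-loops added."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Three utterances connected in conversation order; node features could
# concatenate text, audio, and visual encodings of each utterance.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.randn(3, 8)   # node features
W = np.random.randn(8, 4)   # learnable weights
print(gcn_layer(A, H, W).shape)  # (3, 4)
```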
arXiv Detail & Related papers (2021-07-14T15:37:02Z)
- Analyzing Online Political Advertisements
We present the first computational study on online political ads with the aim of inferring the political ideology of an ad sponsor.
We develop two new large datasets for the two tasks consisting of ads from the U.S.
arXiv Detail & Related papers (2021-05-09T23:18:37Z)
- Ranking the information content of distance measures
We introduce a statistical test that can assess the relative information retained when using two different distance measures.
This in turn allows finding the most informative distance measure out of a pool of candidates.
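A simplified neighbor-consistency check in the spirit of such a test: take each point's nearest neighbor under distance A and ask how highly it ranks under distance B; a low mean rank suggests B retains A's information. This is an illustration, not the paper's exact statistic.

```python
# Compare how well distance B preserves nearest-neighbor structure of A.

import numpy as np
from scipy.spatial.distance import cdist

def mean_neighbor_rank(X_a, X_b):
    Da, Db = cdist(X_a, X_a), cdist(X_b, X_b)
    np.fill_diagonal(Da, np.inf)
    ranks_b = Db.argsort(axis=1).argsort(axis=1)  # rank matrix under B
    nn_a = Da.argmin(axis=1)                      # nearest neighbor under A
    return ranks_b[np.arange(len(X_a)), nn_a].mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
# Does a distance using only 2 of the 5 features retain X's information?
print(mean_neighbor_rank(X, X[:, :2]))
```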
arXiv Detail & Related papers (2021-04-30T15:57:57Z)
- Video Sentiment Analysis with Bimodal Information-augmented Multi-Head Attention
This study focuses on the sentiment analysis of videos containing time series data of multiple modalities.
The key problem is how to fuse these heterogeneous data.
Based on bimodal interaction, more important bimodal features are assigned larger weights.
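A sketch of bimodal fusion with multi-head attention in PyTorch, where text features attend over audio features so audio evidence is weighted per text position; dimensions are arbitrary, and this is not the paper's exact module.

```python
# Cross-modal multi-head attention: text queries attend over audio frames.

import torch
import torch.nn as nn

d_model, heads = 64, 4
cross_attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

text = torch.randn(2, 10, d_model)   # batch of 2, 10 text tokens
audio = torch.randn(2, 30, d_model)  # 30 audio frames

fused, weights = cross_attn(query=text, key=audio, value=audio)
print(fused.shape)    # torch.Size([2, 10, 64]): audio-informed text features
print(weights.shape)  # torch.Size([2, 10, 30]): attention over audio frames
```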
arXiv Detail & Related papers (2021-03-03T12:30:11Z)
- M2P2: Multimodal Persuasion Prediction using Adaptive Fusion
This paper addresses two problems: Debate Outcome Prediction (DOP), which predicts who wins a debate, and Intensity of Persuasion Prediction (IPP), which predicts the change in the number of votes before and after a speaker speaks.
Our M2P2 framework is the first to use multimodal (acoustic, visual, language) data to solve the IPP problem.
arXiv Detail & Related papers (2020-06-03T18:47:24Z)
- Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis
Recent multimodal learning methods achieve strong performance on human-centric tasks but are often black boxes.
We propose Multimodal Routing, which adjusts weights between input modalities and output representations differently for each input sample.
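An illustrative per-sample routing step (generic gating in the spirit of the approach, not its exact method): compute a softmax weight for each modality's representation, so the contribution of each modality is inspectable for every input.

```python
# Per-sample modality routing: learned weights expose which modality
# drives each prediction.

import torch
import torch.nn as nn

class ModalityRouter(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, modality_reprs):  # list of (batch, dim) tensors
        stacked = torch.stack(modality_reprs, dim=1)      # (batch, M, dim)
        weights = torch.softmax(self.scorer(stacked), 1)  # per-sample weights
        fused = (weights * stacked).sum(dim=1)            # weighted fusion
        return fused, weights.squeeze(-1)                 # weights are inspectable

router = ModalityRouter()
text, audio, video = (torch.randn(4, 32) for _ in range(3))
fused, w = router([text, audio, video])
print(w[0])  # how much each modality contributed for sample 0
```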
arXiv Detail & Related papers (2020-04-29T13:42:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.